They are comparing a non-ensembled transformer model with an ensemble of simple linear models. It's not surprising that an ensemble of linear time series models does well, since averaging across an ensemble cuts variance while adding little bias -- it sits at a good point on the bias-variance trade-off.
Transformer/ML models by themselves have a tendency to overfit past patterns. They pick up more of the real signal, but they also pick up spurious patterns. They're low bias but high variance.
It would be more interesting to compare an ensemble of transformer models with an ensemble of linear models to see which is more accurate.
(That said, it's pretty impressive that an ensemble of simple linear models can beat a large-scale transformer model -- this tells me the domain being forecast is noisy and high variance, which transformer models by themselves don't handle well.)
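To make the variance-reduction point concrete, here's a minimal sketch (my own toy example, not from the article): a bag of linear forecasters, each fit by closed-form OLS on a bootstrap resample of a noisy trend, with forecasts averaged across the ensemble. The data, the forecast horizon (x=60), and the ensemble size are all made up for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

rng = random.Random(0)
xs = list(range(50))
ys = [2.0 * x + 5.0 + rng.gauss(0, 10) for x in xs]  # noisy linear trend

# Bag of linear forecasters: each is fit on a bootstrap resample,
# and their forecasts are averaged across the ensemble.
n_models = 25
preds = []
for _ in range(n_models):
    idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
    a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    preds.append(a * 60 + b)  # each member's forecast at x = 60

forecast = sum(preds) / n_models
print(forecast)  # should land near the true value, 2*60 + 5 = 125
```

Each individual member wobbles with its resample; the average is much steadier. That's the whole trick.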
Dropout is different from ensembles. It is a regularization method.
It might look like an ensemble because each training step samples a different subnetwork, but an ensemble combines independently trained models rather than random subsets of one model.
That said, random forests are an internal ensemble, so I guess that framing could work.
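For what it's worth, the mechanics of dropout can be sketched in a few lines (this is the standard "inverted dropout" formulation, not anything specific to the article): units are zeroed at random during training and survivors are rescaled so the expected activation is unchanged, while inference uses the full set of units.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training
    and scale survivors by 1/(1-p), so the expected activation is unchanged.
    At inference time the full set of units is returned as-is."""
    if not training:
        return list(values)
    return [v / (1 - p) if rng.random() >= p else 0.0 for v in values]

rng = random.Random(0)
acts = [1.0] * 10000
dropped = dropout(acts, p=0.5, training=True, rng=rng)
# roughly half the units are zeroed, but the mean stays near 1.0
print(sum(dropped) / len(dropped))
```

The "looks like an ensemble" intuition comes from the fact that every training step effectively trains a different random subnetwork -- but they all share one set of weights, which is exactly why it's regularization rather than a true ensemble.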
In my mind an ensemble is like a committee. For it to be effective, each member should be independent (able to pick up different signals) and have a better-than-random chance of being correct.
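The committee intuition is easy to check by simulation (a toy sketch of my own, with a made-up per-member accuracy of 70%): independent members who are each right 70% of the time, voting by majority, are collectively right far more often.

```python
import random

def committee_accuracy(p_member, n_members, n_trials, rng):
    """Simulate majority voting among independent committee members,
    each correct with probability p_member on every trial."""
    correct = 0
    for _ in range(n_trials):
        votes = sum(rng.random() < p_member for _ in range(n_members))
        if votes > n_members / 2:  # strict majority is correct
            correct += 1
    return correct / n_trials

rng = random.Random(0)
single = 0.7
ensemble = committee_accuracy(single, n_members=11, n_trials=10000, rng=rng)
print(ensemble)  # well above 0.7 -- independence is doing the work
```

If the members' errors are correlated (they all learned the same spurious pattern), this gain evaporates -- which is exactly why the independence condition matters.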