
They are comparing a non-ensembled transformer model with an ensemble of simple linear models. It's not surprising that an ensemble of linear time-series models does well, since averaging across members cancels out variance and improves the bias-variance trade-off.

Transformer/ML models by themselves have a tendency to overfit past patterns. They pick up more of the real signal, but they also pick up spurious patterns. They're low bias but high variance.

It would be more interesting to compare an ensemble of transformer models with an ensemble of linear models to see which is more accurate.

(That said, it's pretty impressive that an ensemble of simple linear models can beat a large-scale transformer model -- it suggests the domain being forecast is high-variance, which transformer models by themselves don't handle well.)
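
For concreteness, here's a rough sketch of the kind of linear ensemble I mean (numpy only; the synthetic series and the lag sets are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-in for a real forecasting series.
    t = np.arange(500)
    y = np.sin(t / 20) + 0.3 * rng.standard_normal(500)

    def fit_ar(series, lags):
        # Ordinary least squares on lagged values: y[t] ~ y[t - l] for l in lags.
        m = max(lags)
        X = np.column_stack([series[m - l : len(series) - l] for l in lags])
        coef, *_ = np.linalg.lstsq(X, series[m:], rcond=None)
        return coef

    def predict_ar(series, lags, coef):
        m = max(lags)
        X = np.column_stack([series[m - l : len(series) - l] for l in lags])
        return X @ coef  # one-step-ahead predictions for series[m:]

    # Each member is a simple linear AR model with a different lag set.
    lag_sets = [[1], [1, 2], [1, 7], [1, 2, 7, 14]]
    train = y[:400]  # fit on the first 400 points, evaluate on the last 100

    preds = [predict_ar(y, lags, fit_ar(train, lags))[-100:] for lags in lag_sets]
    ensemble = np.mean(preds, axis=0)  # simple averaging combines the members
    print("ensemble one-step MSE:", np.mean((ensemble - y[-100:]) ** 2))

The point is just that the members differ (here, in which lags they see), so averaging cancels some of each member's variance.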




FYI, I think you have bias and variance the wrong way around. Overfitting indicates high variance.


Thank you for catching that. Corrected.


> ensemble of transformer models

Isn't that just dropout?


No. Why do you think so?


Geoffrey Hinton describes dropout that way. It's as if you're training a different net each time the dropout mask changes.


Dropout is different from an ensemble; it's a regularization method.

It can look like an ensemble because each forward pass samples a different subnetwork, but an ensemble combines independently trained models rather than subnetworks that share one set of weights.
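
To make the difference concrete, here's a rough PyTorch sketch (toy sizes, random data, purely illustrative):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X, y = torch.randn(256, 10), torch.randn(256, 1)

    def make_net(p=0.0):
        return nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                             nn.Dropout(p), nn.Linear(32, 1))

    def train(net, steps=200):
        opt = torch.optim.Adam(net.parameters(), lr=1e-2)
        for _ in range(steps):
            opt.zero_grad()
            nn.functional.mse_loss(net(X), y).backward()
            opt.step()
        return net

    # Dropout: ONE set of weights; each forward pass samples a subnetwork.
    dropout_net = train(make_net(p=0.5))
    dropout_net.eval()  # at test time the shared, scaled weights are used

    # Ensemble: SEPARATE models, independently initialized and trained,
    # combined only at prediction time.
    members = [train(make_net()) for _ in range(5)]
    with torch.no_grad():
        ens_pred = torch.stack([m(X) for m in members]).mean(dim=0)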


That said, random forests are an internal ensemble, so I guess that could work.

In my mind, an ensemble is like a committee. For it to be effective, each member should be independent (able to pick up different signals) and have a greater-than-random chance of being correct.
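
You can sanity-check the committee intuition numerically (toy simulation; the 60% per-member accuracy is an arbitrary assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.60  # each member's (assumed) chance of being right

    for n in [1, 5, 25, 101]:
        # rows = trials, columns = independent members' votes
        votes = rng.random((100_000, n)) < p
        majority = votes.sum(axis=1) > n / 2
        print(n, "members -> majority accuracy", round(majority.mean(), 3))

With independent members, majority accuracy climbs toward 1 as the committee grows; correlated members gain much less.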


I am aware it is not literally an ensemble model, but Geoffrey Hinton says it achieves the same thing conceptually and practically.



