They are comparing a non-ensembled transformer model with an ensemble of simple linear models. It's not surprising that an ensemble of linear time series models does well, since averaging across an ensemble cuts variance while adding little bias -- it sits at a good point on the bias-variance trade-off.
Transformer/ML models by themselves have a tendency to overfit past patterns. They pick up more of the real signal, but they also pick up spurious patterns. They're low bias but high variance.
It would be more interesting to compare an ensemble of transformer models with an ensemble of linear models to see which is more accurate.
(That said, it's pretty impressive that an ensemble of simple linear models can beat a large-scale transformer model -- this tells me the domain being forecast is noisy and high variance, which transformer models by themselves don't handle well.)
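To make the variance-reduction point concrete, here's a minimal sketch (my own toy example, not from the article): a bag of linear forecasters, each fit by closed-form OLS on a bootstrap resample of a noisy trend, with forecasts averaged across the ensemble. The data, the forecast horizon (x=60), and the ensemble size are all made up for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

rng = random.Random(0)
xs = list(range(50))
ys = [2.0 * x + 5.0 + rng.gauss(0, 10) for x in xs]  # noisy linear trend

# Bag of linear forecasters: each is fit on a bootstrap resample,
# and their forecasts are averaged across the ensemble.
n_models = 25
preds = []
for _ in range(n_models):
    idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
    a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
    preds.append(a * 60 + b)  # each member's forecast at x = 60

forecast = sum(preds) / n_models
print(forecast)  # should land near the true value, 2*60 + 5 = 125
```

Each individual member wobbles with its resample; the average is much steadier. That's the whole trick.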
Dropout is different from ensembles. It is a regularization method.
It might look like an ensemble because each training step samples a different subnetwork, but an ensemble combines independently trained models rather than random subsets of one model.
That said, random forests are an internal ensemble, so I guess that framing could work.
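For what it's worth, the mechanics of dropout can be sketched in a few lines (this is the standard "inverted dropout" formulation, not anything specific to the article): units are zeroed at random during training and survivors are rescaled so the expected activation is unchanged, while inference uses the full set of units.

```python
import random

def dropout(values, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training
    and scale survivors by 1/(1-p), so the expected activation is unchanged.
    At inference time the full set of units is returned as-is."""
    if not training:
        return list(values)
    return [v / (1 - p) if rng.random() >= p else 0.0 for v in values]

rng = random.Random(0)
acts = [1.0] * 10000
dropped = dropout(acts, p=0.5, training=True, rng=rng)
# roughly half the units are zeroed, but the mean stays near 1.0
print(sum(dropped) / len(dropped))
```

The "looks like an ensemble" intuition comes from the fact that every training step effectively trains a different random subnetwork -- but they all share one set of weights, which is exactly why it's regularization rather than a true ensemble.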
In my mind an ensemble is like a committee. For it to be effective, each member should be independent (able to pick up different signals) and have a better-than-random chance of being correct.
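The committee intuition is easy to check by simulation (a toy sketch of my own, with a made-up per-member accuracy of 70%): independent members who are each right 70% of the time, voting by majority, are collectively right far more often.

```python
import random

def committee_accuracy(p_member, n_members, n_trials, rng):
    """Simulate majority voting among independent committee members,
    each correct with probability p_member on every trial."""
    correct = 0
    for _ in range(n_trials):
        votes = sum(rng.random() < p_member for _ in range(n_members))
        if votes > n_members / 2:  # strict majority is correct
            correct += 1
    return correct / n_trials

rng = random.Random(0)
single = 0.7
ensemble = committee_accuracy(single, n_members=11, n_trials=10000, rng=rng)
print(ensemble)  # well above 0.7 -- independence is doing the work
```

If the members' errors are correlated (they all learned the same spurious pattern), this gain evaporates -- which is exactly why the independence condition matters.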