
Yes, like RWKV and Mamba, this is a new generation of models that behave more like big RNNs than the pure transformers we have now



Isn't that how models worked before the "Attention Is All You Need" paper?


and is Griffin a state space model?


No, it's a hybrid: it mixes recurrent blocks (gated linear recurrences) with local attention, so part RNN, part Transformer.


I mean, SSMs are in fact RNNs under the hood
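
To make that concrete: the discretized recurrence of a linear SSM, h_t = A h_{t-1} + B x_t with readout y_t = C h_t, is literally an RNN step applied token by token. A minimal sketch in plain NumPy (the dimensions are toy values of mine, not from any particular model):

  import numpy as np

  d_state, d_in = 4, 2                # toy dimensions, chosen for illustration
  A = np.eye(d_state) * 0.9           # state transition (here: simple decay)
  B = np.random.randn(d_state, d_in)  # input projection
  C = np.random.randn(d_in, d_state)  # readout

  def ssm_scan(xs):
      """Run the SSM exactly like an RNN: carry a hidden state across steps."""
      h = np.zeros(d_state)
      ys = []
      for x in xs:                    # sequential scan, RNN-style
          h = A @ h + B @ x           # hidden-state update
          ys.append(C @ h)            # per-token output
      return np.stack(ys)

  ys = ssm_scan(np.random.randn(10, d_in))  # 10 toy tokens in, 10 outputs out

The "SSM" part is that A, B, C come from discretizing a continuous linear system (and models like Mamba make them input-dependent), but the inference loop is a recurrence all the same.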


At the end of the day, either you carry around a hidden state, or you have a fixed window for autoregression.

You can call hidden states "RNN-like" and autoregressive windows "transformer-like", but apart from those two core paradigms I don't know of other ways to do sequence modelling.

Mamba/RWKV/Griffin are somewhere between those two extremes.
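
A toy sketch of the contrast (my own illustrative code, not any of these models' actual architectures): the recurrent path compresses all history into a fixed-size state, while the windowed path gets exact access to only the last few tokens.

  import numpy as np

  def rnn_like(xs, dim=8):
      """Hidden state: O(1) memory per step, unbounded but lossy context."""
      rng = np.random.default_rng(0)
      W = rng.normal(scale=0.1, size=(dim, dim))
      U = rng.normal(scale=0.1, size=(dim, xs.shape[1]))
      h = np.zeros(dim)
      for x in xs:
          h = np.tanh(W @ h + U @ x)  # everything seen so far lives in h
      return h

  def transformer_like(xs, window=4):
      """Fixed window: exact view of the last `window` tokens, nothing older."""
      ctx = xs[-window:]              # context is whatever fits in the window
      return ctx.mean(axis=0)         # crude stand-in for attention over ctx

  xs = np.random.randn(16, 3)         # 16 toy tokens
  state = rnn_like(xs)                # fixed-size, lossy summary of all 16
  ctx = transformer_like(xs)          # exact view of the last 4 only

The hybrids above mix these ingredients rather than committing fully to either one.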



