
No, it's a combination of RNN and Transformer.



I mean, SSMs are in fact RNNs under the hood.


At the end of the day, either you carry around a hidden state, or you have a fixed window for autoregression.

You can call hidden states "RNN-like" and autoregressive windows "transformer-like", but apart from those two core paradigms I don't know of other ways to do sequence modelling.

Mamba/RWKV/Griffin are somewhere between those two extremes.
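A rough sketch of the two extremes, in case it helps. Everything here is a toy: the helper names are hypothetical, and the "transformer-like" side models only the fixed context window, not attention itself. The task is a running sum, which the stateful version tracks exactly and the windowed version loses once inputs fall out of its window.

```python
from collections import deque


def run_with_hidden_state(step, init_state, xs):
    """'RNN-like': carry a fixed-size state forward, one input at a time."""
    state = init_state
    outs = []
    for x in xs:
        state, y = step(state, x)  # step returns (new_state, output)
        outs.append(y)
    return outs


def run_with_fixed_window(predict, window, xs):
    """'Transformer-like' (window only): predict from the last `window` inputs."""
    buf = deque(maxlen=window)  # older inputs are dropped automatically
    outs = []
    for x in xs:
        buf.append(x)
        outs.append(predict(list(buf)))
    return outs


def sum_step(state, x):
    """Toy recurrent cell: the state is the running sum of all inputs so far."""
    new_state = state + x
    return new_state, new_state


xs = [1, 2, 3, 4, 5]
stateful = run_with_hidden_state(sum_step, 0, xs)
windowed = run_with_fixed_window(sum, 3, xs)
```

With a window of 3, the two agree for the first three steps and then diverge, since the windowed version can no longer see the earliest inputs. The hidden state here happens to be lossless for a running sum; in general it is a lossy compression of the past, which is the trade-off between the two paradigms.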



