
Yes, like RWKV and Mamba, this is a new generation of models that behave more like big RNNs than the pure transformers we have now



Isn't that how models worked before the "Attention Is All You Need" paper?


and is Griffin a state space model?


No, it's a hybrid: it mixes recurrent blocks (gated linear recurrences) with local attention, so part RNN, part Transformer.


I mean, SSMs are in fact RNNs under the hood
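
To make that concrete: the discretized recurrence of a linear SSM, h_t = A h_{t-1} + B x_t with readout y_t = C h_t, is literally an RNN step applied token by token. A minimal sketch in plain NumPy (the dimensions are toy values of mine, not from any particular model):

  import numpy as np

  d_state, d_in = 4, 2                # toy dimensions, chosen for illustration
  A = np.eye(d_state) * 0.9           # state transition (here: simple decay)
  B = np.random.randn(d_state, d_in)  # input projection
  C = np.random.randn(d_in, d_state)  # readout

  def ssm_scan(xs):
      """Run the SSM exactly like an RNN: carry a hidden state across steps."""
      h = np.zeros(d_state)
      ys = []
      for x in xs:                    # sequential scan, RNN-style
          h = A @ h + B @ x           # hidden-state update
          ys.append(C @ h)            # per-token output
      return np.stack(ys)

  ys = ssm_scan(np.random.randn(10, d_in))  # 10 toy tokens in, 10 outputs out

The "SSM" part is that A, B, C come from discretizing a continuous linear system (and models like Mamba make them input-dependent), but the inference loop is a recurrence all the same.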


At the end of the day, either you carry around a hidden state, or you have a fixed window for autoregression.

You can call hidden states "RNN-like" and autoregressive windows "transformer-like", but apart from those two core paradigms I don't know of other ways to do sequence modelling.

Mamba/RWKV/Griffin are somewhere between those two extremes.
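
A toy sketch of the contrast (my own illustrative code, not any of these models' actual architectures): the recurrent path compresses all history into a fixed-size state, while the windowed path gets exact access to only the last few tokens.

  import numpy as np

  def rnn_like(xs, dim=8):
      """Hidden state: O(1) memory per step, unbounded but lossy context."""
      rng = np.random.default_rng(0)
      W = rng.normal(scale=0.1, size=(dim, dim))
      U = rng.normal(scale=0.1, size=(dim, xs.shape[1]))
      h = np.zeros(dim)
      for x in xs:
          h = np.tanh(W @ h + U @ x)  # everything seen so far lives in h
      return h

  def transformer_like(xs, window=4):
      """Fixed window: exact view of the last `window` tokens, nothing older."""
      ctx = xs[-window:]              # context is whatever fits in the window
      return ctx.mean(axis=0)         # crude stand-in for attention over ctx

  xs = np.random.randn(16, 3)         # 16 toy tokens
  state = rnn_like(xs)                # fixed-size, lossy summary of all 16
  ctx = transformer_like(xs)          # exact view of the last 4 only

The hybrids above mix these ingredients rather than committing fully to either one.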



