Hacker News

How long does it generally take between a model architecture like Mamba being proposed and its use in SotA mega models like GPT or Gemini? IIUC, Mamba basically eliminates restrictions on context length, which would be awesome to see in the super-mega high-performance models.
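Rough intuition for the context-length point, as a toy sketch rather than Mamba's actual code: an SSM layer carries a fixed-size state and updates it once per token, so inference memory stays constant no matter how long the context gets, whereas attention keeps a KV cache that grows with sequence length. Mamba's "selective" part makes the projections input-dependent, which this sketch omits; all sizes and names below are made up for illustration.

    import numpy as np

    # Toy linear state-space recurrence. The point: the hidden state h has a
    # fixed size, so memory does not grow with the number of tokens processed.
    d_state = 16      # size of the recurrent state (constant w.r.t. context length)
    d_model = 4       # per-token feature size (illustrative only)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((d_state, d_state)) * 0.01   # state transition
    B = rng.standard_normal((d_state, d_model))           # input projection
    C = rng.standard_normal((d_model, d_state))           # output projection

    def ssm_step(h, x):
        """One recurrent step: update the fixed-size state, emit an output."""
        h_next = A @ h + B @ x
        y = C @ h_next
        return h_next, y

    # Process an arbitrarily long token stream with constant memory:
    # context length only costs time, not state size.
    h = np.zeros(d_state)
    for t in range(10_000):
        x_t = rng.standard_normal(d_model)   # stand-in for a token embedding
        h, y_t = ssm_step(h, x_t)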


GPT-5 would have this enhancement


GPT-5 will not, because the T in GPT stands for Transformer and Mamba/SSMs/S6 are not Transformers.

But I would bet that we see a SOTA S6 LLM from Meta by this Spring.


S6?



