How long does it generally take for a newly proposed architecture like Mamba to make its way into SotA mega models like GPT or Gemini? IIUC, Mamba's state-space design scales linearly with sequence length rather than quadratically like attention, so it would essentially remove hard limits on context length, which would be awesome to see in the super-mega high-performance models.