
> careful layering of well-understood optimizations—RoPE, SwiGLU, GQA, MoE

They basically cloned Qwen3 on that, before adding the few tweaks you mention afterwards.
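For reference, the recipe in question boils down to something like this (a rough sketch only; the names and hyperparameters below are illustrative, not any particular model's real config):

    from dataclasses import dataclass

    @dataclass
    class BlockConfig:
        d_model: int = 2048
        n_heads: int = 32            # query heads
        n_kv_heads: int = 8          # GQA: fewer key/value heads than query heads
        rope_theta: float = 1e6      # RoPE base frequency
        n_experts: int = 64          # MoE: total routed experts
        n_active: int = 8            # experts selected per token
        d_ff: int = 1024             # per-expert SwiGLU hidden width

    # Per decoder block: attention applies RoPE to q/k, with
    # n_heads // n_kv_heads query heads sharing each kv head (GQA);
    # the FFN is a router sending each token to n_active of
    # n_experts SwiGLU experts (MoE).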



You seem to be conflating when you first heard about those techniques with when they first appeared. None of those techniques originated in Qwen, and neither did this specific combination of them.


> They basically cloned Qwen3 on that

Oh, come on! GPT-4 was rumoured to be an MoE well before Qwen even started releasing models. oAI didn't have to "clone" anything.


First, it would be great if people stopped acting as if those billion-dollar corporations were sports teams.

Second, I don't claim OpenAI had to clone anything, and I have no reason to believe that their proprietary models copy anyone else's. But for this particular open-weight model, they clearly have an incentive to use exactly the same architectural base as another actor, in order to avoid leaking too much information about their own secret sauce.

And finally, though GPT-4 was an MoE, it was most likely what TFA calls "early MoE": a few very big experts, not many small ones.
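To make that distinction concrete, here's a back-of-the-envelope comparison; all numbers are invented rather than taken from any real model:

    # Few big experts ("early MoE") vs many small experts, same total size.
    def ffn_params(d_model, d_ff):
        # SwiGLU FFN: gate, up and down projections
        return 3 * d_model * d_ff

    d_model = 4096
    early = dict(n_experts=8,   n_active=2, d_ff=16384)   # few huge experts
    fine  = dict(n_experts=128, n_active=8, d_ff=1024)    # many small experts

    for name, c in [("early MoE", early), ("fine-grained", fine)]:
        total  = c["n_experts"] * ffn_params(d_model, c["d_ff"])
        active = c["n_active"]  * ffn_params(d_model, c["d_ff"])
        print(f"{name:12s} total FFN {total/1e9:4.1f}B params, "
              f"active per token {active/1e9:4.1f}B")

Both configurations end up with roughly the same total FFN parameter count (~1.6B here), but the fine-grained one activates far fewer parameters per token while giving the router many more experts to choose from.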



