Everyone is so negative here, but we have reached the limit of AI scaling with conventional methods. Who knows, Mistral might find the next big breakthrough like DeepSeek did. We should be optimistic.
> but we have reached the limit of AI scaling with conventional methods
We've only just started RL-training LLMs. So far, RL has not used more than 10-20% of the existing pre-training compute budget. There's a lot of scaling left in RL training yet.
That was true for models up to o3, but there isn't enough public info to say much about GPT-5. Grok 4 seems to be the first major model that scaled RL compute roughly 10x, to near pre-training effort.
Even with pretraining, there's no hard limit or wall in raw performance, just diminishing returns for current applications, and a business rationale to serve lighter models given current infrastructure, pricing, and applications. Algorithmic efficiency of inference at a given performance level has also advanced a couple of OOMs since 2022 (a major part of that surely being model architecture and training methods).
And it seems research is bottlenecked by computation.
RLHF is not the "RL" the parent is posting about. RLHF is specifically human-driven reward (subjective, doesn't scale, doesn't improve the model's "intelligence", just tweaks behavior), which is why the labs have started calling it post-training rather than RLHF.
True RL is where you set up an environment in which an agent can "discover" solutions to problems by iterating against some kind of verifiable reward, AND the entire space of outcomes is theoretically largely explorable by the agent. Maths and coding have proven amenable to this type of RL so far.
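To make the distinction concrete, here's a toy sketch of that verifiable-reward loop. This is not how any lab actually implements RL on LLMs; the candidate list, the arithmetic verifier, and the bandit-style weight update are all invented for illustration. The point is just that the reward is checked mechanically, with no human rater in the loop:

```python
import random

def verifier(expr: str, target: int) -> bool:
    """Verifiable reward: correctness is checked mechanically,
    no human judgment involved (unlike RLHF)."""
    try:
        return eval(expr) == target
    except Exception:
        return False

# Toy "policy": preference weights over a fixed set of candidate solutions.
candidates = ["2 + 2", "2 * 3", "10 - 4", "1 + 2"]
weights = {c: 1.0 for c in candidates}

def sample(rng: random.Random) -> str:
    """Sample a candidate proportionally to its current weight."""
    r = rng.uniform(0, sum(weights.values()))
    for c, w in weights.items():
        r -= w
        if r <= 0:
            return c
    return candidates[-1]

rng = random.Random(0)
target = 6
for _ in range(200):
    c = sample(rng)
    reward = 1.0 if verifier(c, target) else 0.0
    # Reinforce candidates that pass the verifier; wrong ones never gain weight.
    weights[c] += reward

best = max(weights, key=weights.get)
```

After the loop, `best` is one of the candidates that actually evaluates to 6, because only verified answers are reinforced. The agent "discovered" the solution purely by exploring against the verifier, which is why this style of reward scales in domains like maths and coding where answers can be checked automatically.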
> It’s hard to believe that Mistral isn’t the right choice to invest €1.7B in for economic reasons.
Why? Cursor, essentially a VSCode fork, is valued at $10B. Perplexity AI, which, as far as I'm informed, doesn't have its own foundational models, boasts a market capitalisation of $20B, according to recent news. Yet Mistral sits at just $14B.
Meanwhile, Mistral was at the forefront of the LLM take-off, developing foundational models (very lean, performant and innovative at the time) from scratch and releasing them openly. They set up an API service, integrated with businesses, built custom models and fine-tunes, and secured partnership agreements. They launched a user-facing interface and mobile app that are on par with those of leading companies, kept pace with "reasoning" and "research" advancements; in short, they built a solid, commercially viable portfolio. So why on earth should Mistral AI be valued lower? Let alone have its mere €1.7B investment questioned.
Edit: Apologies, I misread your quote and missed the "isn't" part.
i recall them being one of the first ones to release a mixture-of-experts (MoE) model [1], which was quite novel at the time. post that, it has appeared to be a catch-up game for them in mainstream utility. like just a week ago they announced support for custom MCP connectors to their chat offering [2].
more competition is always nice, but i wonder what can these two companies, separated by several steps in the supply chain, really achieve together.
I am not an ML person, but my broad understanding is that the innovation was a more efficient training method, training the model much more cheaply than the US models, which is why it was dubbed the "Sputnik moment".