
These tiny “state of the art” performance increases are really indicative that the current LLM architecture (Transformers + Mixture of Experts) is maxed out, even if you train it more or differently. The writing is on the wall.


It would not surprise me if this is what has delayed OpenAI in releasing a new model. More than a year after GPT-4, they may have by now produced some mega-trained mega-model, but running it is so expensive, and its eval improvement over GPT-4 so marginal, that releasing it to the public simply makes no commercial sense just yet.

They may be working on optimizing it to reduce cost, or re-engineering it to improve evals.


These “state of the art” LLMs barely eking out a win aren’t a threat to OpenAI, and they can take their sweet time sharpening the sword that will come down hard on these LLMs.



