There are no benchmarks for the 8B & 14B models, the most popular sizes on consumer hardware. Are they hiding something? Has anyone benchmarked them?
And why did they leave out the generalist benchmarks like MMLU-Pro & TruthfulQA?
I wish we had proper public benchmarks that are up to date. LMArena was proven useless by the Llama 4 scandal, and LiveBench is unrealistic and misses too many models.