EGreg 6 months ago | on: SeedLM: Compressing LLM Weights into Seeds of Pseu...

What did Zuck mean that Llama 4 Behemoth is already the highest-performing base model even though it hasn't finished training yet? What are the benchmarks, then? Does he mean they did pretraining but not fine-tuning?

You can fine-tune a checkpoint of the model saved partway through pre-training, while pre-training itself keeps going.
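To make that concrete, here's a toy sketch. It is not how Meta actually trains Llama; the 1-D linear "model", the synthetic datasets, and the step counts are all made up for illustration. The point is only the mechanics: snapshot the weights mid-pretraining, fine-tune an independent copy of that snapshot, and the main pretraining run is unaffected.

```python
import copy
import random

def sgd_step(params, batch, lr):
    """One SGD step on mean squared error for a toy model y = w*x + b."""
    w, b = params["w"], params["b"]
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    params["w"] = w - lr * gw / len(batch)
    params["b"] = b - lr * gb / len(batch)

def mse(params, data):
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
# Hypothetical "pretraining" data: y = 2x + 1 plus a little noise.
pretrain_data = [(x, 2 * x + 1 + rng.gauss(0, 0.1))
                 for x in [rng.uniform(-1, 1) for _ in range(200)]]
# Hypothetical downstream task for fine-tuning: y = 2x + 3.
finetune_data = [(x, 2 * x + 3) for x in [rng.uniform(-1, 1) for _ in range(50)]]

base = {"w": 0.0, "b": 0.0}
checkpoints = {}
for step in range(1, 301):
    sgd_step(base, rng.sample(pretrain_data, 16), lr=0.1)
    if step % 100 == 0:
        # Deep copy so later pretraining updates don't mutate the snapshot.
        checkpoints[step] = copy.deepcopy(base)

# Fine-tune a separate copy of the step-100 checkpoint; `base` is untouched.
tuned = copy.deepcopy(checkpoints[100])
for _ in range(200):
    sgd_step(tuned, finetune_data, lr=0.1)

print("base (300 steps) on pretrain data:", mse(base, pretrain_data))
print("step-100 ckpt on downstream task: ", mse(checkpoints[100], finetune_data))
print("fine-tuned ckpt on downstream task:", mse(tuned, finetune_data))
```

In a real run you'd do the same thing at vastly larger scale: evaluate or fine-tune intermediate checkpoints for benchmarks while the main job keeps training.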