I don't know that it's only good for academics. The point of the paper, as it says, is a scaling law for the optimal loss given a fixed compute budget. By design it doesn't address inference costs and isn't a recipe for "how you should train an LLM for your use case".
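For concreteness, here's a minimal sketch of that scaling law in Python, using the parametric loss and the fitted constants published in the Chinchilla paper (Hoffmann et al. 2022, Approach 3) together with the usual C ≈ 6ND FLOPs approximation; treat it as an illustration of the law's shape, not a training recipe.

```python
# Chinchilla parametric loss and compute-optimal allocation.
# Constants are the published Approach 3 fit from Hoffmann et al. 2022.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted loss for a model with N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """Loss-minimizing (N, D) for a training budget of C FLOPs, C ~ 6*N*D."""
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    a, b = beta / (alpha + beta), alpha / (alpha + beta)
    return G * (C / 6) ** a, (C / 6) ** b / G
```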
If you're serving LLMs in a low-throughput, high-cost scenario, optimizing loss at the expense of inference cost may very well be your goal; likewise if you can't pay up front for 25x the compute.
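Reusing the definitions from the sketch above, here's a hypothetical version of that trade-off: match the compute-optimal loss with a model half the size by training it on more tokens, paying extra training FLOPs in exchange for roughly half the inference FLOPs per token (~2N per token). The budget and the half-size choice are arbitrary.

```python
C = 1e21                        # illustrative training budget in FLOPs
N_opt, D_opt = compute_optimal(C)
target = loss(N_opt, D_opt)     # loss the compute-optimal model reaches

N_small = N_opt / 2
residual = target - E - A / N_small**alpha
assert residual > 0, "model too small to ever reach the target loss"
D_needed = (B / residual) ** (1 / beta)   # tokens to match the target loss

print(f"optimal:   N={N_opt:.3g}, D={D_opt:.3g}, loss={target:.4f}")
print(f"half-size: D={D_needed:.3g} tokens, "
      f"{6 * N_small * D_needed / C:.2f}x the training compute, "
      f"half the inference FLOPs per token")
```

With the published constants this works out to roughly 1.2x the training compute for half the per-token serving cost, which is exactly the kind of trade-off the scaling law deliberately leaves out of scope.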