I don't know that it's only good for academics. The point of the paper, as it says, is a scaling law for the optimal loss given a fixed compute budget. By design it doesn't address inference costs and isn't a recipe for "how you should train an LLM for your use case".
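For concreteness, here's a minimal sketch of that scaling law in Python, using the parametric loss and the fitted constants published in the Chinchilla paper (Hoffmann et al. 2022, Approach 3) together with the usual C ≈ 6ND FLOPs approximation; treat it as an illustration of the law's shape, not a training recipe.

```python
# Chinchilla parametric loss and compute-optimal allocation.
# Constants are the published Approach 3 fit from Hoffmann et al. 2022.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(N, D):
    """Predicted loss for a model with N parameters trained on D tokens."""
    return E + A / N**alpha + B / D**beta

def compute_optimal(C):
    """Loss-minimizing (N, D) for a training budget of C FLOPs, C ~ 6*N*D."""
    G = (alpha * A / (beta * B)) ** (1 / (alpha + beta))
    a, b = beta / (alpha + beta), alpha / (alpha + beta)
    return G * (C / 6) ** a, (C / 6) ** b / G
```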
If you're serving LLMs in a low-throughput, high-cost scenario, optimizing loss at the expense of inference cost may very well be your goal; likewise if you can't pay up front for 25x the compute.
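Reusing the definitions from the sketch above, here's a hypothetical version of that trade-off: match the compute-optimal loss with a model half the size by training it on more tokens, paying extra training FLOPs in exchange for roughly half the inference FLOPs per token (~2N per token). The budget and the half-size choice are arbitrary.

```python
C = 1e21                        # illustrative training budget in FLOPs
N_opt, D_opt = compute_optimal(C)
target = loss(N_opt, D_opt)     # loss the compute-optimal model reaches

N_small = N_opt / 2
residual = target - E - A / N_small**alpha
assert residual > 0, "model too small to ever reach the target loss"
D_needed = (B / residual) ** (1 / beta)   # tokens to match the target loss

print(f"optimal:   N={N_opt:.3g}, D={D_opt:.3g}, loss={target:.4f}")
print(f"half-size: D={D_needed:.3g} tokens, "
      f"{6 * N_small * D_needed / C:.2f}x the training compute, "
      f"half the inference FLOPs per token")
```

With the published constants this works out to roughly 1.2x the training compute for half the per-token serving cost, which is exactly the kind of trade-off the scaling law deliberately leaves out of scope.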