I think the situation is less clear than that. While I have limited research experience with image generation, I believe I have a fair understanding of large language models. From the release of GPT-2 until ChatGPT, the prevailing argument was indeed that supervised training data was not a priority and that everything boiled down to scaling the amount of unsupervised training data. That changed with preference tuning and related techniques, and I would argue that the extensive training data curation we see today (which is withheld from the "papers" published for these models) is a form of supervision in its own right. Compute/data scaling may come to dominate again, but I think it is equally possible that the next few years will be dominated by data curation and by exploring forms of supervision that "extract" value from what was learnt at the unsupervised training stage.
Still, you are correct that Extropic is focused on computation rather than data. I just wanted to chime in so that the discussion here does not leave the impression that we are still in the days of pure unsupervised scaling.