We have been in the 'magic scaling' era for a while now. While the basic architecture of language models is reasonably simple and well understood, the emergent effects of making models bigger remain largely magic even to the researchers, studied only empirically after the fact.