Meta has something like half a million modern GPUs that they’ve purchased outright. They can afford to keep training models “forever”.
This is useful for eking out every last drop of quality per gigabyte of model file size. It also keeps the models up to date with current events.
Obviously this scaling becomes too inefficient past a certain point, not just because of training costs that will never be recouped, but also because inference costs keep rising as models get larger.
Some fundamentally new architectures will need to be developed to take much better advantage of increased compute power.
I suspect the major players are investing in hardware now in the hope that some revolutionary new algorithm is invented soon and they’ll be ready for it.
It’s… a bit of a gamble!