Hacker News

If you read their blog post, they mention it was pretrained on 12 trillion tokens of text. That is ~6x the amount used for the Llama 2 training runs (which used ~2 trillion tokens).

From that, it seems somewhat likely we've hit a wall on improving LLMs below a given parameter count by simply scaling up the training data, which basically forces everyone to keep scaling up parameter counts if they want to keep up with SOTA.
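The token-count comparison above is simple arithmetic; a quick sketch (assuming the publicly reported ~2 trillion pretraining tokens for Llama 2):

```python
# Compare pretraining token counts.
# Llama 2's ~2T token figure is from Meta's published paper;
# the 12T figure is from the new model's blog post.
llama2_tokens = 2e12
new_model_tokens = 12e12

ratio = new_model_tokens / llama2_tokens
print(f"~{ratio:.0f}x the Llama 2 pretraining data")  # ~6x
```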



