When I was doing some NLP stuff a few years ago, I downloaded a few blobs of Com...

dr_dshiv · 2025-02-02T09:30:33 1738488633

Yes, but the PHI-1 textbooks were synthetic — written by other models! So…

astrange · 2025-02-03T12:41:39 1738586499

You can get rid of the trivia by training one model on the slop, then a second model on the first one - called distillation or teacher-student training. But it's not much of a problem because regularization during training should discourage it from learning random noise.

The reason LLMs work isn't because they learn the whole internet, it's because they try to learn it but then fail to, in a useful way.

If anything current models are overly optimized away from this; I get the feeling they mostly want to tell you things from Wikipedia. You don't get a lot of answers that look like they came from a book.