
AI training on AI-generated content is a future problem. Using textbooks is a good idea, until our textbooks are being written by AI.

This problem can't really be avoided once we begin using AI to write, understand, explain, and disseminate information for us. It'll be writing more than blogs and SEO pages.

How long before we start readily using AI to write academic journals and scientific papers? It's really only a matter of time, if it's not already happening.




You need to separate “content” and “knowledge.” GenAI can create massive amounts of content, but what matters is the knowledge you give it to create that content, which is why RAG is the most important pattern right now.

From “known good” sources of knowledge, we can generate an infinite amount of content. We can add more “known good” knowledge to the model by generating content about that knowledge and training on it.

I agree there will be many issues keeping up with what “known good” is, but that’s always been an issue.
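To make the content/knowledge split concrete, here is a minimal sketch of the retrieval half of RAG. Everything in it is an assumption for illustration: the `KNOWN_GOOD` passages, the function names, and the word-overlap scoring are all hypothetical, and a real system would more likely use embeddings and a vector store. The point is only that the generated content is constrained by vetted sources chosen at query time.

```python
import re
from collections import Counter

# Hypothetical "known good" knowledge base: vetted passages keyed by an id.
KNOWN_GOOD = {
    "doc-1": "Water boils at 100 degrees Celsius at sea level pressure.",
    "doc-2": "The Python language was first released in 1991.",
    "doc-3": "Retrieval-augmented generation supplies source passages to a model at query time.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count tokens shared by query and passage (a crude stand-in for embeddings)."""
    q = Counter(tokenize(query))
    p = Counter(tokenize(passage))
    return sum(min(q[t], p[t]) for t in q)


def retrieve(query, k=1):
    """Return up to k (id, passage) pairs with a non-zero overlap score, best first."""
    ranked = sorted(
        KNOWN_GOOD.items(),
        key=lambda item: score(query, item[1]),
        reverse=True,
    )
    return [(doc_id, text) for doc_id, text in ranked[:k] if score(query, text) > 0]


def build_prompt(query, k=1):
    """Assemble a prompt that restricts the model to the retrieved sources."""
    passages = retrieve(query, k)
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("When was Python first released?"))
```

The prompt would then go to whatever model you use. That call is left out on purpose, since the sketch is about where the knowledge comes from, not about any particular model API.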


> We can add more “known good” knowledge to the model by generating content about that knowledge and training on it.

That's my entire point -- AI only generates content right now, but it will also be the source of content for training purposes soon. We need a "known good" human knowledge base, otherwise generative AI will degenerate as AI-generated content proliferates.

Crawling the web, like in the case of the OP, isn't going to work for much longer. And books, video, and music are next.


> Crawling the web, like in the case of the OP, isn't going to work for much longer. And books, video, and music are next.

That is training on content.

The future will have models pre-trained on content and tuned on corpora of knowledge. The knowledge a model is tuned on will be a selling point for that model.

Think of it this way: if you want to update the model so it knows the latest news, does it matter whether the news was AI-generated, as long as it was generated from details of actual events?



