If you could somehow download the entire archive, you could feed it into your LLM as training data. It's a huge corpus, though sort of ill-gotten. That said, it would be pretty awesome.
Google has this sort of thing already, since they have that whole "let's digitize the world's books" project. It's interesting that Google never developed a ChatGPT, given that they have literally digitized a large share of the world's books.