If you could somehow download the entire archive you could feed it into your LLM... | Hacker News


mannyv on Oct 4, 2023 | on: 1.3B Worldcat scrape and data science mini-competi...

If you could somehow download the entire archive you could feed it into your LLM for training. This is a huge corpus and is sort of ill-gotten. That said, it would be pretty awesome.

Google has this sort of thing already, since they have that whole "let's digitize the world's books" project. It's interesting that Google never developed a ChatGPT, given that they literally have a large share of the world's books digitized.

crtasm on Oct 4, 2023

Google launched Bard earlier this year.

mannyv on Oct 4, 2023

Yes, but why weren't they first?

wordpad25 on Oct 4, 2023

It's like asking why X wasn't invented earlier.

Google and everyone else had no idea how successful LLMs could be until OpenAI did it.

mannyv on Oct 5, 2023

Untrue. The whole transformer idea came from a Google team.
