> I don't know what you mean by timing (relative to what?) or "simple indexing" (they scanned the complete contents of books), but I am, and was already aware, of the wiki article and the role of recaptcha.
Timing as in: it started in 2004, when the most advanced AI most people used was a spam filter, so it wasn't seen as a training issue (in the way that LLMs are) *at the time*.
As for training rights, I agree with you: there's no clarity on how such data could be used *today* by the people who have it, especially as the arguments in favour of LLM training often rest on a comparison to search engine indexing.