> If you can't process/digest copyrighted content with algorithms/machine learning then Google Search (the whole thing, not just Image Search) is dead.
Not if Google honors the robots.txt like they say they do. Hosting content with a robots.txt saying "index me please" is essentially an implicit contract with Google for full access to your content in return for showing up in their search results.
Hosting an image/code repository with a very specific license attached and then having that license ignored by someone who repackages that content and redistributes it is not the same as sites explicitly telling Google to index their content.
A much closer comparison IMO would be someone compressing a massive library of copyrighted content and then redistributing it and arguing it's legal because "the content has been processed and can't be recovered without a specific setup". I don't think we'd need prior court cases to argue that would most likely be illegal, so I don't see how machine learning models differ.
LAION/Stable Diffusion is already legal under the same exemptions as Google Image Search and does respect robots.txt. It was also created in Germany, so US court cases wouldn’t apply to it.