In what sense does copyright make attribution of that data so hard? Is it becaus...

scotty79 · on Feb 17, 2024

They can't publish their training databases because that would mean publishing copyrighted material, which is illegal. They can only train on it, which is potentially fair use.

pclmulqdq · on Feb 17, 2024

It would be more accurate to say that they don't publish their training databases (including sanitized pointers to the copyrighted stuff) because they aren't sure that training is fair use.

They are sure, however, that it is a kind of infringement. Citing "fair use" is an admission of infringement - just a specific kind of infringement that is allowed.

scotty79 · on Feb 18, 2024

Publishing is definitely not fair use. What's allowed is not an infringement of any kind. Using a right you have cannot be any kind of infringement.

brookst · on Feb 16, 2024

How can training violate copyright? Is reading a book also a violation? My understanding was that copyright was about reproduction, not consumption.

testermelon · on Feb 16, 2024

It was about unfairly compensated usage, not limited to reproduction.

brookst · on Feb 17, 2024

That doesn’t sound like copyright.

testermelon · on Feb 17, 2024

Fair enough. It might be better to use another word.