
In what sense does copyright make attribution of that data so hard?

Is it because people are violating copyrights to train these AIs?



They can't publish their training databases because that would mean publishing copyrighted material, which is illegal. They can only train, which is potentially fair use.


It would be more accurate to say that they don't publish their training databases (including sanitized pointers to the copyrighted stuff) because they aren't sure that training is fair use.

They are sure, however, that it is a kind of infringement. Citing "fair use" is an admission of infringement - just a specific kind of infringement that is allowed.


Publishing is definitely not fair use. What's allowed is not an infringement of any kind. Using a right you have cannot be any kind of infringement.


How can training violate copyright? Is reading a book also violation? My understanding was that copyright was about reproduction, not consumption.


It was about unfairly compensated usage, not limited to reproduction.


That doesn’t sound like copyright.


Fair enough. It might be better to use another word.



