
I don't think anyone is accusing AI models of distributing copyrighted works verbatim, so any argument will have to focus on AI derivative works, not original ones.

But if I understand you correctly, you're complaining that the data OpenAI (for example) downloaded from the internet and presented to GPT4 does not count as legally acquired? Why not? It was downloaded from the internet so I think that implies it did not violate any license on OpenAI's part. Saving it for a long time might be in the grey zone, but generally that is accepted, when it comes to humans, either as fair use, or a technical necessity (such as caching).



"I don't think anyone is accusing AI models of distributing copyrighted works verbatim"

They do that, too. They've been caught, reported on, and lawsuits are in progress. I have piles of verbatim quotes from them about certain material. I was actually using ChatGPT partly for that research since I thought the (free) source was legally clear. Later, I found out it was against their highly-readable license. OpenAI had taken their work without permission against their license terms. I deleted all my GPT artifacts. That's all I can say about that one.

"But if I understand you correctly, you're complaining that the data OpenAI (for example) downloaded from the internet and presented to GPT4 does not count as legally acquired?"

The reasons were in the article I shared. This section makes specific claims about their data:

https://gethisword.com/tech/exploringai/provingwrongdoing.ht...

The books in GPT, BooksCorpus2 in The Pile, the papers that forbid commercial use (e.g. some on arXiv), corporate media's articles, and online resources used outside their permissions are easy examples. Basic copyright law says you have to obey certain principles when using published works. They were ignoring all of them.

Most file-sharing cases also say you can't distribute copyrighted works without the authors' permission. That applies even to free works, since they're often free on sites that support the authors, like with ads or publicity. The AI companies are (a) passing collections of such material around, which is already illegal, and (b) doing so in ways that only benefit them, not the authors.

When tested for copyright infringement, one thing they look at is who gets value out of the situation. Did they take away the value, esp. financial, that the author would get from their work in their own use of the work? Are they competition? That ChatGPT's answers replaced a lot of their users' use of source material says that might be a yes. And does the new work exist to make a profit or for non-commercial use? Most of them sell it, with OpenAI and Anthropic making billions off others' copyrighted works. Definitely yes. Do they ignore others' copyright and contract rights while asserting their own? Yes, hypocrites indeed.

Even a junior lawyer would warn you about most of these risks. They're commonly used in copyright cases. The only way they could fail almost across the board is if they were doing it on purpose for money, power, and fame. If so, they deserve to experience the consequences of those actions.

Also, let's not pretend the folks getting billions of dollars for AI development couldn't have paid some millions here and there for legal data. Their own research says high-quality data would've made their AIs perform better, too. Greed was working against everyone's interests here, if their interests were what they say (public-benefit AI).



