Hacker News

I suspect ChatGPT is using a form of clean-room design to keep copyrighted material out of the training set of deployed models.

One model is trained on copyrighted works in a jurisdiction where this is allowed and outputs "transformative" summaries of book chapters. This serves as training data for the deployed model.




The article describes how the deployed model can regurgitate chunks of copyrighted works - one of the samples literally ends in a copyright notice.


If these were copyrighted works, how did they end up in the public comparison dataset?

Sure, some copyrighted works ended up in the Pile by accident. You can download these directly, without the elaborate "poem" trick.


That sounds like copyright washing, if there is such a thing.


If that's copyright washing, so are CliffsNotes.


Yup, though a lot of people are acting now as though every already-established principle of fair use needs to be revised suddenly by adding a bunch of "...but if this is done by any form of AI, then it's copyright infringement."

A cover band who plays Beatles songs = great

An artist who paints you a picture in the style of so-and-so = great

An AI who is trained on Beatles songs and can write new ones = exploitative, stealing, etc.

An AI who paints you a picture in the style of so-and-so = get the pitchforks, Big Tech wants to kill art!


> A cover band who plays Beatles songs

Has to pay the Beatles for the pleasure of doing so.


This discussion about art "in the style of" being stealing or exploitative didn't start with AI. For quite some time there have been complaints about advertisers commissioning sound-alike tunes to avoid paying licensing fees. AI is only automating it and making it possible on an industrial scale.


Well, I don't know about that. I strongly suspect ChatGPT could deliver whole copyrighted books piece by piece. I suspect that because it most certainly can do that with non-copyrighted text. Just ask it to give you something out of the Bible or Moby Dick. CliffsNotes can't do that.


Why would you suspect that?



