Hacker News

This seems to be an untested legal question, but it's presumptuous of AI researchers to assume it's fine. Maybe using the dataset to train an AI is fine, but I'd say it's pretty clear that distributing the training dataset itself isn't.

Nobody has sued yet though.




>Maybe using the dataset to train an AI is fine

If this is true, how long until people start purposely training AI models to overfit so that they return their inputs verbatim?


I think a reasonable first guess for a legal interpretation is "would it be legal if a human did that?"

A human who reads an encyclopedia and uses that knowledge to answer questions isn't violating copyright, so an AI doing the same is probably fine. Likewise, looking at all of Picasso's paintings and painting something new in his style isn't a violation, so training an AI on real Picassos to make Picasso-like paintings is probably fine too.

However, looking at a painting and perfectly replicating it is a copyright violation, as is reading a text and then writing down the same text. Both are just copies, not new works. So intentionally overfitting an AI so that it returns its inputs with minimal changes probably makes the outputs subject to the copyright of the training data's owners.
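To make the "overfit model returns its inputs" idea concrete, here's a toy sketch (not how Copilot or any real LLM works): a character-level n-gram model trained on a single document is maximally overfit, and greedy generation reproduces the training text verbatim. All names here are made up for illustration.

```python
from collections import defaultdict

def train(text, n=3):
    """Map each n-character context to the characters that followed it."""
    model = defaultdict(list)
    for i in range(len(text) - n):
        model[text[i:i + n]].append(text[i + n])
    return model

def generate(model, seed, length, n=3):
    """Greedily extend the seed using the first observed continuation."""
    out = seed
    while len(out) < length:
        continuations = model.get(out[-n:])
        if not continuations:
            break
        out += continuations[0]
    return out

# One training document: every context has exactly one continuation,
# so generation just replays the "training data".
training_text = "We hold these truths to be self-evident."
model = train(training_text)
print(generate(model, training_text[:3], len(training_text)))
# → prints the training text verbatim
```

With a large, varied corpus the same model would blend many continuations; the memorization here comes purely from the degenerate training set, which is the intuition behind the copyright worry about overfit models.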


We're already there. Copilot already does this, as various GPL-violation probes have shown.

Whether or not it's illegal is still being debated. The FSF says it absolutely is, but I suspect it will ultimately be settled in court (though I'm just speculating).



