The problem with this argument is that it's founded on how the AI is used, not how it is made. It's not a compelling reason to ban the tool; it's a compelling reason to regulate its use.
Copilot can produce code verbatim, but it doesn't unless you specifically set up a situation to test it. It requires things like "include the exact text of a comment that exists in training data" or "prefix your C functions the same way as the training data does".
In everyday use, my experience has been that Copilot draws extensively from files I've opened in my codebase. If I give Copilot a function body to fill in within a class I've already written, it will use my internal APIs (which aren't even hosted on GitHub) correctly, as long as there are 1-2 examples in the file and I'm using a consistent naming convention. This isn't copypasta; it really does have a clear understanding of the semantics of my code.
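To make that concrete, here's a rough sketch of the kind of completion I mean. The class and the internal API below are made-up stand-ins, not my actual code; the point is that given one method that already uses the internal client, Copilot will typically fill the second stub by reusing that same client and naming convention rather than pasting anything from its training data.

    # Hypothetical internal API -- a stand-in for code that never left my machine.
    class BillingClient:
        def fetch_invoice(self, invoice_id: str) -> dict:
            return {"id": invoice_id, "total_cents": 1999}

    class InvoiceService:
        def __init__(self, client: BillingClient):
            self.client = client

        def get_invoice_total_cents(self, invoice_id: str) -> int:
            # Existing example in the file: uses the internal client directly.
            invoice = self.client.fetch_invoice(invoice_id)
            return invoice["total_cents"]

        def get_invoice_total_dollars(self, invoice_id: str) -> float:
            # The kind of body Copilot suggests for a stub like this: it follows
            # the local pattern above instead of reaching for training data.
            invoice = self.client.fetch_invoice(invoice_id)
            return invoice["total_cents"] / 100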
This is why I'm not in favor of penalizing Microsoft and GitHub for creating Copilot. I think there needs to be some regulation on how it is used to make sure that people aren't treating it as a repository of copypasta, but the AI itself is pretty clearly capable of producing non-infringing work, and indeed that seems to be the norm.
Please let's not start dictating how people should use a piece of software. It would be like "regulating" Microsoft Word just because people might use it to duplicate copyrighted works.
I'm not saying we should regulate the software, I'm saying we need some rigorous method of ensuring that using the AI tools doesn't put you in jeopardy of accidental copyright infringement.
We most likely don't need new laws, because infringement is infringement and how you made the infringing work is irrelevant. Accidental infringement is already illegal in the US.
I would argue that we _do_ need new laws. AI-generated code is quite different from any other literary work - after all, it was not created by a human.
My own personal opinion is that AI-generated code (or pictures, in the case of the article) should fall under a new category of literary works, such that it does not receive copyright protection but also does not violate existing copyright.
This is meaningless though. The majority of AI-generated art you see out there is either hand-tweaked or post-processed or both. There's human input involved, and drawing a line is going to absolutely backfire.
If you presented both the generated image and the "original" to a jury of peers (or even a panel of experts in the field), they would be able to make a determination as to whether the generated image violated the copyright of the presented "original".
Humans tweaking the image is immaterial to this determination - if the human tweaked it so that it no longer seems to violate copyright, then that same panel would make the same determination.
You are arguing that AI-generated means no copyright protection. So you can't tweak it to "not violate copyright" because there literally isn't any.
Of course, you have no way to prove whether any image was or was not generated by AI, so welcome to a new scam for law firms to aggressively sue artists, claiming they suspect AI was used in their works.
The vast majority of paintings weren't created by a human either, but by a paintbrush. We should really ban those too. Just think of all the poor finger-painters who've been put out of a job!
I think it's worth pointing out that Adobe has been doing this for a long time. You can't open or paste images into Photoshop which resemble any major currency.
> Copilot can produce code verbatim, but it doesn't unless you specifically set up a situation to test it.
It does not matter what a service can or cannot do. We do not regulate based on ability, but on action.
The service has an obligation to the license holders of the training data not to violate the license. The mechanism by which the license is violated is irrelevant. The only thing that matters is that the code ended up somewhere it shouldn't, and the service is the actor in the chain of responsibility that dropped the ball.
The prompting of the service is irrelevant. If I ask you to reproduce a block of GPL code in my codebase and you do it, you violated the license. It does not matter that I primed you or led you to that outcome. What matters is that the legally protected code is somewhere it shouldn't be.
> It does not matter what a service can or cannot do. We do not regulate based on ability, but on action.
Whether we agree with it or not, intellectual property laws have historically regulated ability as well as action. Hence the additional taxes on blank media in some jurisdictions, just in case someone chose to record copyrighted content onto them. And the MPEG royalty that used to be baked into the consumer price of graphics cards, regardless of whether the buyer planned to watch DVDs on their computer.
Not saying I agree with this principle. Just that there is already a long history of precedent in this area.
Like a lot of politics, ultimately it just comes down to who has the bigger lobbying budget.
> If I ask you to reproduce a block of GPL code in my codebase and you do it, you violated the license. It does not matter that I primed you or led you to that outcome. What matters is that the legally protected code is somewhere it shouldn't be.
This isn't accurate. If I reproduce GPL code in your codebase, that's perfectly acceptable as long as you obey the terms of the GPL when you go to distribute your code. In this hypothetical, my act of copying isn't restricted under the GPL license, it's your subsequent act of distribution that triggers the viral terms of the GPL.
The big question that is still untested in court is whether Copilot itself constitutes a derivative work of its training data. If Copilot is derivative then Microsoft is infringing already. If Copilot is transformative then it is the responsibility of downstream consumers to ensure that they comply with the license of any code that may get reproduced verbatim. This question has not been ruled on, and it's not clear which direction a court will go.
> The big question that is still untested in court is whether Copilot itself constitutes a derivative work of its training data.
Microsoft has a license to distribute the code used to train Copilot, and isn't distributing the Copilot model anyway, so it doesn't matter whether the model itself infringes copyright.
Whereas that same question probably does matter for Stable Diffusion.
As in " including improving the Service over time...parse it into a search index or otherwise analyze it on our servers" is the provision that grants them the ability to train CoPilot.
(also, in case you're wondering what happens if you upload someone else's code: "If you're posting anything you did not create yourself or do not own the rights to, you agree that you are responsible for any Content you post; that you will only submit Content that you have the right to post; and that you will fully comply with any third party licenses relating to Content you post.")
But you may not have the rights to grant that extra license if CoPilot is determined to violate the GPL. They can yell at you all they want, but they will have to remove it, as nobody can break someone else's license for you.
It'll have to be tested in court, but likely nobody actually gives a shit.
> But you may not have the rights to grant that extra license if CoPilot is determined to violate the GPL
Which is why that second provision is there to shift liability to you. You MUST have the ability to grant GitHub that license to any code you upload. If you don't, and MS is sued for infringing upon the GPL, presumably Microsoft can name you as the fraudster who claimed to be able to grant them a license to code that ended up in Copilot.
How is that different from a consultant who indiscriminately copies from Stack Overflow?
Tangent to that is the "who gets sued and needs to fix it when a code audit is done?"
Ultimately, the question is then "who is responsible for verifying that the code submitted to production isn't copying from sources that have incompatible licensing?"
The consultants would have to knowingly copy from somewhere. One can hope they're educated on licensing, at least if they expect to get paid.
If Microsoft is so confident in Copilot doing sufficient remixing, then why not train it on their own internal code? And why put the burden of IP vetting on clients, who have less information than Copilot does?
> How is that different from a consultant who indiscriminately copies from Stack Overflow?
And how is that different from a student who learns to code off Stack Overflow (or anywhere else, for that matter) and then reproduces some snippets or learnt code structure in their employment?
Or a random employee who copies some artwork that is then published ( https://arstechnica.com/tech-policy/2018/07/post-office-owes... ). You will note all the people who didn't get in trouble there: neither the photographer who created the image, nor Getty for making it available, nor the random employee who used it without checking its provenance.
In all of these cases, it is (or would be) the organization that published the copyrighted work that is on the hook, for not doing the appropriate diligence: checking what the work is, whether it can be used, and how it should be licensed.
> The Post Office says it has new procedures in place to make sure that it doesn't make a mistake like this again.
... which is what companies who make use of AI models for generating content (be it art or code) should be doing to ensure that they're not accidentally infringing on existing copyrighted works.
Copilot is regurgitating snippets of code that are still under copyright and not in the public domain. Some may consider publicly available code fair use, but the fact that they're selling access for commercial use may undercut that argument.
There is a part of Deep Learning research (Differential Privacy) which focuses on making sure an algorithm cannot leak information about its training set. This is a rigorous concept: you can quantify how privacy-preserving a model is, and there are methods for making a model "private" (at the cost of performance, I think, for now).
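For the curious, here is a minimal sketch of the core mechanism behind those methods (per-example gradient clipping plus calibrated Gaussian noise, as in DP-SGD), written against toy NumPy data; the clip norm and noise multiplier are illustrative placeholders, not tuned or formally accounted values.

    import numpy as np

    # Toy linear-regression data; everything here is illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 10))
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=256)
    w = np.zeros(10)

    clip_norm = 1.0         # max L2 norm allowed for any single example's gradient
    noise_multiplier = 1.1  # Gaussian noise scale relative to clip_norm
    lr = 0.1

    for step in range(200):
        # Per-example gradients of the squared-error loss.
        residuals = X @ w - y                            # shape (256,)
        per_example_grads = 2 * residuals[:, None] * X   # shape (256, 10)

        # Clip each example's gradient so no single sample can dominate the update.
        norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
        clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

        # Sum, add calibrated Gaussian noise, then average and step.
        noisy_sum = clipped.sum(axis=0) + rng.normal(scale=noise_multiplier * clip_norm, size=w.shape)
        w -= lr * noisy_sum / len(X)

The noise is what buys the quantifiable privacy guarantee, and it is also exactly where the performance cost comes from.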
Differential Privacy only bounds how much information can leak about individual samples of the training set. It guarantees that no input is leaked back exactly; any composition of the training samples is still a valid output, although in image generation this usually means a very distorted image.