Sure: if you build your own model, train it on copyrighted works, and then use it to create art; or if you use someone else's model which properly licenses its copyrighted sources, and use that to create art. In both cases your output is a new creative work, sufficiently different from its parents not to constitute infringement and to enjoy its own copyright protection.
However, the model creator/distributor will never be able to claim fair use on the model itself, which is chock-full of unlicensed material and can only exist if trained on such material. It's not really a subtle or particularly difficult legal distinction; in traditional terms, it's like an artistic collage (model output) vs. a database of copyrighted works (trained model).
The trained model is not a sufficiently different work that stands on its own; it is just a compressed algorithmic representation of the works used to train it. Legally speaking, it is those works.
In what way is the model chock full of unlicensed material? It was trained on unlicensed material, but I don't think you're ever going to be able to find a forensic auditor who can tease individual works out of the weights in a model.
You can't reasonably assert that a model encodes individual works of copyrighted material in any way meaningful for copyright. Not without a change to the law.
Obfuscation is not a valid defense against copyright infringement. If my database contains full encrypted copies of unlicensed works and I distribute keys to my customers for parts of those works, no forensic auditor will ever prove the full extent of my infringement without learning the full keyset. But I would argue that reproduction of even a single non-trivial fragment of an unlicensed copyrighted work would taint that entire database.
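To make the analogy concrete, here's a minimal sketch in Python (using the cryptography package's Fernet; the works and fragment names are made up):

```python
# Minimal sketch of the encrypted-database analogy above.
# The "works" and their names are hypothetical placeholders.
from cryptography.fernet import Fernet

works = {
    "work_a_fragment_1": b"unlicensed text fragment...",
    "work_a_fragment_2": b"another unlicensed fragment...",
}

# One key per fragment; the database itself holds only ciphertext.
keys = {name: Fernet.generate_key() for name in works}
database = {name: Fernet(keys[name]).encrypt(blob) for name, blob in works.items()}

# "Distributing a key" to a customer reveals exactly one fragment; an
# auditor who never sees the full keyset can't prove the full extent
# of what is stored, yet the unlicensed material is all in there.
customer_key = keys["work_a_fragment_1"]
print(Fernet(customer_key).decrypt(database["work_a_fragment_1"]))
```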
In the same way, in AI, a crafted prompt that produces striking similarities to a well-known work, like this example here, is sufficient proof that the model embeds unlicensed works; using a copyrighted work for training models is just another form of commercial exploitation for which the original author should be compensated.
But we're not talking about obfuscation; we're talking about the data you're describing not being there. If you ask the AI to spit out a 1-for-1 copy of Hollie Mengert's work, it can't. I suspect it can't spit out coherent individual pieces of it either (I might be wrong in that assertion, as I haven't run this Stable Diffusion setup myself). It spits out content in her style.
You generally cannot copyright a style.
If it spits out entire chunks of pre-existing works, that's an entirely different story; but what it seems to do is (via the learning and subsequent diffusion process) receive an input like "Wonder Woman on a hill" and (to falsely anthropomorphize a giant math puzzle) say "I know what a Wonder Woman looks like, and I know that 'correct pictures' have some certain ratios of straight lines and angles and tend to use some particular color triplets, so I'm biasing the thing that matches to my 'Wonder Woman' shape structure with those lines, angles, and colors." The result is a picture Hollie Mengert has never drawn, which an observer could assume is done by her because the style is so spot-on.
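For the curious, the prompt-to-image flow I'm describing looks roughly like this with the open-source diffusers library (an illustrative sketch against the public Stable Diffusion 1.5 checkpoint, not the article's exact setup):

```python
# Rough sketch of text-conditioned generation with diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The text encoder turns the prompt into a conditioning vector; the UNet
# then denoises random latents toward lines, angles, and colors that
# match that conditioning. No stored picture is being looked up.
image = pipe("Wonder Woman on a hill").images[0]
image.save("output.png")
```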
Aping an artist's style is not illegal for humans, and we have no law making it illegal for machines. Whether it should be illegal is an interesting question, but it would require new law to make it so.
I'm not claiming the problem is identical to pre-existing problems in the copyright space, just that it's sufficiently similar not to pose a significant challenge for legal scholars, IMHO. Existing copyright law not only forbids verbatim reproduction; it also requires that derivative works not prejudice the original author, and it grants authors the power to authorize or reject derivation:
https://en.wikipedia.org/wiki/Derivative_work
Your anthropomorphic analogy falls flat on its face because the algorithm does not "know" anything, not in any sense of the word "know" that applies to sentient and rational creatures. The algorithm embeds an association between the text "Wonder Woman" and actual artistic representations of Wonder Woman included in the prior art it is trained on. When prompted, it can reproduce one of those representations (see the Copilot fail where it spat out verbatim copyrighted code, comments included) or a remix of them, and integrate that into the output. That's plain as day a derivative work.
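The kind of association I mean is directly measurable. A sketch using CLIP, the model family Stable Diffusion's text encoder comes from (the image file here is hypothetical):

```python
# Sketch: score how strongly an image associates with a text label.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("some_artwork.png")  # hypothetical local file
inputs = processor(text=["Wonder Woman", "a landscape"],
                   images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image: the image's similarity to each text prompt.
print(outputs.logits_per_image.softmax(dim=1))
```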
The particular case you are referring to, style extraction, could be considered fair use, assuming you can technically separate the base visual model from the output style, and you can prove the training data for the style module is distilled into abstract, statistical quantities pertaining to that style, such as color palette, stroke weight, etc. That sounds like a tall order, and I would consider any AI model trained with copyrighted works as tainted until that burden of proof is satisfied.
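By "abstract, statistical quantities" I mean something like this toy sketch: a dominant color palette via k-means and a crude stroke-weight proxy from edge density (purely illustrative; the reference file is hypothetical, and this is nowhere near a real style model):

```python
# Toy "style statistics": dominant palette and a stroke-weight proxy.
import numpy as np
from PIL import Image, ImageFilter
from sklearn.cluster import KMeans

img = Image.open("reference_artwork.png").convert("RGB")  # hypothetical file
pixels = np.asarray(img).reshape(-1, 3)

# Dominant palette: cluster pixel colors into 5 representative colors.
palette = KMeans(n_clusters=5, n_init=10).fit(pixels).cluster_centers_

# Stroke-weight proxy: fraction of pixels that register as strong edges.
edges = np.asarray(img.convert("L").filter(ImageFilter.FIND_EDGES))
stroke_density = float((edges > 64).mean())

print(palette.astype(int), stroke_density)
```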
Isn't the fact that it can faithfully simulate, in the style of the author, works the author has never created proof enough that the style is disjoint from the trained content?
Hollie Mengert never rendered the streetscape in the article, but DreamBooth did it in her style.
If we're talking criminal copyright infringement, why is the burden of proof on the defendant to show statistical abstraction if the plaintiff can't prove the AI generates works she has made? (Again, if it is possible to get DreamBooth to kick out Hollie's original work, or substantial portions of it, I'd be inclined to agree with your way of thinking, but I haven't seen that yet).
> embeds an association between the text "Wonder Woman" and actual artistic representations of Wonder Woman included in the prior art it is trained on
Not if I understand how it works correctly, no; it does not. In fact, Mengert's rendering of Wonder Woman differs from the one DreamBooth kicked out, if you look up the work she's done for "Winner Takes All! (DC Super Hero Girls)". This is because DreamBooth's approach is to retrain Stable Diffusion with new information while preserving the old; since Stable Diffusion already had an encoding of what Wonder Woman looked like from a mélange of sources, its resulting rendering is neither Mengert's nor any other single source's, but a synthesis of them all.
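As I understand the DreamBooth paper, the "preserve the old" part is a prior-preservation term in the training loss. A toy sketch with a stand-in network (not the real Stable Diffusion UNet; the tensors are dummies):

```python
# Toy sketch of DreamBooth-style prior preservation.
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 16)  # stand-in for the diffusion UNet

# Loss on the new subject/style data (the artist's images, as dummy tensors)...
new_pred, new_target = model(torch.randn(4, 16)), torch.randn(4, 16)
# ...balanced against a loss on the base model's own prior generations
# for the generic class, so old knowledge isn't overwritten.
prior_pred, prior_target = model(torch.randn(4, 16)), torch.randn(4, 16)

prior_weight = 1.0  # weight of the preservation term
loss = F.mse_loss(new_pred, new_target) + prior_weight * F.mse_loss(prior_pred, prior_target)
loss.backward()
```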