AI Data Laundering (waxy.org)
304 points by marceloabsousa on Oct 17, 2022 | 113 comments


The Authors Guild v Google decision about Google Books seems relevant:

> In late 2013, after the class action status was challenged, the District Court granted summary judgement in favor of Google, dismissing the lawsuit and affirming the Google Books project met all legal requirements for fair use. The Second Circuit Court of Appeal upheld the District Court's summary judgement in October 2015, ruling Google's "project provides a public service without violating intellectual property law." The U.S. Supreme Court subsequently denied a petition to hear the case.

[...]

> The court's summary of its opinion is:

[...]

> Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals. Google’s commercial nature and profit motivation do not justify denial of fair use.

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

This doesn't touch on the ethics of course – at minimum I think allowing people to exclude themselves or their work from a dataset is necessary.


I would argue (as the court did) that Google's use is transformative because the end result, "book search," is in a different marketplace from "books." The end result / output of these generative AI systems trained on stock media and art is..."stock media and art."

That's kind of what this whole article is about. Just training the systems for research is arguably fair use, but creating the entire pipeline might not be, and the "loophole" here is claiming no responsibility for the training at the center of it because that was technically done by a third party (...funded by the final creator of the entire pipeline).


The court’s summary also mentions this aspect of differing marketplaces:

“… the revelations [i.e. the information served by Google Book Search] do not provide a significant market substitute for the protected aspects of the originals.”

This doesn’t apply to AI image generators which are clearly a “market substitute” for the protected originals used to train the system. For this reason I’d expect someone like Getty to want to revisit Authors Guild v Google sooner rather than later.


Can we first get an AI that's actually useful as a Getty substitute? All I'm seeing posted is visually pleasing nonsense - as soon as I tried to use it as a stock photo generator it was unusable (e.g. key physical properties of the object would be off to the point where the object is useless, and in many cases it would look wrong even from a thumbnail).

The only thing I did see was designers cropping out the wrong part and filling in the blanks - I suppose it's competing with stock photos in that aspect.


Did you try inpainting to fix the wrong bits? From my very little experience, AI image generation is not (yet) a one-click process and requires multiple iterations to get close to the desired result, i.e. it is still more a tool for a designer than a replacement for one.


AI image generators are clearly not themselves a market substitute for images; they are a tool that can be used to create market substitutes.


Depends on what the product is. For example, with OpenAI's DALL-E 2 the product is very clearly the generated image; you even pay per image. This is also kind of what the article is about: arbitrarily separating the pieces in order to evade copyright.


Note the "protected aspects of the originals" part. AI generated images don't produce outputs that contain protected aspects.


That’s for a court to decide, ultimately. Something doesn’t have to be a bit-for-bit copy to be a protected aspect.


An important part of the opinion (on the wiki page you linked to) is completely missing in the case of AI datasets:

> It generates new audiences and creates new sources of income for authors and publishers.

This is definitely not the case for artists and photographers, who don't benefit at all from the transformative nature of the AI output, and in fact are significantly harmed since it dilutes the uniqueness of their work by allowing anyone to imitate their style. Though to my knowledge "style" isn't protected by copyright - only trademark - I can't imagine there won't be lawsuits about this in the future.

That one artist who complained that people can't find his original work online now because of so many imitated pics is definitely exhibit A in terms of direct harm.


> the revelations do not provide a significant market substitute for the protected aspects of the originals

It does seem like generative AI systems provide a significant market substitute, so this ruling probably wouldn’t apply in court.

edit: see https://news.ycombinator.com/item?id=33194623 for some initial thoughts on how this problem (and others) could be rectified.

For example, with a database of protected works and self-censorship algorithms for generative AI systems, conscientiously objecting creatives could have a mechanism for excluding their works.
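
For instance, here is a minimal sketch of what such a self-censorship mechanism might look like, assuming an opt-out registry of perceptual hashes (the imagehash library, the registry format, and the threshold are all illustrative assumptions, not anything proposed in the linked comment):

    # Hypothetical output filter: suppress generated images that are
    # perceptually close to works whose creators opted out.
    from PIL import Image
    import imagehash

    # Assumed registry: perceptual hashes of opted-out works.
    OPT_OUT_REGISTRY = {imagehash.hex_to_hash("ffd8e0c0b0a09080")}

    HAMMING_THRESHOLD = 8  # max bit distance treated as "too similar"

    def is_publishable(generated: Image.Image) -> bool:
        """Return False if the output resembles a registered work."""
        h = imagehash.phash(generated)
        return all(h - reg > HAMMING_THRESHOLD for reg in OPT_OUT_REGISTRY)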


A substitute for what though? Copyright law is only concerned with substituting the work under copyright. That is to say, the consideration is whether the infringing aspects of the secondary work would alter the demand and market for the work being infringed.

In all the talk about AI data laundering there really hasn't been any indication that the AI generated item substitutes for the item it's alleged to infringe on. Substituting for a whole profession and its practitioners doesn't enter into the concerns of copyright law. There might be some argument that it should (to "promote the progress of science and useful arts" as it were), but copyright law to my knowledge hasn't been used to prevent new tech from putting professionals as a whole out of business.


Stock photography seems to be the obvious instance - why bother paying for the labor to make a stock photo, when you can have a generative AI system create the photo for you?

And furthermore, has anyone demonstrated that it is or is not possible to fully, or substantially, recreate any given existing work using the right input prompts?

I’m interested to know more of the legal details, but my understanding of copyright law is such that it preserves the value of intellectual labor.

edit: on a certain level, the cat is already out of the bag, but that doesn’t mean that we should ignore the law, without some indication from lawmakers or government that they intend to adjust said laws


This is precisely my point though, "stock photography" isn't an individual work. Copyright law doesn't apply because you can't infringe on the copyright of "stock photography" as a whole any more than you can infringe on the copyright of "rap" or "rock and roll" or "oil paintings".

Further, just because a new tech can substitute for a class of old tech doesn't (often, barring protectionist laws) mean the old tech gets to impose legal restrictions on the new tech. To trot out an obligatory car analogy, the rise of the automobile was not legally hampered by the fact that it substituted for the products of buggy and whip makers. More relevantly, the rise of photography and cameras was not legally hampered by the fact that it substituted for many painters' products. The rise of stock photography itself wasn't legally hampered by the fact that it substituted for the work of corporate artists. The rise of point-and-shoot cameras wasn't legally hampered because it substituted for the work of professional photographers.

Lastly if the argument is about that the tech makes it "possible to fully or substantially recreate any given existing work" using deliberate and specific inputs, well we've had plenty of legal precedent on that too. The same arguments were made about Xerox machines, about cassette tapes, about VCRs, about CD-Rs. The copyright holders pretty much lost in every case. At the point you are taking specific and deliberate actions to knowingly infringe on copyright is the point where the technology used is no longer relevant. The right inputs can be used to infringe on the copyright of Star Wars at any typewriter or computer keyboard in the world. The right inputs can be used to infringe on the copyrights of The Beatles at virtually any instrument. It is the act of infringing, not the technology, which is relevant here.

I believe in some countries, the copyright holders won a concession in the form of a tax levied against each CD-R and cassette tape sold, to be distributed to the recording industry. One wonders how the authors of those countries felt about not getting a cut of every Xerox machine sold.


> Lastly if the argument is about that the tech makes it "possible to fully or substantially recreate any given existing work" using deliberate and specific inputs, well we've had plenty of legal precedent on that too. The same arguments were made about Xerox machines, about cassette tapes, about VCRs, about CD-Rs. The copyright holders pretty much lost in every case. At the point you are taking specific and deliberate actions to knowingly infringe on copyright is the point where the technology used is no longer relevant. The right inputs can be used to infringe on the copyright of Star Wars at any typewriter or computer keyboard in the world. The right inputs can be used to infringe on the copyrights of The Beatles at virtually any instrument. It is the act of infringing, not the technology, which is relevant here

There is a significant difference here though: a Xerox machine or a VCR does not itself contain a representation of the art it is copying; a DL network does. I am pretty certain the cases around Xerox machines, VCRs, etc. would have had a pretty different outcome if you could type a prompt into your machine, "print a story about some kids in a wizard college fighting against the comeback of an evil sorcerer," and it would have put out something closely resembling Harry Potter.


> This is precisely my point though, "stock photography" isn't an individual work … It is the act of infringing, not the technology, which is relevant here.

I completely agree.

See https://news.ycombinator.com/item?id=33241173 for my comments on that topic. (edit: self-censorship, for example, can be extended to generative AI systems)

> if the argument is about that the tech makes it "possible to fully or substantially recreate any given existing work" using deliberate and specific inputs, well we've had plenty of legal precedent on that too.

In this case, however, we are talking about a computer program, and as such there are ways to legislate copyright protection (or other protections) without throwing the baby out with the bathwater.

See https://news.ycombinator.com/item?id=33194623 for some initial thoughts on that topic


So the value proposition is, it’s the exact same thing, but you won’t be paying for it because it’s not THE exact same thing?


As I understand it, that would be skirting the law and philosophical principles behind protectionism for intellectual labor.

If society doesn’t value commodity intellectual labor, then society may need to address the commoditization of intellectual labor, directly, through things like UBI / vocational rehabilitation, etc.

Similar arguments can be made about robots and the commoditization of manual labor.


> Google’s unauthorized digitizing of copyright-protected works, creation of a search functionality, and display of snippets from those works are non-infringing fair uses. The purpose of the copying is highly transformative, the public display of text is limited, and the revelations do not provide a significant market substitute for the protected aspects of the originals.

So is digitizing a copyrighted VHS and hosting it via torrents also fair use? It's transformative, the public display of the video is limited, and there is no market for VHS.

I don't get it. What's the difference, other than Google having deeper pockets than me?


> I think allowing people to exclude themselves or their work from a dataset is necessary.

or they could open it all up for everybody and stop protecting the rights of dead people (authors who died less than 70 years ago)

then again, that will make the publishers starve... but why pretend publishing corporations need food?


My personal ideal outcome is that there's no opting out of having your intellectual output included in the training, but the resulting model is as a result available freely to the public.

In my utopia, the end results are models containing the sum total of human output, available to everyone.

What I think is unconscionable is training the models on public works and then retaining them exclusively for private use.


Why pretend that other corporations that vacuum up content and repackage it have rights to resell art that you want to strip from the original publishers? At least the publishers actually made a contract with the artists.


For the first time there is a chance for Mickey Mouse to be free, I mean "In-the-style-of-Mickey-Mouse", his new name. When did we ever get such a chance for information freedom?


The Renaissance comes to mind: https://en.m.wikipedia.org/wiki/Renaissance


This is larger than publishers: this is every artist, film-maker, photographer, every writer, every engineer; anybody who has ever written or created something and shared it publicly is liable to have their work assimilated and an infinite amount of derivatives produced with no control over how they're used and by whom.

Comment generated with gpt-neox prompt: Comment about AI and data collection and generation and its pitfalls, expressing concern, emphasis on professions, emphasis on automation, written by Stephen King, creative writing, award winning, trending on reddit, trending on hacker news, written by Greg Rutkowski, written by Zola, written by Voltaire, written by author, written by moyix.

(Just kidding, it wasn't AI generated but you see my point.)


> anybody who has ever written or created something and shared it publicly is liable to have their work assimilated and an infinite amount of derivatives produced with no control over how they're used and by whom.

This has been the case ever since people started putting their art on the Internet publicly. The only difference is that now it's algorithms creating the derivatives, not people.


Yeah before the internet it never happened and nobody knew just how damn cliched Bill Shakespeare's plays are. Every line of Hamlet's soliloquy! It's insane!


Would we have Shakespeare's plays if he didn't make money? Which encourages better plays:

I write a play, and I can license theater companies to perform it. Therefore better writers are attracted to the industry (instead of to, say, ad copywriting), and because of a higher-level product, the industry thrives.

I can write a passion play that the local theater will perform. I will not generate enough income to live from my product. I will not generate income from licensing my production because there is no copyright and my scripts would just get stolen/distributed freely. The industry has fewer quality productions. The majority of productions have no reputation of quality.


This is not remotely the same; scale and barrier to entry matter. With Stable Diffusion I can pick any artist right now and create over 1000 derivative works in their style by tomorrow morning, to the same degree of expertise, with no training involved and no work required.


That's good!

Acting like it's a bad thing is just ludditery.


The Luddites weren't some cult of ignorant technophobes; they were highly skilled middle-class craftsmen and small business owners who went from being able to provide for their families to dying in utter destitution. The remainder of them were tried for machine breaking and were either executed by the state or exiled to penal colonies. They risked everything because everything was at stake. I have a hard time saying that their situation and outcomes were "good", and I have a hard time saying the same about similar situations that are playing out today.


Certainly, but automation is what allows for improvement to the whole.

Today, clothing is cheap and plentiful, along with bedding, curtains, towels and other cloth materials. Clothing would be outrageously expensive if everything were still hand spun, hand loomed, hand cut, hand sewn, and hand screened.

If the human computers[1] that predated the rise of the machine computer had done the same and won, it would have certainly been a boon for them at the time as well, but to the loss of all information technology developed hence.

The lack of a social safety net lent desperation to the luddites. Had they not faced imminent ruin and starvation as the machines eclipsed their occupation, they may not have had need to rebel against the newly emerging textiles mechanization.

AI may eventually replace traditional artists in many situations. Surely with simple images today, we can expect video examples in the future, and interactive AI generated simulations some time after that.

Do we smash the data centers now to save the artists' livelihoods and thereby avoid a future where anyone that wants can talk a computer through creating entire interactive fictional worlds via the synthesis of AI generation with feedback from their imagination?

[1] referring here to https://en.wikipedia.org/wiki/Computer_(occupation)


>Do we smash the data centers now

Yes, the sooner, the better.


I wouldn't be so confident one way or another; this is too new. I think it's going to make a lot of things way more accessible and enable people to express their creative voice who couldn't before. On the other hand, you're looking at the destruction of a lot of professions, possibly overnight with the speed things are moving at. I think if we told every software engineer their skills were entirely obsolete and they had to change careers tomorrow, the reception would be much colder.

I remember when I started working on generative models in 2015, you could barely generate a blurry 40x40-pixel face. Two years later, 1024x1024 images were almost indistinguishable from reality. Now every week we have a new revolutionary application coming out.


I think that the argument, overall, is that there are questions as to the legality of certain applications of the technology.

Society needs to change the laws regarding the preservation of value of intellectual labor, as has long been suggested.

Acting like the law doesn’t matter is a bad thing, if we are making value judgements.


So if a particular convolution appears often, how would you assign value to that? Any ideas?


Indeed, see https://news.ycombinator.com/item?id=33194623 for some initial thoughts on how this problem (and others) could be rectified


So, an AI Karen?

>the generative AI system censors any output containing a category of error

I can see the ruleset now…

"This picture of a woman is revealing hair, so it must be censored because it is objectionable to some people and we must respect all people whose beliefs are guided in a sanctioned way."

"This picture shows unadulterated fun, which must be censored because…"

etc.

https://www.youtube.com/watch?v=IFe9wiDfb0E


Or more to the point, “this picture contains copyrighted material, which must be censored because…”

etc.

I tried to be as general as possible.

The training data for a self-censorship neural network could be as robust as any given society would like.

An algorithm based on self-censorship of generated output wouldn’t require censorship within the training data for the generative neural network.

I can imagine some other advantages to that approach.


But literally (and I use that word literally) none of the pictures contain copyrighted material.


I don't know how people can make these strong statements about anything in law.

Disney has won cases in court where some artist has drawn their own version of Mickey Mouse; similarly, try writing a story about some kids in a wizard school and you need to be extremely careful not to infringe (or at least get taken to court over) Harry Potter's copyright.

I'm pretty certain image production models have produced some images which would very likely be judged to violate copyright (a much less strong statement).


You are confusing copyright with trademark. Or, provide a link showing the images case was decided on copyright issues, and I’ll reconsider my position.


> The only difference is that now it's algorithms creating the derivatives, not people.

I see more of a difference than that.

I really don't care if a person makes a meme out of one of my Flickr photos I've shared publicly.

I'm much more grumpy at the idea of Facebook/Google/Microsoft using my Flickr photos and "giving away the AI automemes" as a way to further lock people into their walled gardens of surveillance capitalism.

(Not enough that I actually care enough to do anything about it. I have my Flickr account set to default to CC BY 2.0 for uploads, and I try reasonably hard to remember to lock that down to All Rights Reserved if I'm uploading pics of family or friends. But I don't lose sleep over any of it. I do sometimes come across this pic of mine, which took on a life of its own and is all over the internet, at least in coffee-related places, and wistfully wonder if I could have gotten more credit for it... https://flic.kr/p/sVHP9 )


this is larger than the arts. anybody who has ever participated creatively in our culture understands that it's absolute bullshit to pretend we need money in order to want to contribute artistically.

we need money because food is for sale, because most of us do not own where we live, hence we are forced (a priori) to come up with a whole lot of money every month or else we're out on the streets.


Learn to code! Oh wait...


Sure but unless you bring down capitalism people will still need to work to eat and most will want to use their hard-earned creative skills to make a living.

Not only that, but being able to dedicate 8 to 10 hours a day to your craft for 40 years brings it to a level that you can't reach with casual practice.


> Sure but unless you bring down capitalism people will still need to work to eat and most will want to use their hard-earned creative skills to make a living.

The concept of a UBI (universal basic income) isn’t inherently in conflict with capitalism. I believe that it is actually consistent with the idea of Universal Human Rights, as defined by the UN in the 1940s.

Perhaps that would be the culmination of anything good about capitalism.


The problem is that UBI is in conflict with arithmetic. Short of near-total redistribution, it's impossible to provide a decent level of UBI for everyone. Total redistribution doesn't work, because the economy needs markets as a way of price/demand discovery, and markets apparently lead to a power-law distribution, not a flat one.

IMHO, the realistic option is a thick enough safety net for those who are going through a rough spot, for the disabled, etc., via both taxes and charity. But the vast majority will have to work, in one way or another, until machines completely take over, like in the Culture books by Iain M. Banks.


I’m not an economist, but I would suggest that human ingenuity can find a way to make something along those lines work.

In the United States, for example, there are so many different welfare programs, might there not be a way to consolidate under a new set of rules?

Similarly, with regards to charity, has any economist modeled a hypothetical of transitioning charity into voluntary taxes for welfare?


The sum of all welfare isn't half of what UBI would cost.
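
For rough scale, a back-of-envelope check (all the round numbers here are my own assumptions, not the commenter's):

    # ~330M US residents at $1,000/month vs. roughly $1T/year of
    # existing means-tested welfare spending (both rough estimates).
    population = 330_000_000
    ubi_annual_cost = population * 1_000 * 12    # ~ $3.96 trillion
    current_welfare = 1_000_000_000_000          # ~ $1 trillion
    print(current_welfare / ubi_annual_cost)     # ~ 0.25, well under half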


Perhaps, as the parent commenter suggested, we need a Universal Safety Net, then, instead of a strict UBI.


Between unemployment insurance and minimum wage, we already have something like UBI, just mismanaged and with a lot of overhead.

Full-fledged UBI that provides a decent living would require highly progressive taxes with the top bracket being in the ballpark of 70%. We could dial that down quite a bit if we start taxing capital gains properly, but even without that, it's neither impossible nor unprecedented.


just make sure basic necessities (housing, education, medicine) stay out of the market economy.

the 'market economy' (capitalism) is good at some things, but terrible at others. we need to stop collectively using this social-technology (a kind of market super optimizer) in the wrong places.


Housing will always be a competitive market so long as location matters. Access to education and health care are themselves some of the reasons why location matters, and a home in an urban core is priced higher than one in a far remote community.

Even in communist countries you find competition for housing as a result of the intrinsic value of location.


capitalism cannot be brought down. this one must fall on its own.

I consider France one of the best examples of capitalism https://www.express.co.uk/news/world/1683661/paris-protests-...


Do we allow artists to withhold their works from the minds of eager, learning children? [1]

Tell me how ML is different than the mind of a toddler ravenous for new information.

For every billion dollar start-up using data at scale, there are tens of thousands more researchers and hobbyists doing the exact same, producing wonderful results and advances.

If we stop this growth dead in its tracks, other countries more willing to look past the IP laws will jump ahead. And if Stability locks away their secret sauce, some new party will come and give away the keys to the kingdom yet again.

You can't block the signal. Except, of course, by legislating against it in some Luddite hope we can prevent the future from happening.

Instead of worrying careers will end, we should look at this as being the end of specialization. No longer do we need to pay 20,000 hours to learn one thing to the exclusion of all others we would like to try. Now we'll be able to clearly articulate ourselves with art, music, poetry. We'll become powerful beings of thought and expression.

Humans aren't the end or the peak of evolution. We should be excited to watch this unfold.

[1] Maybe Disney would like you to pay more for a premium learning plan for your child, but thankfully that's not (yet) possible.


Most machine learning is assigning weights in a chain of matrix multiplications and normalization functions.
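
For concreteness, a minimal sketch of such a chain (shapes and functions chosen arbitrarily for illustration):

    # A two-layer chain: matrix multiplies plus a normalization function.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(784, 128))   # learned weights, layer 1
    W2 = rng.normal(size=(128, 10))    # learned weights, layer 2

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()             # normalize to a distribution

    x = rng.normal(size=784)           # an input vector
    h = np.maximum(x @ W1, 0.0)        # matrix multiply + ReLU
    y = softmax(h @ W2)                # matrix multiply + normalization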

There is no known experimentally verifiable model of toddlers' brains, let alone one based on matrix multiplication and normalization. Developing such a model would be a noteworthy achievement.

Therefore these are different.


Some artificial neural networks have been shown to significantly model brain function (at least up to 50% concordance).

Not to mention the laborious work of neuroscientists to build out the connectome of the human brain.


Two systems producing the same output for some set of inputs doesn't show the systems are the same. My phone can produce the same results as my brain for short arithmetic problems. My phone is not a brain.

The neuroscientists I know in the field would be among the first to tell you that our ability to model the brain is nearly non existent. In fact we don't even have a great model of a single neuron [1]. This statement doesn't invalidate the work folks are doing to try and reach that goal. Biology is hard.

[1] https://en.m.wikipedia.org/wiki/Biological_neuron_model


As a working neuroscientist, I’ll co-sign this!

Understanding 50% of the brain, whatever that would even mean, is an utter fantasy.


I should have clarified that I was talking about the specific brain function of semantic comprehension.

I am not suggesting that we are anywhere near having a complete analytical model of 50% of the brain.

I am suggesting that we do have tools to continue answering questions about functional aspects of the brain.

Or am I missing something that indicates the non-utility of “function analysis” of biology-based artificial neural networks?



These articles use far more cautious language than you suggest, and if they don't, everyone working in the field is hopefully aware that such claims are the academic equivalent of clickbait at best.


>No longer do we need to pay 20,000 hours to learn one thing to the exclusion of all others we would like to try. Now we'll be able to clearly articulate ourselves with art, music, poetry. We'll become powerful beings of thought and expression.

I'm a 20,000-hours person. Knowing what I know about what I do, it's real sad to see someone misunderstand what goes into creativity this egregiously. Prompt engineering is such an unbelievably watered-down "version" of making a painting. It's like writing a page, or even a folder! of bullet points and handing it to a ghostwriter, then telling them "put the end result between Shakespeare and Poe".

That's not unleashing your creative voice. Unleashing your voice and acquiring technical skills in a chosen field are the same. If you endlessly mixed all the prior classical works, it doesn't matter how you weight them, it won't spit out Mozart. You're stuck in the gamut of the model, between the maxima points of each artist.

It's an incredible tool to generate stuff quickly, and to some extent it will help artists whose work depends on quantity over quality.


You can prompt with images, which lets you control colour and composition, and with masking you can iteratively work on sections to guide the image to what you are picturing. That can shift the creative part more towards the user.
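
A minimal sketch of that workflow with the diffusers library (model IDs, file names, prompt, and strength are illustrative assumptions):

    # Image-prompting, then masked inpainting on one section.
    import torch
    from PIL import Image
    from diffusers import (StableDiffusionImg2ImgPipeline,
                           StableDiffusionInpaintPipeline)

    prompt = "a lighthouse at dusk, oil painting"

    # 1. Start from a rough image to fix colour and composition.
    img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5").to("cuda")
    init = Image.open("rough.png").convert("RGB").resize((512, 512))
    draft = img2img(prompt=prompt, image=init, strength=0.6).images[0]

    # 2. Mask one region (white = repaint) and rework just that part.
    inpaint = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting").to("cuda")
    mask = Image.open("mask.png")
    final = inpaint(prompt=prompt, image=draft, mask_image=mask).images[0]
    final.save("final.png")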


Yes, I've seen the photoshop plugin. You're comparing playing with duplo blocks to marble sculpture.


> Tell me how ML is different than the mind of a toddler ravenous for new information.

If a person published a work that clearly plagiarized or violated a patent, that person would be open to legal action.

I’m all for systemic change, but uses like this may end up having a chilling effect on human-created work.


> I’m all for systemic change, but uses like this may end up having a chilling effect on human-created work.

Every time this comes up, whichever party fears for its livelihood always says something like this and ignores the other side: that rigorous enforcement activity is going to do the same thing to human-created work. Richard Stallman wrote a short story about this very issue.[1]

There are already people hurling abuse around on Twitter at artists because they think that something they made was produced with Stable Diffusion or something else.

[1] https://www.gnu.org/philosophy/right-to-read.en.html


> Every time this comes up, whichever party fears for its livelihood always says something like this and ignores the other side: that rigorous enforcement activity is going to do the same thing to human-created work.

I may be providing a counter-example to your argument.

At this time, I’m not advocating for anything other than self-censorship by generative AI systems (see https://news.ycombinator.com/item?id=33194623 for some initial thoughts) and, as aggregated from some of my other comments in this thread, the following:

I think that it will be important to ensure that we have symmetric information going forward; otherwise, trying to put the genie back in the bottle may just end up further disadvantaging those that try to follow the rules.

-

Society needs to change the laws regarding the preservation of value of intellectual labor, as has long been suggested.

Acting like the law doesn’t matter is a bad thing, if we are making value judgements.

-

If society doesn’t value commodity intellectual labor, then society may need to address the commoditization of intellectual labor, directly, through things like UBI / vocational rehabilitation, etc.

Similar arguments can be made about robots and the commoditization of manual labor.


Funny that you cite Stallman, when Copilot using GPLed code in closed-source projects is a real concern.


The criticism is that AI works are not transformative, but are recognizable “regurgitation” of the training set.

It’s not that AIs are too good. They look like crude knockoff products to trained eyes. And crude knockoffs are usually considered bad things.


"Good artists borrow, great artists steal."

A lot of artists get started with tracing before taking off the training wheels. You also see new art styles quickly proliferate across the entire community, so clearly there's some unspoken copying happening.

These models are producing new works in nearly identical styles. That's something a trained human could conceivably do.


> A lot of artists get started with tracing before taking off the training wheels.

Sure, but only privately. Publishing something you traced is a massive no-no, and selling it even more so.


Picasso was a hack, and it's reflected in that quote of his.


Yeah, but when a couple of lines match up with existing art, you go up in flames and change careers. NAI is doing the first half of that.


>Tell me how ML is different than the mind of a toddler ravenous for new information.

The toddler is human. AIs are not humans.

It's a human right to learn. Non-humans don't (and shouldn't) have human rights.

>Humans aren't the end or the peak of evolution. We should be excited to watch this unfold.

Spoken like a true evolutionary loser.


Well, a toddler isn’t making money off the information they are absorbing, for one. If these were open-to-the-public models, that would be one thing. But no, these are proprietary models whose sole purpose is to make money for large corporations.


Artists and engineers do exactly this. It just takes a decade.


They are taking your code verbatim and injecting it into numerous code bases around the world, violating the license, while getting paid for it?


> Tell me how ML is different than the mind of a toddler ravenous for new information.

Well, I can't keep a toddler in a data center, pumping out work on demand. Or copyright it and limit who it chooses to work for when it grows up.

For instance.


This reminds me of the Jedi mind trick of Uber waving a smartphone to argue that labor and other laws all of a sudden don't apply to them, to the detriment of the public that'll now shoulder the costs.


Big Tech has really big datasets, especially Google. With YouTube, Photos, Music, Gmail, Docs, Maps, Books, Waymo, Search … they have giant multimodal datasets that capture the essence of all human knowledge. They have 10+ products with more than a billion users each, creating data for them.

If Google Brain/DeepMind were to crack AGI, it would make Google/Alphabet crazy rich to the detriment of millions of YouTubers, book authors, musicians, and drivers.

AI will concentrate power and wealth to fewer individuals.


Ads companies getting rich off of AGI seems a bit sensational when they’re already getting rich off of the boring type of AI. They already got rich years ago indexing the web and all the data we have.


I've got a couple examples of Stable Diffusion replicating watermarks along with similar swatches of imagery into scenes from the same prompt [1]. A single case of this should be enough to file a massive lawsuit if the art were recognizable to the creator.

[1] https://news.ycombinator.com/item?id=33061707


The model learns all attributes of the images it's trained on, including that some have a watermark. The fact that it generates a watermark in some images doesn't mean that it is a 1:1 image from the training set; it just means that, to the model, some images seem to have a watermark, so it will add one sometimes. Often you can just add "no watermark" (or add it as a negative prompt with some weight) and re-use the same seed to get the same image without the watermark.
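
A minimal sketch of that seed-plus-negative-prompt trick with the diffusers library (model ID, prompt, and seed are illustrative):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5").to("cuda")

    prompt, seed = "city skyline at night, photo", 1234

    # Fixing the seed keeps the underlying image essentially the same.
    first = pipe(prompt,
                 generator=torch.Generator("cuda").manual_seed(seed)).images[0]

    # Same seed, but steer the model away from the watermark attribute.
    clean = pipe(prompt, negative_prompt="watermark, text, logo",
                 generator=torch.Generator("cuda").manual_seed(seed)).images[0]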


It may or may not be a 1:1 image, but I think it's significant that in both cases, with different seeds, what is directly behind / right of the watermark is a pretty similar building with different distortions applied to it. I'm not sure what the difference is between "learning" from a particular image and encoding that image with a lot of compression, when in either case the usage more or less reliably reconstructs the image algorithmically.

If I have a photographic memory and I memorize the Coca Cola logo and then draw it into a commercial work by decoding the firing of my neurons into muscle movements, the storage and retrieval method I used has no bearing on whether I infringed on their copyright.


No, it means that it is reproducing the original work and is not producing a new original work. It is basically a really fancy Instagram lens, but it is still 100% derived from the underlying works and therefore derivative, instead of a newly created non-derivative work.


I'm not sure how you can make this argument just based on the model synthesizing a watermark that it has learned about in the original dataset. Don't forget, the model is only 4GB in size, and while it's not out of the question that it could regurgitate an image from its data set, considering the size of the training set, which is a few orders of magnitude larger, it is highly unlikely.
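
The rough arithmetic behind that point, with approximate figures (the ~2.3B-image LAION subset commonly cited for Stable Diffusion training is my assumption here):

    model_bytes = 4e9          # ~4 GB of weights
    training_images = 2.3e9    # ~2.3 billion training images (approx.)
    print(model_bytes / training_images)   # ~1.7 bytes per training image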


> It’s currently unclear if training deep learning models on copyrighted material is a form of infringement

What? It's clearly a derived work.


I'm pretty sure I can count the number of words in Harry Potter without breaking copyright law.

It is absolutely not clear when a statistical model stops counting n-grams and starts making derived works.
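
The uncontroversial end of that spectrum is pure counting, e.g. (the file name is a placeholder):

    # Word and 2-gram counts over a text: statistics, not a derived work.
    from collections import Counter

    with open("book.txt") as f:
        words = f.read().lower().split()

    print(len(words))                           # total word count
    bigrams = Counter(zip(words, words[1:]))    # 2-gram statistics
    print(bigrams.most_common(5))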


You can also read the HP series and write summaries and reviews about each book as wodenokoto. You can probably create HP looking artwork and write stories that could fit into the HP universe. You can't call any of your work HP. This sounds obvious, but if it's done by a machine then some people think it's a different question.

I can write code to get a list of characters in the book, get their page numbers analyzed, and draw graphs to help me create my own version. Am I breaking copyright laws? Most likely not.
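
A sketch of that kind of analysis (character names, chapter delimiter, and file name are placeholders):

    # Plot how often each character is mentioned, chapter by chapter.
    import re
    import matplotlib.pyplot as plt

    characters = ["Harry", "Hermione", "Ron"]
    chapters = re.split(r"\nCHAPTER ", open("book.txt").read())

    for name in characters:
        plt.plot([ch.count(name) for ch in chapters], label=name)

    plt.xlabel("chapter"); plt.ylabel("mentions"); plt.legend(); plt.show()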

It's a truly grey area which lawmakers never saw coming.

I believe if events unfold well we'll see and treat AI tools to be like sharp knives eventually. It will be up to the user what they do with it.


>You can also read the HP series and write summaries and reviews about each book as wodenokoto. You can probably create HP looking artwork and write stories that could fit into the HP universe

IIRC there have been lawsuits about exactly that. A person wrote (and published) some fan fiction in the Harry Potter universe (without Harry Potter in it, IIRC); he lost the case, I believe. This is similar to the fact that you cannot make your own comic books with Mickey Mouse (unless your operation is small enough that it flies under the radar); the universe/characters are in fact copyrighted.


Probably they used too much reference; I wasn't implying that the universe itself is not protected. But writing something similar that would appeal to the fans should be okay.



Alright, then every piece of music you've ever heard is also a derived work. Unless the composer grew up in a void.


I think you're trying to be ridiculous, but lawsuits of that sort do come up periodically: accusations of composition theft that in practice are probably not theft, just the fact that there aren't THAT many unique note progressions you can put into a song, and it's not that odd for a composer to accidentally imitate something they heard a long time ago when picking one.


How does AI change the playing field here? Accidental infringement is a thing either way, and the creator should be careful to avoid accidental infringement regardless of whether they're using an AI to do the creation.


Was this term coined on HN? I remember first seeing it (used in an AI context) from this 2019 comment under "Cool stuff that's still completely unregulated": https://news.ycombinator.com/item?id=21167689

Most of the predictions in that first comment came true.


William Gibson mentions data laundering as an illicit activity in the Neuromancer books! It’s plausible that the phrase itself was coined there.


> But then Meta is using those academic non-commercial datasets to train a model, presumably for future commercial use in their products. Weird, right?

This is a very strong and likely inaccurate presumption.


Is it? Maybe they have their own internal version they are using, but who's to say they aren't fine-tuning the model and applying it somewhere?


yep. This class of fallacy has its own wiki article: https://en.wikipedia.org/wiki/Appeal_to_probability


The horse seems well out of the gate.


The horse is out of the gate on photocopiers, and before them printing presses, but that doesn't make it legal to use them without the rights to what you are copying.


I would go as far as to say a vast amount of our creative economic output would cease tomorrow if we had strict 'right to read' copyright enforcement. You can also thank Disney for extending copyright beyond all reason.


but we aren't talking about right to read. we are talking about right to read and then regurgitate and distribute/sell a derivative work of what you read with substantial similarities such that it harms the market for the original work.


The whole thing is a mess, but frankly I doubt this genie can be put back in the bottle.


I think it is an a priori fact that the cat is out of the bag.

The existing publicly available datasets, algorithms, and model weights certainly should be expected to be permanently in the hands of some non-law-abiding parties at this point.

I think that it will be important to ensure that we have symmetric information going forward; otherwise, trying to put the genie back in the bottle may just end up further disadvantaging those that try to follow the rules.


...said the music industry about samplers in the 1990s.


The Flickr example is wild. How was nobody sued for that!?


Are we heading towards voiding most of current copyrights or is there a way out of this mess with another patch to the laws?


It’s definitely fair use. One question I have, though: is Mickey Mouse protected by copyright or trademark, or something else? I assume someone other than Disney can’t sell Mickey’s likeness, or is that wrong in art? And what if the AI makes a movie?


Not sure laundering is the right term.

Laundering private things through the commons feels not as shady as laundering in private networks. The commons benefits too.

It's more like open source than money laundering.



