I have to wonder how much releasing these models will "poison the well" and fill the internet with AI-generated images that make training an improved model difficult. After all, if 9 out of every 10 "oil painted" images online start coming from these generative models, it'll become increasingly difficult to scrape the web and learn from real-world data in a variety of domains. Essentially, once these things are widely available, the internet will become harder to scrape for good data and models will start training on their own output. The internet will also probably get worse for humans, since search results will be completely polluted with these "sort of realistic" images, which can be spit out at breakneck speed by smashing words from a dictionary together...
Look at carpentry blogs and recipe blogs. Nearly all of it is junk content. I bet if you combined GPT with Imagen or DALL-E 2 you could replace all of them. Just provide a Betty Crocker recipe and let it generate a blog that has weekly updates and even a bunch of images - "happy family enjoying pancakes together"
I can see the future as being devoid of any humanity.
I wrote a comedic "Best Apache Chef recipe" article[1] mocking these sites.
I guess the concern would be: If one of these recipe websites _was_ generated by an AI, the ingredients _look_ correct to an AI but are otherwise wrong - then what do you do? Baking soda swapped with baking powder. Tablespoons instead of teaspoons. Add 2tbsp of flower to the caramel macchiato. Whoops! Meant sugar.
Then we will still need humans in the loop to do the cherry-picking/supervised learning. Gibberish recipes need to be flagged and interesting new creations need to be promoted. That feedback can be fed back into the model until the model contains accurate representations of the chemical reactions of cooking ingredients and the neuronal wiring of the human olfactory system.
Seeing this a lot on YouTube too: scripts pulling in "news" from some source to feed a robo voice, combined with "related" images stitched together randomly.
Even though it's not AI, this is already happening with a lot of content farms. There was a good video a couple of years ago from Ann Reardon of "How to Cook That" that basically pointed out how the visually-appealing-but-not-actually-feasible "hands and pans" content farms (So Tasty, 5-Minute Crafts, etc.) were killing genuine baking channels.
Imagine that instead of having cheap labor from Southeast Asia churn out these videos, they are just spit out as fast as possible using AI.
This is my theory as well. There'll be a short period where some of us at the forefront enrich ourselves by flooding the internet with imagery never seen before. That'll be a bubble where people think "abundance" has been solved, but then it'll pop as people start to not trust anything they see online anymore and, as you say, only trust and interact with things in the real world (wouldn't surprise me if regulation got involved here too somehow).
For the very skilled, yes. But a lot of low-skilled artists and content creators will have the rug pulled out from under them. (And how will we ever get highly skilled artists trained in the future if they can't make a living from their lower-tier output before they reach mastery?)
> I can see the future as being devoid of any humanity.
Considering how many of the readers of said blog will be scrapers and bots, who will use the results to generate more spammy "content", I think you are right.
I’d much rather skip the blog format and replace them with an AI that can answer “Please provide a pie recipe like my grandparent’s”, or “I’d like to make these ribs on the BBQ so that they come out flavourful, soft, and a little sweet.”
People training newer models just have to look for the "Imagen" tag or the DALL-E 2 rainbow in the corner and heuristically exclude images that have them. This is trivial.
Unless you assume there are bad actors who will crop out the tags. Not many people now have access to Dall-E2 or will have access to Imagen.
As someone working in vision, I am also thinking about whether to include such images deliberately. Image augmentation techniques are ubiquitous in the field; we already introduce many training examples that are not in the distribution of natural input images, and they improve generalization by huge margins. Whether generated images improve the generalization of future models is a thing to try.
Damn I just got an idea for a paper writing this comment.
Perhaps a watermark should be embedded in a subtle way across the whole image. What is the word? "Steganography" is designed to solve a different problem, and I don't think it survives recompression etc. Is there a way to create weakly secure watermarks that are invisible to the naked eye, spread across the whole image, and resistant to scaling and lossy compression (to a point)?
Invisible, robust watermarks had a lot of attention in research from the late 90s to the early 10s, and apparently some resurgence with the availability of cheap GPU power.
Naturally, there's a Python library [1] with some algorithms that are resistant to lossy compression, cropping, brightness changes, etc. Scaling seems to be a weakness, though.
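The core idea behind those robust schemes can be shown in a toy sketch. The function names and the flat-list pixel representation below are illustrative only, and the watermark is added in the pixel domain for simplicity; real libraries embed the same kind of key-seeded pattern in DCT/DWT coefficients so the correlation survives JPEG recompression:

```python
import random

def _pattern(key: int, n: int) -> list:
    # key-seeded pseudo-random noise, spread across the whole image
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def embed(pixels: list, key: int, strength: float = 2.0) -> list:
    # add a faint noise pattern; at low strength it is invisible to the eye
    pat = _pattern(key, len(pixels))
    return [p + strength * w for p, w in zip(pixels, pat)]

def detect(pixels: list, key: int) -> float:
    # correlate the (mean-subtracted) image with the same keyed pattern;
    # a score near `strength` means the watermark is present, near 0 means absent
    pat = _pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) * w for p, w in zip(pixels, pat)) / len(pixels)
```

Because the pattern is spread over every pixel, cropping or brightness shifts only weaken the correlation rather than destroy it, which matches the robustness profile described above.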
> Unless you assume there are bad actors who will crop out the tags.
I don't know why, but lots of randoms on the internet do that, and they're not even bad actors per se. Removing signatures from art posted online has become a kind of meme in itself, especially when comic strips are reposted on Reddit. So yeah, we'll see lots of them.
In my melody generation system I'm already including melodies that I've judged as "good" (https://www.youtube.com/playlist?list=PLoCzMRqh5SkFwkumE578Y...) in the updated training set. Since the number of catchy melodies that have been created by humans is much, much lower than the number of pretty images, it makes a significant difference. But I'd expect that including AI-generated images without human quality judgement scores in the training set won't be any better than other augmentation techniques.
Huh, I had never thought of that. Makes it seem like there's a small window of authenticity closing.
The irony is that if you had a great discriminator to separate the wheat from the chaff, it would probably make its way into the next model and would no longer be useful.
My only recommendation is that OpenAI et al. should be tagging all generated images as synthetic in metadata. That would be a really interesting tag for media file formats (native support in the format would be much better than metadata, though) and probably useful across a lot of domains.
The OpenAI access agreement actually says that you must add (or keep?) a watermark on any generated images, so you’re in good company with that line of thinking.
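For PNG at least, such a tag is easy to add today, since the format allows arbitrary text chunks. A rough sketch of stamping an image as synthetic before its closing IEND chunk follows; the keyword/value pair is illustrative (loosely modeled on IPTC's digital-source-type vocabulary), not any vendor's actual scheme:

```python
import struct
import zlib

def text_chunk(keyword: str, value: str) -> bytes:
    # build a PNG tEXt chunk: length, type, keyword\0value, CRC over type+data
    data = keyword.encode("latin-1") + b"\x00" + value.encode("latin-1")
    return (struct.pack(">I", len(data)) + b"tEXt" + data
            + struct.pack(">I", zlib.crc32(b"tEXt" + data)))

def tag_png(png: bytes, keyword: str, value: str) -> bytes:
    # insert the chunk just before the final IEND chunk
    iend = png.rfind(b"IEND") - 4  # back up over IEND's 4-byte length field
    return png[:iend] + text_chunk(keyword, value) + png[iend:]
```

The obvious weakness, as noted elsewhere in the thread, is that metadata like this is trivially stripped, which is why in-pixel watermarks are the more robust complement.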
The irony is that when the majority of content becomes computer-generated, most of that content will also be computer-consumed.
Neal Stephenson covered this briefly in "Fall; or, Dodge in Hell." So much 'net content was garbage, AI-generated, and/or spam that it could only be consumed via "editors" (either AI or AI+human, depending on your income level) that separated the interesting sliver of content from...everything else.
He was definitely onto something in that book, where people also resort to using blockchains to fingerprint their behavior and build an unbreakable chain of authenticity. Later in the book that chain is used to authorize hardware access for the deceased and uploaded individuals.
A bit far out there in terms of plot but the notion of authenticating based on a multitude of factors and fingerprints is not that strange. We've already started doing that. It's just that we currently still consume a lot of unsigned content from all sorts of unreliable/untrustworthy sources.
Fake news stops being a thing as soon as you stop doing that. Having people sign off on and vouch for content needs to start becoming a thing. I might see Joe Biden saying stuff in a video on Youtube. But how do I know if that's real or not?
With deepfakes already happening, that's no longer an academic question. The answer is that you can't know. Unless people sign the content. Like Joe Biden, any journalists involved, etc. You might still not know 100% that it's real, but you can know whether the relevant people signed off on it or not and then simply ignore any unsigned content from non-reputable sources. Reputations are something we can track using signatures, blockchains, and other solutions.
Interesting with Neal Stephenson that he presents a problem and a possible solution in that book.
As usual, Stephenson is at his best when he's taking current trends and extrapolating them to almost absurd extremes...until about a decade passes and you realize they weren't that extreme after all.
I loved that he extended the concept of identity as an individualized pattern of events and activities to the real world: the innovation of face masks with seemingly random but unique patterns to foil facial recognition systems but still create a unique identity.
Like you say, the story itself had horrible flaws (I'm still not sure if I liked it in its totality, and I'm a Stephenson fan since reading Snow Crash on release in '92), but still had fascinating and thought provoking content.
> blockchains to fingerprint their behavior and build an unbreakable chain of authenticity. Later in that book that is used to authorize the hardware access of the deceased and uploaded individuals.
maybe I misunderstood, but I had it that people used generative AI models that would transform the media they produced. The generated content can be uniquely identified, but the creator (or creators) retains anonymity. Later these generative AI models morphed into a form of identity since they could be accurately and uniquely identified.
All part of the mix. But definitely some blockchain thing underneath to tie it all together. Stephenson was writing about cryptocurrencies as early as the nineties; around that time he also coined the term Metaverse.
I can see a world where in-person consumption of creative media (art, music, movies, etc.), with all devices left at the door, becomes more and more sought after and lucrative.
If the AI models can't consume it, it can't be commoditised and, well, ruined.
I don't think it will "poison the well" so much as change it - images that humans like more will get a higher PageRank, so models trained on Google Images will not so much degrade as detach from reality and begin to follow the human mind the way plausible fiction does.
Just yesterday I was speculating that current AI is bad at math because math on the internet is spectacularly terrible.
I think you’re right, and it’s unlikely that we (society) will convince people to label their AI content as such so that scraping is still feasible.
It’s far more likely that companies will be formed to provide “pristine training sets of human-created content”, and quite likely they will be subscription based.
>“pristine training sets of human-created content”
Well, we do have organic/farmed/handcrafted/etc. food. One can imagine an information nutrition label: "contains 70% AI-generated content, triggers 25% of the daily dopamine release target".
How would that really happen? You seem to be assuming that there's no such thing as extant databases of actual oil paintings, and that people will stop producing, documenting, and curating said paintings. I think the internet and curated image databases are far better kept than your proposed model accounts for.
My hypothetical example is not really about oil paintings, but the fact these models will surely get deployed and used for stock photos for articles, on art pages etc.
I think this will introduce unavoidable background noise that will be super hard to fully eliminate in future large-scale datasets scraped from the web. There are always going to be more and more photorealistic pictures of "cats", "chairs", etc. in the data that are close to looking real but not quite, and we can never really go back to a world where there are only "real" pictures, or "authentic human art", on the internet.
On the contrary: there's a decent body of research showing that just by training foundation models on their own outputs, you amplify their capabilities.
Less common opinion: this is also how you end up with models that understand the concept of themselves, which has high economic value.
Even less common opinion: that's really dangerous.
For better training data in the future: storing a content hash and author identification for image authors (an example proprietary solution exists right now [0]), plus a decentralized reputation system for people/authors, could be the solution, and authors could gain reputation/incentives from it too.
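The content-hash half of that idea is just stdlib hashing. A minimal sketch follows; the record fields (`sha256`, `author`) and the function name are hypothetical, and a real system would additionally sign the record so the author claim itself is verifiable:

```python
import hashlib

def content_record(image_bytes: bytes, author_id: str) -> dict:
    # hypothetical provenance record: a stable content hash plus an
    # author identifier that a reputation system could attach scores to
    return {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "author": author_id,
    }
```

Because the hash is deterministic, anyone who later encounters the same bytes can recompute it and look the record up, which is what makes a shared reputation database feasible.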
Good-looking images will be popular; bad-looking images will be disposed of in the backyard of the internet. Even if the next iterations of these models are trained on AI-generated images, the dataset will be well filtered by how much people like those images. After all, that's the purpose of any art, right?
Maybe we'll go back to human-curated directory search engines like the early Yahoo.
Could resolve many issues we see today, but I think the biggest question is scalability. Maybe some open source open database system?
I think instead the images people want to put on the Internet will do for these models what self-play did for AlphaZero; they will learn what kinds of images engage human reaction.
I also worry about the potential to further stifle human creativity, e.g. why paint that oil painting of a panda riding a bicycle when I could generate one in seconds?
Our imaginations are gigantic. We'll find something else impressive and engaging to do. Or not care. I'm not worried. Watch children: they find a way to play even when there is nothing.
The painting technique isn't all that great yet for any of these artbots working in a physical medium, but that's largely a general lack of dexterity in manual tool use rather than an art specific challenge. I suspect that RL environments that physically model the application of paint with a brush would help advance the SOTA. It might be cheaper to model other mediums like pencil, charcoal, or even airbrushing first, before tackling more complex and dimensional mediums like oil paint or watercolor.