I have to wonder how much releasing these models will "poison the well" and fill the internet with AI-generated images that make training an improved model difficult. After all, if 9 out of every 10 "oil painted" images online start coming from these generative models, it'll become increasingly difficult to scrape the web and learn from real-world data in a variety of domains. Essentially, once these things are widely available, the internet will become harder to scrape for good data and models will start training on their own output. The internet will also probably get worse for humans, since search results will be completely polluted with these "sort of realistic" images, which can be spit out at breakneck speed by smashing words from a dictionary together...
Look at carpentry blogs and recipe blogs. Nearly all of it is junk content. I bet if you combined GPT with Imagen or DALL-E 2 you could replace all of them. Just provide a Betty Crocker recipe and let it generate a blog that has weekly updates and even a bunch of images - "happy family enjoying pancakes together"
I can see the future as being devoid of any humanity.
I wrote a comedic "Best Apache Chef recipe" article[1] mocking these sites.
I guess the concern would be: If one of these recipe websites _was_ generated by an AI, the ingredients _look_ correct to an AI but are otherwise wrong - then what do you do? Baking soda swapped with baking powder. Tablespoons instead of teaspoons. Add 2tbsp of flower to the caramel macchiato. Whoops! Meant sugar.
Then we will still need humans in the loop to do the cherry-picking/supervised learning. Gibberish recipes need to be flagged and interesting new creations need to be promoted. That feedback can be fed back into the model until the model contains accurate representations of the chemical reactions of cooking ingredients and the neuronal wiring of the human olfactory system.
Seeing this a lot on YouTube too: scripts pulling in "news" from some source to feed a robo voice, combined with "related" images stitched together randomly.
Even though it's not AI, this is already happening with a lot of content farms. There was a good video a couple of years ago from Ann Reardon of "How to Cook That" that basically pointed out how the visually-appealing-but-not-actually-feasible "hands and pans" content farms (So Tasty, 5-Minute Crafts, etc.) were killing genuine baking channels.
Imagine that instead of having cheap labor from Southeast Asia churn out these videos, they are just spit out as fast as possible using AI.
This is my theory as well. There'll be a short period where some of us at the forefront enrich ourselves by flooding the internet with imagery never seen before. That'll be a bubble where people think "abundance" has been solved, but then it'll pop as people start to not trust anything they see online anymore and, as you say, only trust and interact with things in the real world (wouldn't surprise me if regulation got involved here too somehow).
For the very skilled, yes. But a lot of low-skilled artists and content creators will have the rug pulled out from under them. (And how will we ever get highly skilled artists trained in the future if they can't make a living from their lower-tier output before they reach mastery?)
> I can see the future as being devoid of any humanity.
Considering how many of the readers of said blog will be scrapers and bots, who will use the results to generate more spammy "content", I think you are right.
I’d much rather skip the blog format and replace them with an AI that can answer “Please provide a pie recipe like my grandparent’s”, or “I’d like to make these ribs on the BBQ so that they come out flavourful, soft, and a little sweet.”
People training newer models just have to look for the "Imagen" tag or the DALL-E 2 rainbow in the corner and heuristically exclude images that have them. This is trivial.
Unless you assume there are bad actors who will crop out the tags. Not many people now have access to Dall-E2 or will have access to Imagen.
As someone working in vision, I am also thinking about whether to include such images deliberately. Image augmentation techniques are ubiquitous in the field; we already introduce many training examples that are not in the distribution of natural input images, and they improve generalization by huge margins. Whether generated images improve the generalization of future models is a thing to try.
Damn I just got an idea for a paper writing this comment.
Perhaps a watermark should be embedded in a subtle way across the whole image. What is the word? "Steganography" is designed to solve a different problem, and I don't think it survives recompression etc. Is there a way to create weakly secure watermarks that are invisible to the naked eye, spread across the whole image, and resistant to scaling and lossy compression (to a point)?
Invisible, robust watermarks had a lot of attention in research from the late 90s to the early 10s, and apparently some resurgence with the availability of cheap GPU power.
Naturally, there's a Python library [1] with some algorithms that are resistant to lossy compression, cropping, brightness changes, etc. Scaling seems to be a weakness, though.
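The core idea behind those robust schemes can be shown in a toy sketch. The function names and the flat-list pixel representation below are illustrative only, and the watermark is added in the pixel domain for simplicity; real libraries embed the same kind of key-seeded pattern in DCT/DWT coefficients so the correlation survives JPEG recompression:

```python
import random

def _pattern(key: int, n: int) -> list:
    # key-seeded pseudo-random noise, spread across the whole image
    rng = random.Random(key)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def embed(pixels: list, key: int, strength: float = 2.0) -> list:
    # add a faint noise pattern; at low strength it is invisible to the eye
    pat = _pattern(key, len(pixels))
    return [p + strength * w for p, w in zip(pixels, pat)]

def detect(pixels: list, key: int) -> float:
    # correlate the (mean-subtracted) image with the same keyed pattern;
    # a score near `strength` means the watermark is present, near 0 means absent
    pat = _pattern(key, len(pixels))
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) * w for p, w in zip(pixels, pat)) / len(pixels)
```

Because the pattern is spread over every pixel, cropping or brightness shifts only weaken the correlation rather than destroy it, which matches the robustness profile described above.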
> Unless you assume there are bad actors who will crop out the tags.
I don't know why, but lots of randoms on the internet do that, and they're not even bad actors per se. Removing signatures from art posted online has become a kind of meme in itself, especially when comic strips are reposted on Reddit. So yeah, we'll see lots of them.
In my melody generation system I'm already including melodies that I've judged as "good" (https://www.youtube.com/playlist?list=PLoCzMRqh5SkFwkumE578Y...) in the updated training set. Since the number of catchy melodies that have been created by humans is much, much lower than the number of pretty images, it makes a significant difference. But I'd expect that including AI-generated images without human quality judgement scores in the training set won't be any better than other augmentation techniques.
Huh, I had never thought of that. Makes it seem like there's a small window of authenticity closing.
The irony is that if you had a great discriminator to separate the wheat from the chaff, it would probably make its way into the next model and would no longer be useful.
My only recommendation is that OpenAI et al. should be tagging all generated images as synthetic in metadata. That would be a really interesting tag for media file formats (native support in the format would be much better than metadata, though) and probably useful across a lot of domains.
The OpenAI access agreement actually says that you must add (or keep?) a watermark on any generated images, so you’re in good company with that line of thinking.
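For PNG at least, such a tag is easy to add today, since the format allows arbitrary text chunks. A rough sketch of stamping an image as synthetic before its closing IEND chunk follows; the keyword/value pair is illustrative (loosely modeled on IPTC's digital-source-type vocabulary), not any vendor's actual scheme:

```python
import struct
import zlib

def text_chunk(keyword: str, value: str) -> bytes:
    # build a PNG tEXt chunk: length, type, keyword\0value, CRC over type+data
    data = keyword.encode("latin-1") + b"\x00" + value.encode("latin-1")
    return (struct.pack(">I", len(data)) + b"tEXt" + data
            + struct.pack(">I", zlib.crc32(b"tEXt" + data)))

def tag_png(png: bytes, keyword: str, value: str) -> bytes:
    # insert the chunk just before the final IEND chunk
    iend = png.rfind(b"IEND") - 4  # back up over IEND's 4-byte length field
    return png[:iend] + text_chunk(keyword, value) + png[iend:]
```

The obvious weakness, as noted elsewhere in the thread, is that metadata like this is trivially stripped, which is why in-pixel watermarks are the more robust complement.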
The irony is that when the majority of content becomes computer-generated, most of that content will also be computer-consumed.
Neal Stephenson covered this briefly in "Fall; or, Dodge in Hell." So much 'net content was garbage, AI-generated, and/or spam that it could only be consumed via "editors" (either AI or AI+human, depending on your income level) that separated the interesting sliver of content from...everything else.
He was definitely onto something in that book, where people also resort to using blockchains to fingerprint their behavior and build an unbreakable chain of authenticity. Later in the book that chain is used to authorize hardware access for the deceased and uploaded individuals.
A bit far out there in terms of plot but the notion of authenticating based on a multitude of factors and fingerprints is not that strange. We've already started doing that. It's just that we currently still consume a lot of unsigned content from all sorts of unreliable/untrustworthy sources.
Fake news stops being a thing as soon as you stop doing that. Having people sign off on and vouch for content needs to start becoming a thing. I might see Joe Biden saying stuff in a video on Youtube. But how do I know if that's real or not?
With deepfakes already happening, that's no longer an academic question. The answer is that you can't know. Unless people sign the content. Like Joe Biden, any journalists involved, etc. You might still not know 100% that it's real, but you can know whether the relevant people signed off on it or not and then simply ignore any unsigned content from non-reputable sources. Reputations are something we can track using signatures, blockchains, and other solutions.
Interesting with Neal Stephenson that he presents a problem and a possible solution in that book.
As usual, Stephenson is at his best when he's taking current trends and extrapolating them to almost absurd extremes...until about a decade passes and you realize they weren't that extreme after all.
I loved that he extended the concept of identity as an individualized pattern of events and activities to the real world: the innovation of face masks with seemingly random but unique patterns to foil facial recognition systems but still create a unique identity.
Like you say, the story itself had horrible flaws (I'm still not sure if I liked it in its totality, and I'm a Stephenson fan since reading Snow Crash on release in '92), but still had fascinating and thought provoking content.
> blockchains to fingerprint their behavior and build an unbreakable chain of authenticity. Later in that book that is used to authorize the hardware access of the deceased and uploaded individuals.
maybe I misunderstood, but I had it that people used generative AI models that would transform the media they produced. The generated content can be uniquely identified, but the creator (or creators) retains anonymity. Later these generative AI models morphed into a form of identity since they could be accurately and uniquely identified.
All part of the mix. But definitely some blockchain thing underneath to tie it all together. Stephenson was writing about cryptocurrencies as early as the nineties; around that time he also coined the term Metaverse.
I can see a world where in-person consumption of creative media (art, music, movies, etc.), with all devices left at the door, becomes more and more sought after and lucrative.
If the AI models can't consume it, it can't be commoditised and, well, ruined.
I don't think it will "poison the well" so much as change it - images that humans like more will get a higher PageRank, so models trained on Google Images will not so much degrade as detach from reality and begin to follow the human mind the way plausible fiction does.
Just yesterday I was speculating that current AI is bad at math because math on the internet is spectacularly terrible.
I think you’re right, and it’s unlikely that we (society) will convince people to label their AI content as such so that scraping is still feasible.
It’s far more likely that companies will be formed to provide “pristine training sets of human-created content”, and quite likely they will be subscription based.
>“pristine training sets of human-created content”
Well, we do have organic/farmed/handcrafted/etc. food. One can imagine an information nutrition label: "contains 70% AI-generated content, triggers 25% of the daily dopamine release target".
How would that really happen? You seem to be assuming that there's no such thing as extant databases of actual oil paintings, and that people will stop producing, documenting, and curating said paintings. I think the internet and curated image databases are far better kept than your proposed model accounts for.
My hypothetical example is not really about oil paintings, but the fact these models will surely get deployed and used for stock photos for articles, on art pages etc.
I think this will introduce unavoidable background noise that will be super hard to fully eliminate in future large-scale datasets scraped from the web. There are always going to be more and more photorealistic pictures of "cats", "chairs", etc. in the data that are close to looking real but not quite, and we can never really go back to a world where there are only "real" pictures, or "authentic human art", on the internet.
On the contrary: there's a decent body of research showing that just by training foundation models on their own outputs, you amplify their capabilities.
Less common opinion: this is also how you end up with models that understand the concept of themselves, which has high economic value.
Even less common opinion: that's really dangerous.
For better training data in the future: storing a content hash and author identification for image authors (an example proprietary solution exists right now [0]), plus a decentralized reputation system for people/authors, could be the solution, and authors could gain reputation/incentives from it too.
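The content-hash half of that idea is just stdlib hashing. A minimal sketch follows; the record fields (`sha256`, `author`) and the function name are hypothetical, and a real system would additionally sign the record so the author claim itself is verifiable:

```python
import hashlib

def content_record(image_bytes: bytes, author_id: str) -> dict:
    # hypothetical provenance record: a stable content hash plus an
    # author identifier that a reputation system could attach scores to
    return {
        "sha256": hashlib.sha256(image_bytes).hexdigest(),
        "author": author_id,
    }
```

Because the hash is deterministic, anyone who later encounters the same bytes can recompute it and look the record up, which is what makes a shared reputation database feasible.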
Good-looking images will be popular; bad-looking images will be disposed of in the backyard of the internet. Even if the next iterations of these models are trained on AI-generated images, the dataset will be well filtered by how much people like those images. After all, that's the purpose of any art, right?
Maybe we'll go back to human-curated directory search engines like the early Yahoo.
Could resolve many issues we see today, but I think the biggest question is scalability. Maybe some open source open database system?
I think instead the images people want to put on the Internet will do for these models what self-play did for AlphaZero; they will learn what kinds of images engage human reaction.
I also worry about the potential to further stifle human creativity, e.g. why paint that oil painting of a panda riding a bicycle when I could generate one in seconds?
Our imaginations are gigantic. We'll find something else impressive and engaging to do. Or not care. I'm not worried. Watch children: they find a way to play even when there is nothing.
The painting technique isn't all that great yet for any of these artbots working in a physical medium, but that's largely a general lack of dexterity in manual tool use rather than an art specific challenge. I suspect that RL environments that physically model the application of paint with a brush would help advance the SOTA. It might be cheaper to model other mediums like pencil, charcoal, or even airbrushing first, before tackling more complex and dimensional mediums like oil paint or watercolor.