TheseToonsDoNotExist: StyleGAN2-ADA trained on CG cartoon faces (thesetoonsdonotexist.com)
131 points by codetrotter on Sept 25, 2021 | 71 comments



This much more clearly displays the problem with all these “this thing does not exist” sites: even though the page doesn’t show the source material, it’s abundantly clear that these are just simple stitches of the training toons.

Like you get Elsa with different hair, or the grandpa from Up in a suit.

Once you compare them to the closest examples from the training data, it becomes a lot less impressive than the implied “this face is completely out of the imagination of an AI model”. It turns out the model just imagined someone in the training set with the hair of someone else. Quite boring.
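If you want to run that comparison yourself, here is a rough sketch of a nearest-neighbour lookup against the training images (file paths are placeholders, and a perceptual metric like LPIPS would do better than raw pixel distance):

    # Crude memorisation check: find the training image closest to a generated one.
    from pathlib import Path

    import numpy as np
    from PIL import Image

    def load(path, size=(256, 256)):
        img = Image.open(path).convert("RGB").resize(size)
        return np.asarray(img, dtype=np.float32) / 255.0

    generated = load("fake.png")                        # one sample from the GAN
    training = sorted(Path("training/").glob("*.png"))  # the training set

    closest = min(training, key=lambda p: np.mean((load(p) - generated) ** 2))
    print("closest training image:", closest)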


> it’s abundantly clear that these are just simple stitches of the training toons.

That's not how GANs work in general. With a limited training set the end result may appear as if they were stitched together samples, but that's not what happens under the hood.

Interpolation videos make it obvious that the model encodes visual concepts, can freely manipulate them, and can even crank parameters up beyond anything found in the training set, giving exaggerated results.
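If anyone wants to try it, a rough sketch of interpolating between two random latents, based on my memory of the stylegan2-ada-pytorch repo's generate.py (treat the exact calls as assumptions and check them against the repo):

    # Assumes NVIDIA's stylegan2-ada-pytorch repo is on the Python path and a
    # trained network pickle is available; both are placeholders here.
    import numpy as np
    import torch
    import legacy  # module from the stylegan2-ada-pytorch repo

    device = torch.device("cuda")
    with open("network-snapshot.pkl", "rb") as f:
        G = legacy.load_network_pkl(f)["G_ema"].to(device)

    z0 = torch.randn(1, G.z_dim, device=device)
    z1 = torch.randn(1, G.z_dim, device=device)
    label = torch.zeros(1, G.c_dim, device=device)  # unconditional model

    frames = []
    for t in np.linspace(0.0, 1.0, 60):
        z = (1 - t) * z0 + t * z1  # blend the two latent codes
        img = G(z, label, truncation_psi=0.7, noise_mode="const")
        frames.append(img)  # every in-between frame is still a plausible face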


Agreed, but I also agree with the OP - as a Disney fan I can pretty easily identify which characters some of these parts were taken from.

One of the ones I got is just Helen from The Incredibles' head with Elsa's face. I couldn't design a cartoon character from scratch, but I could definitely make a 'new' cartoon character if I'm allowed to take Homer Simpson's head and paste Goofy's face on top.


i think the results shown in this paper contradict your assertion:

https://openaccess.thecvf.com/content_ICCV_2019/papers/Abdal...

given an arbitrary face, we can find its embedding in the latent space of the model. doesn't this show that the model has the potential to generalise to real but unseen examples?

on the other hand, i suspect you might be observing a bias in the structuring of the latent space.

thispersondoesnotexist.com likely samples the latent space with a gaussian or uniform distribution, and while the latent space may contain the full spectrum of possibilities, the density of semantically meaningful embeddings may be structured around the distribution of the training set rather than a uniform or gaussian.
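roughly what that sampling looks like in code, going from memory of the stylegan2-ada-pytorch api (so a sketch, not gospel): z is drawn from a standard gaussian and the truncation trick pulls w toward the average, which biases samples toward the dense, training-set-like part of the space.

    # sketch only: assumes the stylegan2-ada-pytorch repo is importable and a
    # trained pickle exists; names are from memory of that repo.
    import torch
    import legacy  # module from the stylegan2-ada-pytorch repo

    device = torch.device("cuda")
    with open("network-snapshot.pkl", "rb") as f:
        G = legacy.load_network_pkl(f)["G_ema"].to(device)

    z = torch.randn(16, G.z_dim, device=device)        # gaussian prior over latents
    c = torch.zeros(16, G.c_dim, device=device)        # no class labels
    w = G.mapping(z, c)                                # map z to the intermediate w space
    psi = 0.7                                          # <1 trades diversity for fidelity
    w = G.mapping.w_avg + psi * (w - G.mapping.w_avg)  # truncation toward the average face
    imgs = G.synthesis(w, noise_mode="const")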

i'm stretching my understanding of the topic in trying to convey this.


As others have said, that's not the way a GAN should work. Regurgitating the training set is basically a failure mode that is actively avoided when the models are built and trained.

Looking at these images, and not being familiar with how the underlying CG training set is made, I wonder if the original material itself has a comparatively small set of latent features - dimensions you could adjust when drawing the faces - that the model is just learning. A newly generated face would then effectively be the same as if someone had changed whatever settings you tweak when working with the underlying tool.


I see what you mean, but this is definitely not universal to StyleGAN; it depends on factors such as the size of the training set (I'm guessing it was smaller here) and the training parameters.


Honestly this seems to be common for GANs in general, though I don't think most people have looked through CelebA. If you're lazier, you can scroll through thispersondoesnotexist and you'll find essentially celebrities with similar characteristics to what the OP is describing. What's more, you actually see better quality images the closer to a celebrity they look (you see the same thing in the toon version here). I do think ADA is typically worse than plain StyleGAN2 in this respect, but that's the tradeoff with a smaller sample size (worse because people train it on smaller datasets, so there's more memorization).


I believe thispersondoesnotexist is also trained on FFHQ, not just CelebA, though.


How do you know this is the case for all of them?


A good indicator is that when I clicked the "more..." button, I instantly recognised copied features (eyes, face shape, nose, hair) from The Incredibles and Frozen, just mashed together, in 3 out of the 4 samples.

This shouldn't be the case unless you start actively looking for it.

It's just much easier to recognise with these cartoon characters than with realistic faces as there's naturally much less variety in the training material. Also the features are simplified to a point of being easily recognisable as well.


Another way to achieve a similar result is to load a face from ThisPersonDoesNotExist.com [1] and then pipe it through Toonify.Photos [2]. Give it a label from ThisWordDoesNotExist.com [3] and there you go – you've got a character :)

Edit: wire it up with Stripe and sell characters to Pixar? Ha!

[1] https://news.ycombinator.com/item?id=19144280

[2] https://news.ycombinator.com/item?id=24494377

[3] https://news.ycombinator.com/item?id=23169962


This is interesting -- I can almost recognize the various Pixar characters these are influenced by. For instance, one has the distinct jaw of the old man from "Up".


I always did wonder how much StyleGAN was compositing existing features rather than generating wholly distinct features.

With real human faces it's almost impossible to tell but with these you can definitely pick a character per feature.


Begs the question: at what point does ownership change? Can I trace Batman but change his hair color? What about just his face? Or just his mouth? Can I cut and paste a bunch of superheroes together and claim the result as my own? Etc.


This will certainly be the crux of intellectual property lawsuits in the future.

It's striking that some of these examples have distinct features from specific, identifiable datasets: we can occasionally recognize specific characters (the old man from Pixar's UP is getting mentioned a lot), but it also reproduces more general aesthetic patterns. Even when I can't recognize the source data, I can distinctly see in some of these faces "the Pixar look", and in others "the DreamWorks look".

Were I an IP lawyer, I would start thinking of arguments along the lines of "this technology simply obfuscates the source of plagiarisms". I would also start to think about trying to force anyone who uses this technology to disclose the sources of their training data, since a model trained largely on "the Pixar look" could be benefiting from Pixar's character design processes without having to hire any of Pixar's artists.

And, if I were philosophically inclined, I would also start thinking about how this is any different from hiring a random artist and instructing them to "design characters that look like Pixar characters".

I suspect that one key difference is that the human artist's success can't easily be measured, but the GAN's success can very easily be measured.


At some point big producers like Pixar will probably use something like StyleGAN to extend their copyright coverage. E.g. generate as many variations as they can, which then all fall under their own copyright.

So in the end this technology might not be as "liberating" as people think it is.


Not if a tech firm gets there first. Imagine the productive and legal power being in the algorithm.

In a way, this is a much better setup for artists and creatives. There isn't some giant licensing firm controlling your work. You simply buy or rent the best tools to make your work.

That said, it'll only be good for creatives and consumers if there is sufficient competition. And open source equivalents that still enable creation.


Wouldn't it make more sense to use a density based model and then describe some hull centered around your original creation?


I guess that's the big debate around GitHub Copilot and open source code, especially GPL (but also others).


The rights surrounding caricatures may provide some insight here. I know some celebrities are particularly vigilant about keeping unapproved photos of their faces out of circulation but they probably wouldn't have the same success with a hand-drawn likeness or caricature.


I think that would depend on the license of each image in the dataset that influenced the output image. Maybe not, but it would make sense for it to work that way.


Way too boring, and has, like, 5 reference faces. I expected a crazy mix of all possible cartoon styles similar to thisanimedoesnotexist.


TADNE is trained on n ~2.8m source images (augmented to ~4m), covering k ~ tens of thousands of different characters. OP is trained on... I'm not sure because the site is devoid of any information, but I would guess it's closer to 10k images from a few hundred characters at most. So the diversity will be drastically reduced, although the use of -ADA should mean that it doesn't overfit as catastrophically as one would expect from previous GANs with such a small n/k.


What would be really interesting, and possibly impossible to do, would be to get all the character designs that were rejected during the development process at the various studios. As has been pointed out in other comments, the GAN is generating some pretty boring variations of faces from source material that has been focus grouped and designed by committee to death before being released to the public.

I have seen some really cool and wacky designs in the hallways of DreamWorks, Blue Sky, Pixar, etc. during my time in the industry. I would love to get all of those designs into a training set as well.


StyleGAN interpolations of latent space are mindblowing but they are fundamentally superficial, which you can at times tell and more often get bored of. Instead, I would like to see a transformer network trained on ontogenetic/phylogenetic development/evolution material which could then generate new creatures. The representations could be abstract, identified by key topological properties for instance, which could then be used to do whatever artistic renderings with. Of course, in the end, perhaps the truest abstract representation would be genes and proteins.


Unfortunately mainstream ML research is obsessed with end-to-end solutions, so a lot of interesting ideas fall by the wayside.

A rule-based system combined with a Transformer and CV-based postprocessing to filter the most plausible and interesting results would be awesome.


lol. good idea, but genes only get you so far. environment is where you get those proteins, so that means simulating the world. ;)

just to get lots of retarded-baby monkey-fish-frogs.


Most of the toons I got generally look like normal characters but with a mild genetic defect: a lazy eye, potentially concerning lump on a cheek, etc.

Edit: actually, yeah, after looking at more examples nearly every one has some amount of cross-eye/focus disorder where both eyes aren't pointing in the same direction.


Like with thispersondoesnotexist, the toons also tend to have severe problems with the ears. Nearly every one of them has an ear that blends into the background or doesn't match the other one.


“Do all current movie ‘toons’ have one eyebrow up?”

I was suddenly struck by this question, and think there might be something to it. Clearly it was a standard feature of the training set.


It seems to be generating a lot of characters with "DreamWorks Face". https://tvtropes.org/pmwiki/pmwiki.php/Main/DreamworksFace


Solid reference. Thank you


Yeah, the ‘DreamWorks face’ is strong with this one: https://filmschoolrejects.com/wp-content/uploads/2017/03/dre...

Though exhibited more as some kind of a confused smirk. Which is of course also ubiquitous in cartoons for some reason.


Really interesting. Speaking as a total noob, if somebody wanted to make a thispersondoesnotexist type of program, e.g. this site:

1) Approximately how much would it cost to rent the servers to train such models?

2) Can this be done on a home computer running a $1k NVIDIA card in a reasonable time?

3) Can I use free tools like Google Colab for this purpose?

I've always been interested in learning more about this field but haven't really bothered because I feel it would cost an arm and a leg just to experiment. Can somebody please shed some light on this?


1) That depends entirely on the model in question (size, complexity) and the amount of training material.

Using a standard dataset like CelebA [1] and an "HQ" model (512x512) like StyleGAN2, you need at least one GPU with 12 GiB of VRAM, and training takes about a week on a single V100.

Depending on your provider of choice, this will cost anywhere from ~$514 (AWS) or ~$420 (Google) down to ~$210 (Lambda Labs, on an RTX 6000, which should be in the same ballpark).
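Back-of-the-envelope, those weekly totals imply roughly these hourly rates (a week being ~168 hours of single-GPU time):

    # implied hourly rates from the weekly totals above (~168 h)
    hours = 7 * 24
    totals = {"AWS (V100)": 514, "Google (V100)": 420, "Lambda (RTX 6000)": 210}
    for provider, total in totals.items():
        print(f"{provider}: ~${total / hours:.2f}/h")
    # -> ~$3.06/h, ~$2.50/h and ~$1.25/h respectively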

If your training process is interruptible and can be resumed at any time (most training scripts support this), costs will drop dramatically for AWS and Google (think $50 to $200).

2) Yes. A used ~$200 Tesla K80 will do. Alternatively, any NVIDIA card with at least 8 GiB of VRAM can do the job, though you should expect smaller batch sizes and longer training times. With a dedicated machine running an RTX 3060, or a brand new A4000 if you're willing to pay the premium, you can achieve close to a week of training time.

3) Yes*

*your work will be freely available to everyone and your training process is limited to 12h or so per day.

All in all I wouldn't recommend training a StyleGAN model from scratch anyway. Finetuning a pretrained model using your own dataset can be done much more quickly (think hours to a day or two) and on consumer-level hardware (I train my models on an old desktop with a GTX 1070).

[1] http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html


> Finetuning a pretrained model using your own dataset can be done much more quickly (think hours to a day or two) and on consumer-level hardware (I train my models on an old desktop with a GTX 1070).

This is interesting! Do you have some links about doing that?

My desktop computer has a GTX 1060 with 6 GB of VRAM. But hopefully I can use it for something like this.

I've only used Google Colab in the past, and only tried stuff with prompting existing models.

Would love to experiment a bit with fine-tuning models on my own datasets to get some kind of unique stuff.


It is the same transfer learning you would do with any model. StyleGAN (and StyleGAN2-ADA) provide pretrained weights for you. Just start there and train on the new dataset. The ADA GitHub repo even has tools to format your dataset correctly.
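The whole flow is roughly the sketch below; flag names are from memory of the stylegan2-ada-pytorch README, so double-check them there, and the paths are placeholders:

    # Sketch of a fine-tuning run with stylegan2-ada-pytorch, driven from Python.
    # Assumes the current directory is a checkout of the repo.
    import subprocess

    # 1) pack raw images into the zip format the repo's training script expects
    subprocess.run(
        ["python", "dataset_tool.py", "--source=raw_images/", "--dest=mytoons.zip"],
        check=True,
    )

    # 2) resume from a pretrained checkpoint instead of training from scratch;
    #    transfer learning typically converges in hours to a couple of days
    subprocess.run(
        [
            "python", "train.py",
            "--outdir=training-runs",
            "--data=mytoons.zip",
            "--gpus=1",
            "--resume=ffhq512",  # pretrained starting point bundled with the repo
        ],
        check=True,
    )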


> If your training process is interruptible and can be resumed at any time, costs will drop dramatically for AWS and Google (think $50 to $200)

Noob question here: how does that work? Do you run the scripts in the hosting provider's downtime, with something like a nighttime rate? Or what magic is this?


It's basically special rates for instances that can be shut down by the provider at any time.

Google charges about a third of the usual hourly rate for such instances, and AWS has a "marketplace" where you can bid for them (you name your max price beforehand); whenever an instance with your selected specs becomes available at that rate, you get it.

Hyperscalers like Google and AWS basically have two types of machines/VMs for rent: instances reserved for long term commitment (think months to years) and on-demand. Naturally there's peak demand times in each region (usually during business hours) followed by periods of low demand so there's heavy fluctuation.

Instead of just having their machines sitting idle while still costing money, they offer such idle resources at a heavily discounted rate, with the catch that as soon as regular demand rises again, your VM is being shut down to be offered at the normal hourly rate (you get notified so your script has some time to save its state).
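The only thing your side has to provide is aggressive checkpointing, i.e. something along these lines (illustrative only; the StyleGAN training scripts already handle this for you):

    # minimal save/resume pattern that makes preemptible instances workable
    import os
    import torch

    CKPT = "checkpoint.pt"

    def save_state(model, opt, step):
        torch.save({"model": model.state_dict(), "opt": opt.state_dict(), "step": step}, CKPT)

    def load_state(model, opt):
        if not os.path.exists(CKPT):
            return 0
        state = torch.load(CKPT)
        model.load_state_dict(state["model"])
        opt.load_state_dict(state["opt"])
        return state["step"]

    model = torch.nn.Linear(8, 8)               # stand-in for the real network
    opt = torch.optim.Adam(model.parameters())
    step = load_state(model, opt)               # picks up where the last VM left off

    while step < 10_000:
        # ... one training step would go here ...
        step += 1
        if step % 500 == 0:
            save_state(model, opt, step)        # lose at most ~500 steps on shutdown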

It's similar to what hotels do - discounted rates are available most of the time, but whenever there's a convention in town or on national holidays, you get kicked out and prices quadruple. Only with hyperscalers prices stay the same and you simply lose your cheap resource.

The actual time at which VMs become available is pretty random. If you are ok with using multiple regions there's pretty much always instances available, though usually not for hours on end.


Quite interesting, thanks!


For yourself, you can use Colab. For serving it as a site, a single $1k GPU is fine, but it depends on traffic.


Most of these look like warped versions of existing characters. Maybe the dataset is too small or something.


> Maybe the dataset is too small or something.

Yeah, I think that might be the problem here. While there's plenty of material for real human faces, there's only so many high quality 3D cartoon characters.


I like how these faces exhibit many of the usual face gan quirks: asymmetric ears, eyes with slight nystagmus, background is some abstract blurry surreal mosaic. At least they get the hair pretty darn good? Probably because the hair is more regular to begin with.


I would love to see character names and synopses (e.g. generated with GPT) - vide https://www.thiswaifudoesnotexist.net/.


Yuck… So this Pixar 3D crap is what’s called “cartoons” nowadays.


Yeah, it's all the same look. Not much experimentation or creativity at all when it comes to character design, at least for human characters.


Is Pixar 3D crap?


Compared to real hand-drawn cartoons? Yes. Soulless, uniform, factory-produced crap.


>factory-produced crap.

Wait until you find out how hand-drawn animation is made.


A quick Google shows that hand drawn animation has been outsourced since the 60s.


Yes. Mostly to high-volume animation sweatshops, aka "soulless, uniform factory production."


Did someone say factory-produced? I love the classical animations too, but you are aware of this right?: https://youtu.be/hjmaOj3_sKk

:^)


Looks like it was mainly Robin Hood that was taking the cheap way out. And The Aristocats? I think that was during a low point for Disney animation.


Here’s a comprehensive overview, including both the full length movies that you saw in that YouTube video and a lot of shorter animations that were reused.

https://disney.fandom.com/wiki/List_of_recycled_animation_in...

Definitely something they did quite a bit more than just a couple of times.

Not saying there’s anything “wrong” with that per se btw. Just found it relevant to the discussion about comparing modern animation to factory output.


From a technical and artistic perspective, I can’t think of much to criticize. I think we’re fortunate to have such incredible accomplishments in storytelling and presentation.

There are modern 3d movies I particularly dislike and many I don’t like or love. I just don’t agree that the medium is innately lacking.


Now get off my lawn!


Well, really "Pixar." Pixar at least knew how to leverage their weird style; everyone else ran with it and made it god-awful. But my thought exactly. Why even do this with something that's so offensively awful to look at?

That gripe aside, if you're just training on a bunch of headshots and generating new ones, it's been done, over and over at this point. Want to impress? Figure out how to generate a full sequence of coherent animation frames.


It's really easy to spot the training references in these - lots of Wreck-It Ralph and Big Hero 6 characters with slightly reshaped faces and hair.

Edit: whenever it gets inspired by Mr Incredible the result ends up looking like Conan O'Brien.


Now train it to generate cartoons similar to a human face or a dog and take in the $$$.


The mobile app at https://linktr.ee/voilaaiartist does a good job with people. I’d be surprised if it works on dogs.


Pixarize-my-pets? Sign me up!


It's like they took Disney, Pixar, and DreamWorks and just mashed them all together.


Wow, the app powering this requires at least one high-end NVIDIA GPU with 12 GB of GPU memory, with 8 GPUs recommended. All that to generate some cartoon faces using AI. At least for that website. Dang.


I think that's just the training requirements. For simple TXDNE sites, you don't need even a single GPU, since you can pregenerate them all.


Makes sense.

I was going to set it up to mess around with until I saw the requirements.


This would be much more useful if it generated a model.


I don't see why that would be particularly difficult to accomplish. A dataset made up of 3D assets would actually give an algorithm more information to work with. The question is whether it could generate a usable model and not an Eldritch horror of disconnected triangles and non-manifold geometry.

The easiest win would probably be to have an algorithm pick between predetermined types of assets (heads, appendages, clothes, etc.), reshape them without actually adding new geometry, and then do essentially what the linked page does with skins and shaders.
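The "reshape without adding new geometry" step is basically blend shapes: keep the topology fixed and only move vertices. A toy numpy sketch with made-up meshes:

    # blend shapes: new face = base mesh + weighted sum of per-vertex offsets,
    # so the triangle/edge structure never changes (no non-manifold surprises)
    import numpy as np

    rng = np.random.default_rng(0)
    base = rng.normal(size=(1000, 3))                 # base head mesh, 1000 vertices
    variants = rng.normal(size=(4, 1000, 3))          # 4 sculpted variants, same topology
    deltas = variants - base                          # the "shape keys"

    weights = np.array([0.6, 0.1, 0.0, 0.3])          # what a generator would output
    new_face = base + np.tensordot(weights, deltas, axes=1)
    print(new_face.shape)                             # (1000, 3): same mesh, new look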


At that point, you might as well just make a shape-key based character creator.


Sure, yet a shape-key character creator with something that could instantly imagine original textures and apply different design styles to existing base meshes would be extremely useful and time-saving. You wouldn't necessarily need as many riggers, or any manual rigging at all, because a base character rig from, say, Cloudy with a Chance of Meatballs could immediately be used for a completely different character design in The Book of Life just by training the algorithm on hand-drawn sketches.

I mean, that's about as close as we may get to the holonovels from Star Trek in our lifetime.

Say "computer, delete crowd and replace with cowboys" and it just does it using imagined designs consistent with the rest of the movie/game.


pixar



