Generating faces of cats using Generative Adversarial Networks (ajolicoeur.wordpress.com)
126 points by gk1 on July 6, 2017 | 55 comments



With this kind of task, how do you verify that you didn't just overfit and start reproducing the input data?


She's generating it from noise: https://github.com/AlexiaJM/Deep-learning-with-cats/blob/mas...

Also, you could verify by writing unit tests with OpenCV that search the training set for similar source images. Since it's all headshots, it will find matches for sure, but it would also find matches with human faces.
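As a rough sketch of what such a check could look like (this isn't from the linked project; the directory names and the 0.99 threshold are made up, and histogram correlation is only a crude stand-in for a proper similarity measure):

    import glob
    import cv2

    def hist(path, size=(64, 64)):
        # Coarse colour histogram as a cheap image fingerprint.
        img = cv2.resize(cv2.imread(path), size)
        h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        return cv2.normalize(h, h).flatten()

    train = {p: hist(p) for p in glob.glob("cats_train/*.jpg")}

    for gen_path in glob.glob("generated/*.png"):
        g = hist(gen_path)
        best_path, best_score = max(
            ((p, cv2.compareHist(g, h, cv2.HISTCMP_CORREL)) for p, h in train.items()),
            key=lambda t: t[1])
        if best_score > 0.99:  # near-duplicate by this crude measure
            print(gen_path, "looks a lot like", best_path, best_score)

A perceptual hash or feature matcher would be a better fit than raw histograms, but the structure of the test stays the same.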


The neural network is starting from noise, but that's not the only input: it was trained on [0], and I think it's arguable that the NN is "reproducing" the images from its training dataset in some sense.

[0] https://web.archive.org/web/20150703060412/http://137.189.35...


I think it's a very interesting question: how can we measure when a neural network is being creative? Creativity is not obvious at all; it's sort of an ill-posed question if you think about it. How can you verify that a network is generating things that are not like what it was trained on, yet are... like what it was trained on?

Are neural networks* forever relegated to the role of copying and interpolation? Do the neural network weights form a kind of database?

* (I don't think this only applies to neural networks, but models in general)

There was one recent work trying to address this [1], but I'm not 100% convinced, and I think a lot more work is warranted in this area. A difficulty is that it's not a purely technical problem, but also one of semantics and interpretation. It's one that the "automatic musical accompaniment" community and other digital arts communities have struggled with for decades, and it's not resolved.

How do you know when a machine is being creative? It's not far from the moving goalposts problem of general artificial intelligence. How do you know when a machine is being intelligent, if you can always explain it away by examining the black box?

[1]: https://arxiv.org/abs/1706.07068


The best one can hope for from a NN is that it discerns a model within the training data. There is a way to more-or-less objectively measure how well it has done this, if at all: whether the model requires less information than the data it explains, i.e. fewer bytes. So "compression algorithms" are a rudimentary model of data; we'd like to do much better than that.

However, NNs tend not to be very space-efficient, and they also don't usually "explain" the data (in the sense of reproducing it), so this test is hard to apply to them.
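To make the "fewer bytes" criterion concrete, here's a toy sketch (mine, not anything from this thread): a model counts as explaining the data if storing the model plus its residual errors takes fewer bytes than storing the compressed data itself.

    import gzip
    import pickle
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 10_000)
    data = (3.0 * x + 2.0 + rng.normal(0, 0.1, x.size)).astype(np.float32)

    # Baseline: just compress the raw data.
    raw_bytes = len(gzip.compress(data.tobytes()))

    # "Model": two floats from a least-squares fit, plus the (quantized) residuals.
    slope, intercept = np.polyfit(x, data, 1)
    residual = (data - (slope * x + intercept)).astype(np.float16)
    model_bytes = len(pickle.dumps((slope, intercept))) + len(gzip.compress(residual.tobytes()))

    print(raw_bytes, model_bytes, "model wins" if model_bytes < raw_bytes else "model loses")

Applying the same accounting to a multi-million-parameter network is exactly the part that's hard.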

BTW: human creativity has much to do with expectation: how obvious it was to you already. So, people with different levels of exposure to some art discipline have different opinions on creativity... and as new styles become known, those opinions change.

Human beings also draw on other fields and experiences not available in training data. Especially striking, to humans, is inspiration from common experiences that are not recognised as common, as in art that reveals ourselves to us, or observational humour. For a computer to use this information, it seems it would need to have human experiences, a body, social interaction, etc. Of course, this is a very parochial concept... pure creativity need not be so anthropocentric.


And likewise, how do you know when a human is being creative? Isn't all art derivative of our training and influences? I believe something like that was an argument by one of the random paint splatter artists: that randomness was the only thing truly creative.


Pollock?


Why is Jackson Pollock now offensive?

https://en.wikipedia.org/wiki/Jackson_Pollock

Bunch of avant-garde denialists!

If it's not a painted photo, it's not art? If I don't understand it, it's not art? If it's meta, it's not art?


There is a popular conspiracy theory that nobody actually understands it and avant-garde artists are in fact con artists.

Not that I care enough to be offended, though.

And hell, if this is the "meta" meaning in their art, I must confess I hadn't gotten the joke until now :)


Yep, this is a current area of research for content generation.

I think most current approaches build some transform into a latent space and then compare generated images with their nearest neighbors in the training set. If they're identical, then your network just learned to reproduce the dataset.
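For example, a nearest-neighbour check could look like the sketch below (my own illustration, not the article's method; it assumes an off-the-shelf pretrained ResNet as the embedding and hypothetical directory names):

    import glob
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # pretrained=True was the API of the era; newer torchvision prefers weights=...
    net = models.resnet18(pretrained=True).to(device).eval()
    net.fc = torch.nn.Identity()  # use the 512-d pooled features as the "latent space"

    prep = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                      T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    @torch.no_grad()
    def embed(paths):
        batch = torch.stack([prep(Image.open(p).convert("RGB")) for p in paths]).to(device)
        return torch.nn.functional.normalize(net(batch), dim=1)

    train_paths = glob.glob("cats_train/*.jpg")   # hypothetical paths
    gen_paths = glob.glob("generated/*.png")

    train_feats = embed(train_paths)
    gen_feats = embed(gen_paths)

    sims = gen_feats @ train_feats.T              # cosine similarity
    best_sim, best_idx = sims.max(dim=1)
    for p, s, i in zip(gen_paths, best_sim.tolist(), best_idx.tolist()):
        print(p, "-> nearest training image", train_paths[i], round(s, 3))

In practice you'd batch the embedding step and use a stronger feature extractor, but even this catches outright memorization.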


Yeah, I would have at least run some kind of similarity search on the output. Without that check it's impossible to know if this is actually doing anything.


Similar thoughts: How do I verify what I read isn't just technobabble haha. I don't think I understood much of it


I was expecting meows.


Yeah, that would have been more interesting.

This is technically a meow generator generator.



not meows, but here are some purrs: https://purrli.com/


It's a shame that the meows in the "meow-y" setting are just regular meows, not purry meows (as in, meowing and purring simultaneously).



OP's post was amazing but can someone explain how this was probably made?


I think the sound is generated locally. Check out the page source, which contains the JS I think is responsible for generation (I haven't checked in detail).


captive feline


Meow-generator when the last slider is set to Meow-y ;-)


If it's meows you want, have I got the album for you!

http://pitchfork.com/reviews/albums/21145-meow-the-jewels/


I too was expecting meow/purr audio content generation when I clicked the link, haha.


I'll admit I too was expecting cat-centric "speech" synthesis.


Did you try clicking on the cat faces to see if they meowed? Not a proud confession here.

BTW this is a cool project even without the audio.


Wait, it's actually generating new cat faces, as in cats that don't exist? Some of those images looked like they had backgrounds in the corners; was that also generated???


Yes.


These things are always low resolution. At some point I'd like to see the state of the art move into more realistic (say ~500x500) dimensions.


It's a combination of GPU RAM, slowdowns (remember, cost grows with the square of the image dimensions), and stability (larger is more unstable end-to-end). Arguably, the state of the art in image synthesis is DeepMind's PixelCNN: "Parallel Multiscale Autoregressive Density Estimation", https://arxiv.org/abs/1703.03664 , Reed et al 2017: generating 512px photorealistic images & video with PixelCNNs rather than GANs. Also good is StackGAN, which does ~200x200, but there's no reason it couldn't go up to 500x500 (just pop in a third upscaling stage).

There's far more work on GANs than PixelCNNs (see https://github.com/hindupuravinash/the-gan-zoo ), but at least thus far, I haven't seen any GANs which appear visually competitive with Reed et al 2017's PixelCNN samples. Downside: the code has not been released by DeepMind[], and you can't do CycleGAN or other stupid GAN tricks with PixelCNN AFAIK. CycleGAN is absolutely hilarious if you haven't seen all the uses of it yet, and much more interesting than generating cat faces.

[] I asked way back when and Reed said he'd try but nothing yet.


I think it's in part because the content-loss piece is done using pre-trained ImageNet models, which typically resize images to 224x224...


That's not it. It's easy to scale a larger image down to 224x224 and feed it into a checkpoint. And a lot of these GANs don't use such content losses in the first place, because they add complexity and make the setup harder to use (you have to get one of those pretrained models in the first place).
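To illustrate that point, here's a minimal sketch (mine, not code from the article; a real version would also apply ImageNet mean/std normalization before the feature extractor): a VGG-style content loss doesn't care what resolution the generator produces, because you can always downsample to 224x224 before comparing features.

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    vgg = models.vgg16(pretrained=True).features[:16].eval()  # layers up to relu3_3
    for p in vgg.parameters():
        p.requires_grad_(False)

    def content_loss(generated, target):
        # generated/target: (N, 3, H, W) tensors in [0, 1], any H and W
        g = F.interpolate(generated, size=(224, 224), mode="bilinear", align_corners=False)
        t = F.interpolate(target, size=(224, 224), mode="bilinear", align_corners=False)
        return F.mse_loss(vgg(g), vgg(t))

    # e.g. a hypothetical 512x512 generator output still works fine:
    loss = content_loss(torch.rand(1, 3, 512, 512), torch.rand(1, 3, 512, 512))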


What are some potential applications of this outside of toy problems like cat pictures?


Generating convincing profile pictures for thousands of fake social media accounts.

Or photo portraits of the "board members" for the About Us page of an autonomous corporation.


I expect it could stretch a game's art budget considerably, FWIW...


generating tween frames for animation


welp, that's terrifying.


A dream of cat video clickfarmers came true


Change the title? Seems like I wasn't the only one expecting some sort of audio-related thing.


Yeah, I put on headphones for this. Seems like a cat generator.


I would assume that the name was chosen because I already have an old project online called "cat generator" [0], which used the same underlying technique (GANs), dataset, and landmark-based normalization. Reusing the name would have resulted in confusion.

[0] https://github.com/aleju/cat-generator


Yet another cat generator then (:


Alright, we've updated the title from “Meow Generator”.


[flagged]


I gave it a try on anime images with 64px/128px WGANs for about a month back in March. No, it's not really feasible yet. GANs need restricted datasets: anime girl faces or cat faces, yes; anime girls in general, no, it never learns effectively. They need either more supervision (I thought StackGAN could probably handle it if you could feed in Danbooru tags) or better algorithms (PixelCNN? See my other comment, but the Reed et al 2017 samples are great despite tremendous diversity of images). Plus more GPUs.


Please don't do this here.


Ignoring the joke, this is actually an interesting question to ask. I mean, yeah, there are some pretty scary, uncanny images of cats here, but some cats look almost… fine? So if these cat images are "creative" enough, this is almost a success.

But if you take a pencil and try to draw a cat yourself (assuming you are not a good artist), you have a much higher chance of drawing something "cute" than if you tried to draw a woman. Human faces are much more familiar, and there's something much trickier and more intimate about what you recognize as "cute" or "beautiful" in a human than in a cat.

So, I'm pretty sure this NN would fail, but it's interesting what would actually be required for it not to fail.


Someone did this, kinda, by inverting Yahoo's nudity detection network. The results were, uh, unsettling. https://news.ycombinator.com/item?id=12756462


Why not? Because of extreme nipple fear?

It would most likely be the #1 application of this technique if it worked in general.


No, it's because the cost it inflicts on the commons exceeds the benefit of an adolescent sex joke. If we get lots of the latter, it will turn off and drive away some of the people we want here. Then more good users may start to leave as more good users start to leave, a vicious circle which it's basically our #1 job to (try to) make sure HN doesn't get caught in. So we're a bit hypervigilant about this.


Sounds like a great way to create even more unrealistic standards for beauty. Just a thought.


So a Minecraft for adults? Automated content generation?


Still no cure for cancer.


If that's what you're interested in, a quick Google search will show you plenty of startups/companies applying machine learning/deep learning to medicine.


It's just a saying.

Usually said after someone shows off a highly complicated technology with no practical purpose.


Read that as "Generating the feces of cats...", which would have been an altogether more interesting problem.



