I think this might be a well-done satire showcasing the pitfalls of having insufficient training data. It's magnificent. Half the cats are adorable and half are, I believe, creatures from "The Thing".
Also when you use a dataset containing watermarked stock photos and the watermark is treated as a signal instead of noise :) https://i.imgur.com/O1ZRwSF.jpg
That could mean a lot of things. Are you saying that they look dead, fused with another cat, drastically younger, attacked by an interdimensional monster, cloned, or like the evil alternate-reality versions of themselves?
I suspect that the problem is less one of insufficient training data, and more one of excessively noisy training data.
There are three of these generators that have shown up on HN in the last few days—people, cats, and anime faces—and what the other two (more successful) ones have in common is that the things they're trying to generate all have the same basic shape and structure: that of a face.
The cat images in the training data are clearly just cats from every angle. There's much less of a clear structure for the neural net to recognize and reproduce.
That it's doing as well as it is is actually kind of remarkable, given that issue.
I also got a few characters of would-be-text into a couple of images. As you said, the net was probably trained with images that included cat memes and it "learned" that meme-text means "cat".
Makes me wonder if it is possible for deep learning to create an artificial language. Supposedly Lojban's vocabulary was created by an algorithm, although how it got "mlatu" for cat is obscure.
> At least one word was found in each of the six source languages (Chinese, English, Hindi, Spanish, Russian, Arabic) corresponding to the proposed gismu. This word was rendered into Lojban phonetics rather liberally: consonant clusters consisting of a stop and the corresponding fricative were simplified to just the fricative (“tc” became “c”, “dj” became “j”) and non-Lojban vowels were mapped onto Lojban ones. Furthermore, morphological endings were dropped. The same mapping rules were applied to all six languages for the sake of consistency.
> ...
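The quoted mapping is mechanical enough to sketch in a few lines. To be clear, this is a toy, not the actual gismu-making algorithm (which, as I understand it, went on to score candidate forms against all six rendered source words); only the "tc" -> "c" and "dj" -> "j" rules come from the quote, and the vowel table and example words are invented:

```python
# A toy sketch of the quoted phonetic mapping, not the actual gismu-making
# algorithm. Only the "tc" -> "c" and "dj" -> "j" rules come from the quote;
# the vowel table and the example words below are invented for illustration.
CLUSTERS = {"tc": "c", "dj": "j"}          # stop + fricative -> fricative
VOWELS = {"á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u", "w": "u"}

def to_lojban_phonetics(word: str) -> str:
    word = word.lower()
    for cluster, fricative in CLUSTERS.items():
        word = word.replace(cluster, fricative)
    # Fold non-Lojban vowels onto Lojban's a/e/i/o/u.
    return "".join(VOWELS.get(ch, ch) for ch in word)

print(to_lojban_phonetics("tcúni"))  # hypothetical source word -> "cuni"
print(to_lojban_phonetics("djamó"))  # hypothetical source word -> "jamo"
```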
Cute idea, but about half of these have given me very weird/unrealistic results. For instance this[1] one which, while amusing as hell, isn't exactly convincing.
Fun fact, the toroidal cat is actually a stable gravitational configuration along with a spheroid. There are upper and lower bounds on angular momentum and mass of course, but within those bounds you can have a toroidal cat orbiting its primary!
Complete tangent, but I’ve always wanted to ask someone else who thinks about toroidal gravitational physics: in the center of a toroidal cat (planet) of sufficient mass, does gravity “cancel out”, or are you instead being “pulled apart” in every direction at once, as if on a medieval rack?
At the very center of the hole, everything cancels. But it's an unstable equilibrium in the plane of the ring: drift toward the ring and you get pulled further toward it. Which means that there are also tides pulling you apart.
At the center of the solid ring of the torus, everything also cancels. But this time the tides are squishing you together. Which is why the configuration is stable.
Cancels out. To pull something apart, there would have to be a gravitational gradient, meaning the object would have to be pretty huge relative to the toroid.
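If anyone wants to check the hand-waving above numerically, here's a minimal sketch: approximate the ring as N point masses and sum the Newtonian accelerations at a test point. All the constants are made-up cat-planet numbers.

```python
# Minimal numerical sketch: discretize a massive ring into N point masses
# and sum Newtonian accelerations at a test point. G, M, R, N are arbitrary
# illustrative values.
import numpy as np

G, M, R, N = 6.674e-11, 1e24, 1.0e6, 100_000  # SI units, made-up cat-planet

theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
ring = np.stack([R * np.cos(theta), R * np.sin(theta), np.zeros(N)], axis=1)
m = M / N  # mass per point

def accel(p):
    """Gravitational acceleration at point p due to the discretized ring."""
    d = ring - p                          # vectors from p toward each element
    r = np.linalg.norm(d, axis=1)
    return (G * m * d / r[:, None] ** 3).sum(axis=0)

print(accel(np.array([0.0, 0.0, 0.0])))      # ~0: everything cancels
print(accel(np.array([0.1 * R, 0.0, 0.0])))  # +x: pulled outward, toward the ring
print(accel(np.array([0.0, 0.0, 0.1 * R])))  # -z: pulled back toward the plane
```

The center comes out as a saddle point: nudge the test point in-plane and the net pull is outward toward the ring; nudge it along the axis and the pull is back toward the plane.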
Actually the buttered cat hypothesis has been proven with much less exotic cat shapes. I recall seeing a paper showing that a cat with dimension n=1 and zero rotational inertia would spontaneously begin spinning as a result of the presence of a "butter field." Of course the experimenters ran out of point cats and had to go to the store to get more syrup shortly after completing the experiment, so it has been rather difficult to replicate. However the authors assured readers they would attempt replication at the next breakfast fundraiser for the department.
I wish researchers were using these examples to further understand what this network does, how it fails, and what its fundamental limitations are. However, such digging would undermine the hype, so I'm not particularly hopeful. Most of the issues are just written off as kinks to be ironed out.
Another thing that really bothers me is that no one tries to replicate any of these results without neural networks[1]. To most people here this is the natural result of deep neural networks being the bestest algorithm ever. To me, this indicates that much of the current ML research fails to generate true insight.
---
[1] For example, what would a GAN-like architecture look like with gcForests? No one seems to care about questions like this, even though gcForests have tons of practical advantages over neural nets.
ML isn't perfect, in case you didn't know. If you want to catch up with academic progress, search for StyleGAN. Aren't we allowed to have some fun sometimes?
The author may want to implement a labelling method for users for a few days to train the discriminator a little better. Would be a cool human-in-the-loop exercise.
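Something like the following could work (a rough PyTorch sketch; every name here is hypothetical, and the 0.1 weight on the human-vote loss is a guess): collect "real or fake?" votes from visitors, then mix them into the discriminator's training batches as weak labels.

```python
# Hypothetical sketch: fold user votes on generated images into the
# discriminator's update as a down-weighted extra loss term.
import torch
import torch.nn.functional as F

def discriminator_hitl_step(D, opt, real_imgs, fake_imgs, voted_imgs, votes):
    """votes: tensor of 1.0 where a user flagged the image as fake, else 0.0."""
    opt.zero_grad()
    real_logits = D(real_imgs)   # D outputs one logit per image, shape (B, 1)
    fake_logits = D(fake_imgs)
    loss_real = F.binary_cross_entropy_with_logits(
        real_logits, torch.ones_like(real_logits))
    loss_fake = F.binary_cross_entropy_with_logits(
        fake_logits, torch.zeros_like(fake_logits))
    # Human feedback: flagged-as-fake -> target 0, looks-real -> target 1,
    # down-weighted so noisy votes don't dominate the usual GAN signal.
    voted_logits = D(voted_imgs)
    loss_human = F.binary_cross_entropy_with_logits(
        voted_logits, 1.0 - votes.view_as(voted_logits))
    (loss_real + loss_fake + 0.1 * loss_human).backward()
    opt.step()
```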
I was surprised to see that the easiest way to figure out if a face was real was by looking at the background. The face generator seems to be terrible at everything but faces. There are often strange visual artifacts and clipping issues, and the face generator never seems to put another person in the background of the picture.
I read this as "which France is real" and was slightly disappointed when I wasn't able to test my incomplete knowledge of European geography against a neural net.
ML generates some rather bad artifacts. Just look for those.
Even in this[1] difficult comparison you can see the non-human repeating skin patterns on the right and the awkward teeth contour. Also, hair over skin often looks wet and bends unnaturally.
When comparing wrinkly people, it gets a little harder.
That one is super hard when looking only at the face.
Look at the clothes and necklace. The clothes are different on the left and right sides of her face; the moment you see it, you can't unsee it, and it's obviously wrong.
After yesterday's 10 minutes of watching those fake faces, this test was super simple for me. I did like 25 without mistakes, which kinda shows the fake generation has a long way to go to fool good eyes.
You can pretty reliably guess correctly if you look for ghosting/blurring/chromatic aberration along sharp edges, e.g. around the eyes, on the chin, and in hair. ML hasn't quite mastered the fine details yet.
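You could even automate that hunt. A crude heuristic sketch (OpenCV; the Canny thresholds and the scoring are arbitrary choices of mine): chromatic fringing means the red/green/blue edge maps stop agreeing, so score an image by their disagreement.

```python
# Crude heuristic: color fringing along sharp edges shows up as per-channel
# edge maps that disagree. Thresholds and scoring are arbitrary.
import cv2
import numpy as np

def fringing_score(path: str) -> float:
    img = cv2.imread(path)  # BGR, uint8
    edges = [cv2.Canny(img[:, :, c], 100, 200) for c in range(3)]
    union = np.bitwise_or.reduce(edges)   # edge in any channel
    inter = np.bitwise_and.reduce(edges)  # edge in all three channels
    n = union.sum()
    # 0.0 = channels agree everywhere (clean edges), 1.0 = total disagreement.
    return 1.0 - (inter.sum() / n) if n else 0.0

# Higher scores suggest ghosting/chromatic fringing worth a closer look.
```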
Interesting. I looked at about 20 of these and only saw 3 that didn't look like cats. I guess I got lucky! Now to hit refresh some more times to find some weird and wonderful mutant cats.
I was about to comment that this sort of thing would be a great source of "stock" pictures for PowerPoint presentations; pick a theme (subject) and hand-pick the good ones.
Raises some questions about what can be copyrighted vs. derivative works, given that the generated image was produced by this algorithm and doesn't actually exist in Shutterstock's (excuse me, Shuttersrstsck's) database.
I was referring to the gray bar with text at the bottom. It's supposed to have the URL of the website rehosting the image, like the bottom white bar in the image you linked to.
I still can't shake the feeling that most of these StyleGAN images are cleverly overfitting and just showing the face of an already existing cat in its training data. (But would love to be proven wrong!)
Back when I used to experiment with Markov chat simulators, this was a big problem. Besides the disappointment of finding out a particularly clever generated sentence was actually verbatim from the training set, there are also "accidental sharing" and/or "plagiarism" issues. Of course, with text it's pretty simple to code a check that output doesn't exactly match any known inputs. Not sure how you'd do that with images; maybe some kind of image hashing. (I wonder if you could use the neural network itself to assist in that - i.e. hash the measured values at a lower-dimensioned layer of the network rather than the raw image.)
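For the text case, the check could be as simple as this sketch (the n-gram length of 8 words is an arbitrary threshold):

```python
# Minimal sketch of the "simple check": flag generated text whose long
# n-grams appear verbatim in the training set.
def ngrams(text: str, n: int = 8) -> set:
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(max(0, len(words) - n + 1))}

def build_index(training_sentences) -> set:
    index = set()
    for sentence in training_sentences:
        index |= ngrams(sentence)
    return index

def is_verbatim(generated: str, index: set) -> bool:
    # True if any 8-word span of the output already exists in the corpus.
    return not ngrams(generated).isdisjoint(index)
```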
Yes. For hysterical raisins, VGG-16 is usually used as the hash/space for the nearest-neighbor lookups. A recent example is in the BigGAN appendix, where you can see that despite the dog samples looking perfect, they are nevertheless totally different from their closest neighbors in the ImageNet training data and so can't be memorization.
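Concretely, that lookup might look something like this sketch, using torchvision's pretrained VGG-16 as the embedding space. Building `train_feats` (one row per training image, embedded the same way) is elided:

```python
# Sketch of a VGG-16 feature-space nearest-neighbor check for memorization.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
embed = torch.nn.Sequential(vgg.features, vgg.avgpool, torch.nn.Flatten())

prep = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def features(path: str) -> torch.Tensor:
    img = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    return embed(img).squeeze(0)

def nearest_neighbor(sample_path: str, train_feats: torch.Tensor) -> int:
    # Index of the closest training image; eyeball it against the sample
    # to judge whether the GAN is memorizing.
    q = features(sample_path).unsqueeze(0)
    return torch.cdist(q, train_feats).argmin().item()
```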
The human face training data was probably much more uniform: all well-lit photos of human faces in the center of the photo looking directly at the camera. Whereas the training set for this was probably any photos of cats, in any lighting conditions, with the cat in any pose.
It looks to me like cats as a whole are more complex than human faces. I suspect we would get similar results with full human bodies. Cats have huge variation in color and pose that faces do not. An additional factor is that most portraits have the background in a separate focal plane, while most cats are photographed against a complex background.
It would be interesting to see how realistic a cat thinks these are, maybe by measuring brain activity or reactions. It's possible that a cat may not be fooled by cats we think look real, or perhaps more interestingly, that a cat is fooled by a not particularly good image.
Common cuckoos lay their eggs in other birds' nests. The chicks don't necessarily look much like the host species to the human eye, but they can fool their hosts along the correct dimensions to get food from them. It's an interesting question to what degree ML algorithms trained on human dimensions could be foiled by an animal whose brain has been wired for different perceptions, or how feasible it is to train an ML algorithm on animal perception, or if it's possible to make an algorithm that successfully fools, say, both man and dog.
On the last point, for example: to make fake sounds that fool animals with different hearing ranges, presumably you have to be able to output sounds across the union of the ranges and train on sound data over the union of the ranges.
(Note: I'm not a biologist, if someone more informed wants to correct me on anything here you are welcome to do so.)
In plenty of results, I'm getting an OK cat face with a cat-like blob attached to it. So I'd say it's difficult for the model to discern any features in a mass of color-splotched fur.
Kinda funny how this GAN picked up the strong presence of cats in early-2010s memes; some of the resulting photos have remnants of the distinctive white-on-black text from some of the training data.
A while back I started to think about something like gan.tv (an already-owned domain, I see) for computer-generated entertainment. This cat one and the person one would be example channels. I assume we're going to get pretty creative with how we use automated creativity in the near future.
On one reload, I got Grumpy Cat. The background was different from any image I can find in a quick image search, but the cat was definitely Grumpy Cat. Does that make the domain name a lie?
Well, at least after the human race is wiped out by AI, the fascination with cats will still live on in the new sentient creatures dominating the planet.