I still can't shake the feeling that most of these StyleGAN images are just clever overfitting, reproducing the face of a cat that already exists in the training data. (But would love to be proven wrong!)
Back when I used to experiment with Markov chat simulators, this was a big problem. Besides the disappointment of finding out a particularly clever generated sentence was actually verbatim from the training set, there are also "accidental sharing" and/or "plagiarism" issues. Of course with text it's pretty simple to code a check that output doesn't exactly match any known inputs. Not sure how you'd do that with images; maybe some kind of image hashing. (I wonder if you could use the neural network itself to assist with that, i.e. hash the activations at a lower-dimensional layer of the network rather than the raw image.)
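For the text case I mean something as simple as this (toy corpus and a pretend generated sentence, just to show the idea):

```python
def normalize(s):
    # Collapse case and whitespace so trivial formatting differences don't hide a match.
    return " ".join(s.lower().split())

# Toy training corpus; in practice this would be the full text the Markov model was fit on.
training_sentences = [
    "The cat sat on the mat.",
    "Colorless green ideas sleep furiously.",
]
seen = {normalize(s) for s in training_sentences}

generated = "the cat sat on the mat."  # pretend this came out of the generator
print("memorized" if normalize(generated) in seen else "novel")
```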
Yes. For hysterical raisins, VGG-16 is usually used as the hash/embedding space for the nearest-neighbor lookups. A recent example of this is in the BigGAN appendix, where you can see that despite the dog samples looking perfect, they are nevertheless totally different from their closest neighbors in the ImageNet training data, and so can't just be memorization.
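If anyone wants to try that kind of lookup themselves, here's a rough sketch using torchvision's pretrained VGG-16 conv features and cosine similarity; the file paths are placeholders, and this is just one reasonable way to do it, not the exact procedure from the BigGAN appendix:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Sketch: embed images with the pretrained VGG-16 conv stack, then find the
# training image nearest to a generated sample by cosine similarity.
vgg = models.vgg16(pretrained=True).features.eval()  # conv layers only, no classifier head
preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = vgg(img)      # (1, 512, 7, 7) activation map
    return feat.flatten(1)   # flatten to a single feature vector

# Placeholder paths; swap in the real training set and generated samples.
train_paths = ["train/cat_0001.jpg", "train/cat_0002.jpg"]
train_feats = torch.cat([embed(p) for p in train_paths])

sample = embed("samples/generated_cat.png")
sims = torch.nn.functional.cosine_similarity(sample, train_feats)
best = sims.argmax().item()
print(f"nearest training image: {train_paths[best]} (cosine sim {sims[best].item():.3f})")
```

A generated image whose nearest neighbor has a suspiciously high similarity is the one worth eyeballing for memorization.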