
The two classes that you describe, 1. the adversary has the weights and architecture, and 2. the adversary can only do a forward pass and observe the output, are equivalent when all you're trying to do is compute the gradient with respect to the data. In case 1 I use backprop; in case 2 I can compute the gradient numerically, it just takes a bit longer. Your stochastic search speeds this up.
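
For the numerical route, something like the following finite-difference sketch is what I have in mind; `model_confidence` is a hypothetical stand-in for whatever forward pass the adversary is allowed to query, not anything from the paper:

    import numpy as np

    def numerical_gradient(model_confidence, x, target_class, eps=1e-3):
        """Estimate d confidence(target_class) / d x by central differences."""
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e.flat[i] = eps
            grad.flat[i] = (model_confidence(x + e, target_class)
                            - model_confidence(x - e, target_class)) / (2 * eps)
        return grad  # 2 * x.size forward passes: correct, but slow for large images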

Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models. It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another, likely since they share similar training data (?) unclear. Anyway, really cool work :)




> Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models. It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another,

Agreed. That is surprising, and also increases the security risks, because I can produce images on my in-house network and then take them out into the world to fool other networks without even having access to the outputs of those networks.

> likely since they share similar training data (?) unclear.

The original Szegedy et al. paper shows that these sort of examples generalize even to networks trained on different subsets of the data (and with different architectures).

> Anyway, really cool work :)

Thanks. :-)


> Agreed. That is surprising, and also increases the security risks, because I can produce images on my in-house network and then take them out into the world to fool other networks without even having access to the outputs of those networks.

Good point. You could do this with the gradient version too (fool in-house using gradients -> hopefully fool someone else's network), but the transferability of fooling examples might differ depending on how they are found.


I've quite enjoyed reading your paper since it was uploaded to arXiv in December, and I have been toying with redoing the MNIST part of your experiment on various classifiers. (I'm particularly interested to see if images generated against an SVM can fool a nearest-neighbor classifier or something like that.)

But I'm having problems generating images: a top SVM classifier on MNIST has a very stable confidence distribution on noisy images. If I generate 1000 random images, only 1 or 2 of them will have confidences that differ from the median confidence distribution. That is, nearly all the images are assigned the same confidence for class 1, the same confidence for class 2, and so on.

So it is very difficult to make changes that affect the output of the classifier.

Any tips on how to get started with generating the images?


I would just unleash evolution. 1 or 2 in the first generation is a toehold, and from there evolution can begin to do its work. You can also try a larger population (e.g. 2000) and let it run for a while.
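
Something like this rough sketch of the evolutionary loop is what I mean. It assumes an sklearn-style classifier exposing predict_proba (e.g. an SVC trained with probability=True); the population size, mutation rate, and so on are illustrative guesses, not our exact settings:

    import numpy as np

    def evolve_fooling_image(clf, target_class, dim=784, pop_size=2000,
                             n_generations=500, mutation_rate=0.05, seed=0):
        rng = np.random.default_rng(seed)
        pop = rng.random((pop_size, dim))                        # random images in [0, 1]
        best = pop[0]
        for gen in range(n_generations):
            fitness = clf.predict_proba(pop)[:, target_class]
            best = pop[np.argmax(fitness)]
            if fitness.max() > 0.99:                             # confidently fooled; stop
                break
            elite = pop[np.argsort(fitness)[-pop_size // 10:]]   # keep the top 10%
            children = elite[rng.integers(len(elite), size=pop_size)]
            mask = rng.random(children.shape) < mutation_rate    # mutate ~5% of pixels
            noise = rng.normal(0.0, 0.3, children.shape)
            pop = np.clip(np.where(mask, children + noise, children), 0.0, 1.0)
        return best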


> ...in case 2 I can compute the gradient numerically, it just takes a bit longer.

Yep, true, might just take a while. On the other hand, even a very noisy estimate of the gradient might suffice, which could be faster to obtain. Perhaps someone will do that experiment soon. Maybe you could convince one of those students of yours to do this for extra credit?? ;).
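
By "noisy estimate" I mean something along these lines: probe a handful of random directions rather than one finite difference per pixel (an SPSA-style estimator). `model_confidence` is again a hypothetical black-box scoring function, not part of our code:

    import numpy as np

    def noisy_gradient(model_confidence, x, target_class, n_probes=50, eps=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        grad = np.zeros_like(x)
        for _ in range(n_probes):
            u = rng.choice([-1.0, 1.0], size=x.shape)            # random +/-1 direction
            slope = (model_confidence(x + eps * u, target_class)
                     - model_confidence(x - eps * u, target_class)) / (2 * eps)
            grad += slope * u                                    # for +/-1 entries, u == 1/u
        return grad / n_probes  # 2 * n_probes queries vs. 2 * x.size for exact differences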

> Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models.

Ditto x2.

> It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another, likely since they share similar training data (?) unclear.

Yeah. I wonder if the subspaces found using non-gradient-based exploration end up being either larger or overlapping more between networks than those found (more easily) with the gradient. That would be another interesting follow-up experiment.



