
The two classes that you describe, 1. the adversary has the weights and architecture, and 2. the adversary can only do a forward pass and observe the output, are equivalent when all you're trying to do is compute the gradient with respect to the data. In case 1 I use backprop; in case 2 I can compute the gradient numerically, it just takes a bit longer. Your stochastic search speeds this up.
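
For the numerical route, something like the following finite-difference sketch is what I have in mind; `model_confidence` is a hypothetical stand-in for whatever forward pass the adversary is allowed to query, not anything from the paper:

    import numpy as np

    def numerical_gradient(model_confidence, x, target_class, eps=1e-3):
        """Estimate d confidence(target_class) / d x by central differences."""
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e.flat[i] = eps
            grad.flat[i] = (model_confidence(x + e, target_class)
                            - model_confidence(x - e, target_class)) / (2 * eps)
        return grad  # 2 * x.size forward passes: correct, but slow for large images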

Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models. It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another, likely since they share similar training data (?) unclear. Anyway, really cool work :)




> Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models. It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another,

Agreed. That is surprising, and also increases the security risks, because I can produce images on my in-house network and then take them out into the world to fool other networks without even having access to the outputs of those networks.

> likely since they share similar training data (?) unclear.

The original Szegedy et al. paper shows that these sort of examples generalize even to networks trained on different subsets of the data (and with different architectures).

> Anyway, really cool work :)

Thanks. :-)


> Agreed. That is surprising, and also increases the security risks, because I can produce images on my in-house network and then take them out into the world to fool other networks without even having access to the outputs of those networks.

Good point. You could do this with the gradient version too (fool in-house using gradients -> hopefully fool someone else's network), but the transferability of fooling examples might differ depending on how they are found.


I've quite enjoyed reading your paper since it was uploaded to arXiv in December, and I have been toying with redoing the MNIST part of your experiment on various classifiers. (I'm particularly interested to see if images generated against an SVM can fool a nearest-neighbor classifier or something like that.)

But I'm having problems generating images: a top SVM classifier on MNIST has a very stable confidence distribution on noisy images. If I generate 1000 random images, only 1 or 2 of them will have confidences that differ from the median confidence distribution. That is, nearly all the images are assigned the same confidence for class 1, the same confidence for class 2, and so on.

So it is very difficult to make changes that affect the output of the classifier.

Any tips on how to get started with generating the images?


I would just unleash evolution. 1 or 2 in the first generation is a toehold, and from there evolution can begin to do its work. You can also try a larger population (e.g. 2000) and let it run for a while.
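
Something like this rough sketch of the evolutionary loop is what I mean. It assumes an sklearn-style classifier exposing predict_proba (e.g. an SVC trained with probability=True); the population size, mutation rate, and so on are illustrative guesses, not our exact settings:

    import numpy as np

    def evolve_fooling_image(clf, target_class, dim=784, pop_size=2000,
                             n_generations=500, mutation_rate=0.05, seed=0):
        rng = np.random.default_rng(seed)
        pop = rng.random((pop_size, dim))                        # random images in [0, 1]
        best = pop[0]
        for gen in range(n_generations):
            fitness = clf.predict_proba(pop)[:, target_class]
            best = pop[np.argmax(fitness)]
            if fitness.max() > 0.99:                             # confidently fooled; stop
                break
            elite = pop[np.argsort(fitness)[-pop_size // 10:]]   # keep the top 10%
            children = elite[rng.integers(len(elite), size=pop_size)]
            mask = rng.random(children.shape) < mutation_rate    # mutate ~5% of pixels
            noise = rng.normal(0.0, 0.3, children.shape)
            pop = np.clip(np.where(mask, children + noise, children), 0.0, 1.0)
        return best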


> ...in case 2 I can compute the gradient numerically, it just takes a bit longer.

Yep, true, might just take a while. On the other hand, even a very noisy estimate of the gradient might suffice, which could be faster to obtain. Perhaps someone will do that experiment soon. Maybe you could convince one of those students of yours to do this for extra credit?? ;).
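
By "noisy estimate" I mean something along these lines: probe a handful of random directions rather than one finite difference per pixel (an SPSA-style estimator). `model_confidence` is again a hypothetical black-box scoring function, not part of our code:

    import numpy as np

    def noisy_gradient(model_confidence, x, target_class, n_probes=50, eps=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        grad = np.zeros_like(x)
        for _ in range(n_probes):
            u = rng.choice([-1.0, 1.0], size=x.shape)            # random +/-1 direction
            slope = (model_confidence(x + eps * u, target_class)
                     - model_confidence(x - eps * u, target_class)) / (2 * eps)
            grad += slope * u                                    # for +/-1 entries, u == 1/u
        return grad / n_probes  # 2 * n_probes queries vs. 2 * x.size for exact differences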

> Likewise, I was not very surprised that you can produce fooling images, but it is surprising and concerning that they generalize across models.

Ditto x2.

> It seems that there are entire, huge fooling subspaces of the input space, not just fooling images as points. And that these subspaces overlap a lot from one net to another, likely since they share similar training data (?) unclear.

Yeah. I wonder if the subspaces found using non-gradient-based exploration end up being either larger or overlapping more between networks than those found (more easily) with the gradient. That would be another interesting follow-up experiment.



