This "fast and instinctual" behavior is very common in deep learning models.
For example, a friend and I showed ConvNets seemingly NSFW images:
https://medium.com/@marekkcichy/does-ai-have-a-dirty-mind-to... (note: ALL photos are nudity-free; still, I advise against viewing it in your office, as people catching a glimpse will think you are looking at adult content; so it is technically SFW, but in practice might be considered not safe for work).
Almost always, classifiers are tricked. We are as well... but only at first glance. Afterward, it is evident that these are innocent images.
With their multipass approach, though, I would expect transformers to be much better at more subtle patterns. And they are, yet still far from perfect.
> Almost always, classifiers are tricked. We are as well... but only at first glance. Afterward, it is evident that these are innocent images.
I recommend reading to the end and pondering the reveal of the mystery of The Lamp.
This is the closest I've ever seen to an image whose NSFW status flips back and forth purely depending on your "System 2" knowledge.
It also highlights that we're really tackling automated NSFW detection by going after a proxy, not the real thing: the algorithms try to recognize what is depicted in a given image, whereas the real question is whether that image triggers emotions we don't want our audience to experience (arousal for porn, but others, like disgust, for different types of NSFW).
But then I realize that perhaps it's for the better: if someone builds an image classifier that detects induced emotions, the ad industry will use it to finally destroy everything that's good in life.