
I think it's fair that both LMMs and people get a certain (even unbounded) amount of "pretraining" before the actual tasks.

But after that training, people are much better equipped for single-shot recognition and cognitive tasks involving imagery and situations they have not encountered before, e.g. identifying (from pictures) which animal is shown, even if it is only the second time they have seen that animal (the first being when they were shown it and told it is a zebra).
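(To make the setup concrete, here is a minimal sketch of that one-shot task framed as embedding-plus-nearest-prototype classification. The `embed` stub and the file names are made up; a real system would use a pretrained vision encoder there. This is just to illustrate the evaluation, not how an LMM does it internally.)

    import numpy as np

    def embed(image_path):
        # Stand-in for a pretrained vision encoder (e.g. a CLIP image
        # tower). A deterministic random vector keeps the sketch runnable.
        rng = np.random.default_rng(abs(hash(image_path)) % 2**32)
        v = rng.normal(size=512)
        return v / np.linalg.norm(v)

    # One labeled example per class -- the "single shot".
    support = {
        "zebra": embed("zebra_example.jpg"),  # the one example shown
        "okapi": embed("okapi_example.jpg"),
    }

    def classify(query_path):
        # Nearest prototype by cosine similarity (vectors are unit-norm).
        q = embed(query_path)
        return max(support, key=lambda label: support[label] @ q)

    print(classify("mystery_animal.jpg"))

An LMM doing this in-context would instead get the support example in the prompt, but the test is the same: one labeled example, then classify a new image.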

So, basically, after initial training, I believe people are superior at single-shot tasks, and things are going to get much more interesting once LMMs (or whatever comes after them) are able to do that well.

It might be that GPT-4o can actually do that task well! Someone should demo it; I don't have access. Except, of course, GPT-4o already knows what zebras look like, so the demo would need something other than exactly that.


