Yes, I don't really see impressive language (i.e. GPT-3-level) results here. It seems to morph the images of the nouns in the prompt in an aesthetically pleasing and almost artifact-free way (very cool!).
But it does not seem to 'understand' anything, as some other commenters have said. Try '4 glasses on a table' and you will rarely see 4 glasses, even though that is a very well-defined input. I would be more impressed by the language model if it had a working prompt like: "A teapot that does not look like the image prompt."
I think some of these examples trigger some kind of bias, where we think: "Oh wow, that armchair does look like an avocado!" - But morphing an armchair and an avocado will almost always look like both, because they have similar shapes. And it does not 'understand' what you called 'object concepts', otherwise it would not produce armchairs you clearly cannot sit in because of the avocado stone (or the stem in the flower-related 'armchairs').
What I meant is that 'not' is in principle an easy keyword to implement 'conservatively'. But yes, having this in a language model has proven to be very hard.
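To illustrate what I mean by 'conservatively': you could sample candidates for the positive part of the prompt and then reject any candidate that an image-text scorer rates as too similar to the negated concept. This is just a rough sketch of that rejection-filtering idea, not how DALL-E actually works; the generator and scorer are placeholders passed in as callables, and the threshold is an arbitrary assumption.

```python
from typing import Any, Callable, List

def filter_negated(
    prompt: str,
    negated_concept: str,
    generate: Callable[[str, int], List[Any]],   # placeholder: any text-to-image sampler
    similarity: Callable[[str, Any], float],     # placeholder: e.g. a CLIP-style text-image score
    n_candidates: int = 64,
    threshold: float = 0.25,                     # assumed cutoff; reject candidates scoring above this
) -> List[Any]:
    """Generate candidates for `prompt`, keeping only those that do NOT resemble `negated_concept`."""
    candidates = generate(prompt, n_candidates)
    return [img for img in candidates if similarity(negated_concept, img) < threshold]
```

That kind of post-hoc filter is what I'd call 'conservative'; getting the model itself to respect negation in the prompt is the hard part.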
Edit: Can I ask, what do you find impressive about the language model?
Perhaps the rest of the world is less blasé - rightly or wrongly. I do get reminded of this: https://www.youtube.com/watch?v=oTcAWN5R5-I when I read some comments. I mean... we are telling the computer "draw me a picture of XXX" and it's actually doing it. To me that's utterly incredible.
I'm in the OpenAI beta for GPT-3, and I don't see how to play with DALL-E. Did you actually try "4 glasses on a table"? If so, how? Is there a separate beta? Do you work for OpenAI?