Do you have any recently updated examples, blog posts, whatever showing that DALLE is worse than modern Stable Diffusion? I was still under the impression that DALLE was better (with better meaning the images are more likely to be what you asked for, more lifelike, more realistic, not necessarily artistically pleasing), with the downside of it being locked away and somewhat expensive. And my understanding is that Stable Diffusion 2.0+ is actually a step backwards in terms of quality, especially for anything involving images of humans. But as this thread acknowledges, this area is moving very quickly and my knowledge might be out of date, so I’m definitely happy to see some updated comparisons if you have any to suggest. It feels like ever since ChatGPT came out, there haven’t been many posts about Stable Diffusion and image generation; they got crowded out of the spotlight.
If you want an example, go check out the DALLE2 subreddit vs the SD subreddit.
The former is a wasteland; the latter is more popular than r/art (despite having 1% of the subscribers, it has more active users at any given moment).
If you want something ready to use as a newbie, Midjourney v4 crushes DALLE2 on prompt comprehension, and its images look far more beautiful.
If you are already into art, then Stable Diffusion has a massive ecosystem of alternate stylized models (many of which look incredible) and LoRA plugins for any concept the base model doesn't understand.
DALLE2 is just a prototype that was abandoned by OpenAI; their main business is GPTs, and DALLE was just a side hustle.
Dall-E is more likely to generate an image that to some degree contains what you asked for. It also tends to produce less attractive images, and it's closed, so you can't really tune it much. People mostly don't try to do completely whole-cloth text-to-image generation with Stable Diffusion; for anything involved they mostly do image-to-image with a sketch or photobashed source. With ControlNet and a decently photobashed base image you can get pretty much anything you want, in pretty much any style you want, and it's fast.
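For concreteness, here's a minimal sketch of that ControlNet-plus-photobash workflow using the Hugging Face diffusers library. The canny preprocessing and the model IDs are just one common setup, not the only option:

```python
# Sketch of the ControlNet workflow: constrain composition with edges
# extracted from a photobashed base image, steer style with the prompt.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the photobashed/sketched base image and extract canny edges.
# "photobash.png" is a placeholder for your own source image.
base = np.array(Image.open("photobash.png").convert("RGB"))
edges = cv2.Canny(base, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# A canny-conditioned ControlNet paired with an SD v1.5 base model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The control image pins down the layout; the prompt handles the rest.
image = pipe(
    "a knight in a misty forest, oil painting",
    image=control_image,
    num_inference_steps=30,
).images[0]
image.save("out.png")
```

The point is that the base image does the heavy lifting on composition, which is why this approach beats pure text-to-image for anything involved.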
> I was still under the impression that DALLE was better (with better meaning the images are more likely to be what you asked for, more lifelike, more realistic, not necessarily artistically pleasing),
“Artistically pleasing” is often what people ask for.
> with the downside of it being locked away and somewhat expensive.
Those are enormous downsides. Even if DALL-E were better in some broadly relevant ways in the base model, SD’s free (gratis, at least) availability means the SD ecosystem has finetuned models (whether checkpoints or ancillary things like TIs, hypernetworks, LORAs, etc.) adapted to... lots of different purposes, and you can mix and match these to create your own models for your own specific purposes.
A web interface backed by strictly the base SD model (of any version) might lose to the equivalent interface over DALL-E for uses where setups with the full toolset of the SD ecosystem would not.
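As a rough illustration of that mix-and-match point, here is what layering ecosystem pieces looks like in diffusers. The checkpoint, LoRA, and embedding names below are placeholders for whatever you've downloaded, not real repos:

```python
# Sketch of combining a community finetuned checkpoint with a LoRA and a
# textual inversion embedding, assuming SD v1.5-compatible components.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "some-community/finetuned-sd-checkpoint",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Layer a LoRA on top of the checkpoint for a concept or style the
# base model doesn't know. Directory and file name are placeholders.
pipe.load_lora_weights("path/to/loras", weight_name="style-lora.safetensors")

# Add a textual inversion embedding, triggered by its token in prompts.
pipe.load_textual_inversion("./concept-embedding.pt", token="<my-concept>")

image = pipe("a photo of <my-concept>, detailed, dramatic lighting").images[0]
```

Each piece is independently swappable, which is what makes tuning a setup to a specific purpose practical.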
I don’t disagree about the downside of DALL-E being locked away and expensive. It’s been exciting to see the Cambrian explosion of improvements to Stable Diffusion since its initial release. This is how AI research should be done, and it’s sad that “Open AI” is not actually open.
That being said, for business use cases, where I want to give it a simple prompt and have a high chance of getting a good, usable result, it’s not clear to me that Stable Diffusion is there yet. Many of the most exciting SD community results seem to be in anime and porn, which can be a bit hard to follow. I guess the use cases that I’m excited about are things like logo generators, blog post image generators, product image thumbnail generators for e-commerce, industrial design, etc.
But please prove me wrong! I’m excited for SD to be the state of the art; it’s definitely better in the long term that it’s so accessible. I’m sure a good guide or blog post about what’s new in Stable Diffusion outside of anime generation would be an interesting read.
DALLE2 is underpowered and has never improved since they released it. The actual quality of the images is very low (literally, in the sense that they have lots of artifacts) because they saved compute by not running enough diffusion steps.
People usually still use SD v1.5 because of the experience the community has with finetuning and merging it. Also, a lot of LoRAs are trained for v1.4/1.5 models and won't work with v2.1. Of course, you also have incredible capability to control generation with SD, which helps; for an example of the results, see: https://youtu.be/AlSCx-4d51U
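The "merging" mentioned here is conceptually simple. Below is a toy sketch of a linear checkpoint merge; real SD merge tools offer fancier interpolation schemes, and the file names are placeholders:

```python
# Toy sketch of merging two SD checkpoints by linear interpolation.
# Assumes classic .ckpt files that store weights under "state_dict";
# only makes sense for identical architectures (e.g., two v1.5
# finetunes, not a v1.5 model mixed with a v2.1 model).
import torch

alpha = 0.5  # blend ratio between model A and model B

a = torch.load("model_a.ckpt", map_location="cpu")["state_dict"]
b = torch.load("model_b.ckpt", map_location="cpu")["state_dict"]

# Interpolate every shared floating-point tensor; copy anything else
# (e.g., integer buffers) straight from model A.
merged = {
    k: (alpha * a[k] + (1 - alpha) * b[k]) if a[k].is_floating_point() else a[k]
    for k in a
    if k in b
}

torch.save({"state_dict": merged}, "merged.ckpt")
```

This weight-space blending is a big part of why so many stylized community checkpoints exist for the v1.5 lineage, and why incompatibility with v2.1 matters.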
Dalle 2 was great initially, but SD BLEW past it. I mean way, way, way past it. Dalle2 is like a Model T Ford and SD is a fighter jet. It's that different. Dalle-2 is dead already.