Extensive Comparison of Text-to-Image AI Models
We compared DALL-E, Stable Diffusion, and Midjourney on 30 prompts so that you don’t have to :)
If you like this article and are interested in AI, read our “AI Multi-Agent” Series and register at Integrail.ai to build your own!
As we are building our platform for the easy creation of AI Multi-Agents, we decided to compare leading text-to-image generative AI models on 30 different prompts. The results are not very surprising, but there are some nuances you need to be aware of. The models we compare are:
- Midjourney
- Stable Diffusion
- DALL-E 3 from OpenAI in its two different modes: “vivid” and “natural”
The latter two are available on Integrail for you to use. Midjourney, unfortunately, does not offer API access, and unlike some other companies we take terms of service seriously, so we had to generate all of its images by hand. That’s a pity, because Midjourney is still the best by quite a long shot, even though Stable Diffusion gets close in some scenarios and, in a few, even DALL-E delivers interesting results.
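If you want to reproduce the two DALL-E 3 modes yourself, here is a minimal sketch of the request parameters involved. It assumes the OpenAI Images API, whose `style` parameter accepts `vivid` and `natural` for `dall-e-3`; the helper name `dalle3_requests` and the image size are illustrative choices, not something taken from our setup. The code only builds dictionaries and makes no network calls.

```python
from typing import Dict, List

# The two DALL-E 3 modes compared in this article.
DALLE3_STYLES = ("vivid", "natural")


def dalle3_requests(prompt: str, size: str = "1024x1024") -> List[Dict[str, object]]:
    """Build one set of image-generation parameters per DALL-E 3 style.

    Hypothetical helper for illustration; it does not call any API.
    """
    return [
        {"model": "dall-e-3", "prompt": prompt, "style": style, "size": size, "n": 1}
        for style in DALLE3_STYLES
    ]


if __name__ == "__main__":
    prompt = "Your representation of the generative text-to-image model"
    for params in dalle3_requests(prompt):
        # Assumption: with the official `openai` package installed and an API
        # key configured, each dict could be passed on as
        # OpenAI().images.generate(**params).
        print(params["style"], "->", params["prompt"])
```

Keeping the prompt identical and changing only `style` is what makes the two DALL-E outputs comparable side by side.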
Bonus: Grumpy cat in “Scream” by Munch as the last picture for your viewing pleasure :)
Each of the images for our 30 prompts is structured the same way, left-to-right, top-to-bottom: Stable Diffusion, DALL-E “vivid” mode, Midjourney, DALL-E “natural” mode, as shown below:
The caption for the image is always a literal prompt, e.g. above it is “Your representation of the generative text-to-image model”.
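To make the layout concrete, here is a small sketch that maps each model to its position in the 2x2 grid, following the reading order described above. The second DALL-E mode is labeled `natural` here, matching the model list at the top. The function name `grid_offsets` and the 512-pixel tile size are assumptions for illustration only; the actual images may use different dimensions.

```python
from typing import Dict, List, Tuple

# Reading order used in every comparison image: left-to-right, top-to-bottom.
GRID_ORDER: List[str] = [
    "Stable Diffusion",
    "DALL-E (vivid)",
    "Midjourney",
    "DALL-E (natural)",
]


def grid_offsets(tile: int, columns: int = 2) -> Dict[str, Tuple[int, int]]:
    """Return the (x, y) pixel offset of each model's tile in the grid."""
    return {
        name: ((i % columns) * tile, (i // columns) * tile)
        for i, name in enumerate(GRID_ORDER)
    }


if __name__ == "__main__":
    # Hypothetical tile size; adjust to the real image dimensions.
    for name, (x, y) in grid_offsets(tile=512).items():
        print(f"{name}: x={x}, y={y}")
```

With this order, both DALL-E outputs land in the right column, while Stable Diffusion and Midjourney share the left one.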
As mentioned, Midjourney is the undisputed leader, although there were a couple of cases where it struggled. Stable Diffusion comes next, and quite close, but Midjourney renders significantly better small detail, and it shows. DALL-E really struggles with people and landscapes; for other scenarios your mileage will vary. But see and judge for yourself!
Let’s start with cases where all the models perform similarly (somewhat surprisingly), as this will help you understand the starting point for your own experiments with DALL-E or Stable Diffusion, in case you don’t want to, or cannot (no API!), use Midjourney. Then we’ll show you a few “epic fail” examples (starting with number 10) and finish with an assortment of prompts you may want to play with.
1. Felt Toys Stylization
As you can see, in this case DALL-E (right column) in fact outperforms the rest — it follows instructions exactly and is able to put all three toys in the picture, unlike SD or Midjourney, even though the lighting is still arguably better for the latter.
2. Old Book Stylization
Again, pretty close, but MJ shows better lighting and detail. This is one of the prompts from the DALL-E 3 paper.
3. Whiskey and Smoke
This one was a pleasant surprise from DALL-E as well, and it can serve as a starting point for your experiments to get better results from it: apparently, it deals better with detailed, atmospheric descriptions.
4. Woman Playing Cello
Another good performance from DALL-E and all but a failure from SD. In fact, MJ struggled with this specific prompt up until about version 6 (the current version as we write this article). This is also one of the rare cases where DALL-E is actually good with people.
5. Porcelain
Very nice performance from all 3 models, nothing to add.
6. Bathroom
Another case where a very detailed description leads to very good results from DALL-E, not just the others.
7. Sushi
Kind of similar performance, but again MJ wins due to lighting and the level of detail.
8. Alien Breakfast
If you squint hard enough you can say that DALL-E is sort of ok, but MJ is so much better the longer you look at it.
9. Mischievous Ferret
Another prompt from the DALL-E 3 paper, and one where DALL-E at least outperforms SD.
Now let’s move to some examples where DALL-E really struggles. Maybe you’ll have more luck with similar prompts if you alter them or provide more details? Do let us know if that’s the case!
10. A Redhead Portrait
MJ is great, SD is close; DALL-E, what the actual f…?! What is this horror?! No matter how we tried, we could not get nice photographic portraits from DALL-E. Can you?
11. 60s Kodachrome Sci-Fi
Amazing MJ, very good SD; DALL-E will come to you in nightmares.
12. Japanese Girl
Well, by now you get the point: DALL-E can’t do portraits. Terrible. MJ version 2 was better. Or we are doing something really, really wrong with our prompts.
13. Landscapes
We tried all kinds of landscapes, and the verdict is that if you need to generate them, your choice is MJ. SD tries, but getting photorealistic quality is a challenge. DALL-E, unfortunately, produces something more or less unusable, as with portraits.
14. Japanese Ink Stylization
Again, MJ and SD are at least interesting; DALL-E is a 4-year-old kid’s drawing.
15. Hands
Everyone’s favorite punchline: hands and AI. MJ finally mastered this in version 6. SD still struggles with counting. DALL-E... well, you can’t play piano like this, and the “natural” mode version needs to see a doctor about arthritis.
16. Helmut Newton
I mean, ok, beauty is in the eye of the beholder, but MJ and SD are still much better at capturing the style.
17. Monet and Penguins
Take your pick.
18. Motorboat
19. Jewelry
20. Underwater Swimming
21. Vintage Commercial
22. Chef
23. RPG Map
Generating art for games is a popular GenAI application. SD shows promise, especially considering that you can fine-tune it (we’ll discuss how in one of the next articles).
24. RPG User Interface
Can’t beat MJ out of the box.
25. Fantasy Landscape
26. Children’s Book Illustration
27. Fantasy Skunk
Another prompt from the DALL-E 3 paper, where MJ is just next level in detail.
28. Another Illustration
Another DALL-E 3 paper prompt.
29. As Promised, Grumpy Cat
That’s it for today! Judge for yourself, but we think that while DALL-E 3 performs well in some scenarios, the safer API choice is Stable Diffusion, and Midjourney remains the undisputed leader.
If you like this work, do subscribe to our updates or join Integrail to start building your own AI Agents and experimenting with dozens of other GenAI models!