Extensive Comparison of Text-to-Image AI Models

We compared DALL-E, Stable Diffusion, and Midjourney on 30 prompts so that you don’t have to :)

Anton Antich
Superstring Theory
7 min read · May 25, 2024


If you like this article and are interested in AI, read our “AI Multi-Agent” series and register at Integrail.ai to build your own!

As we are building our platform for the easy creation of AI Multi-Agents, we decided to compare leading text-to-image generative AI models on 30 different prompts. The results are not very surprising, but there are some nuances you need to be aware of. The models we compare are:

  • Midjourney
  • Stable Diffusion
  • DALL-E 3 from OpenAI in its two different modes: “vivid” and “natural”

Stable Diffusion and DALL-E 3 are available on Integrail for you to use. Midjourney, unfortunately, does not offer API access, and unlike some other companies we take terms of service seriously, so we had to generate all the Midjourney images by hand. That’s a pity, because Midjourney is still the best by a long shot, even though Stable Diffusion comes close in some scenarios and even DALL-E delivers interesting results in a few.

Bonus: a grumpy cat in Munch’s “The Scream” as the last picture, for your viewing pleasure :)

Every image in this article (one per prompt) is laid out the same way, left to right, top to bottom: Stable Diffusion, DALL-E “vivid” mode, Midjourney, DALL-E “natural” mode, as shown below:

Your representation of the generative text-to-image model

The caption under each image is always the literal prompt; above, for example, it is “Your representation of the generative text-to-image model”.

As mentioned, Midjourney is the undisputed leader, although it struggled in a couple of cases. Stable Diffusion comes next and is quite close, but Midjourney renders noticeably better fine detail, and it shows. DALL-E really struggles with people and landscapes; in other scenarios your mileage may vary, but see and judge for yourself!

Let’s start with cases where ALL models perform similarly (somewhat surprisingly), since these give you a good starting point for your own experiments with DALL-E or Stable Diffusion if you don’t want to, or cannot (API!), use Midjourney. Then we’ll show you a couple of “epic fail” examples (starting with number 10) and finish with an assortment of prompts you may want to play with.
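If you want to try the DALL-E side of this comparison yourself, here is a minimal sketch that sends the same prompt once per DALL-E 3 style (“vivid” and “natural”). It assumes the official OpenAI Python SDK (v1, installed with `pip install openai`) and an `OPENAI_API_KEY` environment variable; the helper names `build_dalle3_requests` and `generate_both_styles` are hypothetical illustrations, and this is not necessarily how the images in this article were produced.

```python
# Minimal sketch: one DALL-E 3 request per style for the same prompt.
# Assumes the OpenAI Python SDK v1 (`pip install openai`) and OPENAI_API_KEY set;
# the helper names below are hypothetical, not Integrail's or OpenAI's own.

# The two values DALL-E 3 accepts for the `style` parameter of the Images API.
DALLE3_STYLES = ("vivid", "natural")


def build_dalle3_requests(prompt: str, size: str = "1024x1024") -> list[dict]:
    """Return one keyword-argument dict per style, ready for client.images.generate."""
    return [
        # DALL-E 3 produces a single image per request, so n is fixed at 1.
        {"model": "dall-e-3", "prompt": prompt, "style": style, "size": size, "n": 1}
        for style in DALLE3_STYLES
    ]


def generate_both_styles(prompt: str) -> dict[str, str]:
    """Call the API once per style and map each style to the returned image URL."""
    from openai import OpenAI  # lazy import: the builder above works without the SDK

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    urls = {}
    for request in build_dalle3_requests(prompt):
        response = client.images.generate(**request)
        urls[request["style"]] = response.data[0].url
    return urls


if __name__ == "__main__":
    # Print the payloads only; call generate_both_styles() to actually hit the API.
    for request in build_dalle3_requests("woman in the red dress playing cello"):
        print(request)
```

Running the file as a script only prints the two request payloads, so it works even without the SDK installed; call `generate_both_styles` explicitly once the SDK and key are in place. Keeping the prompt identical and varying only `style` is what makes the “vivid” vs “natural” comparison fair.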

1. Felt Toys Stylization

The lovely world of wool felt, a mouse, a crocodile and a giraffe are having tea with strawberry pie, realistic hyper-detail, soft-focus, chibi, Tilt-shift ,super lighting, volumetrics, Jon Klassen, in focus,80mm lens, Large aperture, light colors

As you can see, in this case DALL-E (right column) actually outperforms the rest: it follows the instructions exactly and puts all three toys in the picture, unlike SD or Midjourney, although the lighting is arguably still better in the latter.

2. Old Book Stylization

Ancient pages filled with sketches and writings of fantasy beasts, monsters, and plants sprawl across an old, weathered journal. The faded dark green ink tells tales of magical adventures, while the high-resolution drawings detail each creature’s intricate characteristics. Sunlight peeks through a nearby window, illuminating the pages and revealing their timeworn charm.

Again, pretty close, but MJ shows better lighting and detail. This is one of the prompts from the DALL-E 3 paper.

3. Whiskey and Smoke

old wooden table, a shot of whiskey stands on the table, dirty window in the background, dim light, cigarette smoke

This one was a pleasant surprise from DALL-E as well, and it can serve as a starting point for getting better results out of it: apparently it handles detailed, atmospheric descriptions better.

4. Woman Playing Cello

woman in the red dress playing cello

Another good performance from DALL-E and all but a failure from SD. In fact, MJ struggled with this specific prompt up until about version 6 (current at the time of writing). This is also one of the rare cases where DALL-E is actually good with people.

5. Porcelain

exquisitely beautiful porcelain coffee cup with golden and blue dragons engraving

Very nice performance from all three models; nothing to add.

6. Bathroom

this luxurious bathroom features a modern freestanding bathtub in a crisp white finish. the tub sits against a wooden accent wall with glass-like panels, creating a serene and relaxing ambiance. three pendant light fixtures hang above the tub, adding a touch of sophistication. a large window with a wooden panel provides natural light, while a potted plant adds a touch of greenery. the freestanding bathtub stands out as a statement piece in this contemporary bathroom.

Another case where a very detailed description leads to very good results from DALL-E, not just from the others.

7. Sushi

A close-up advertising shot of a beautiful sushi-set on the wooden table

Similar performance overall, but again MJ wins on lighting and level of detail.

8. Alien Breakfast

a well dressed alien eating full English breakfast at a typical American diner, reporting photography, newspaper shot

If you squint hard enough, you can say that DALL-E is sort of OK, but the longer you look, the further ahead MJ pulls.

9. Mischievous Ferret

A mischievous ferret with a playful grin squeezes itself into a large glass jar, surrounded by colorful candy. The jar sits on a wooden table in a cozy kitchen, and warm sunlight filters through a nearby window

Another prompt from the DALL-E 3 paper, and one where DALL-E at least outperforms SD.

Now let’s move to some examples where DALL-E really struggles. Maybe you’ll have more luck with similar prompts if you alter them or provide more detail? Do let us know if that’s the case!

10. A Redhead Portrait

A close-up photographic portrait of a beautiful redhead woman with electric blue eyes.

MJ is great, SD is close; DALL-E, what the actual f…?! What is this horror?! No matter how hard we tried, we could not get nice photographic portraits from DALL-E. Can you?

11. 60s Kodachrome Sci-Fi

photo of futuristic steampunk blond pin-up with a robot ,clear facial features, minimalist interiors,1967, kodachrome 25

Amazing MJ, very good SD; DALL-E will come to you in nightmares.

12. Japanese Girl

an Japanese girl, cinematic lighting

Well, by now you get the point: DALL-E can’t do portraits. Terrible. Even MJ version 2 was better. Or we are doing something really, really wrong with our prompts.

13. Landscapes

magnificent nature landscape photography, wide Siberian Yenisey river flowing between high mountains, northern beauty, evergreen trees on the shores, sunset reflections in the river, ultra realistic hd render

We tried all kinds of landscapes, and the verdict is clear: if you need to generate them, your choice is MJ. SD tries, but getting photorealistic quality out of it is a challenge. DALL-E, unfortunately, produces something fairly unusable, just as with portraits.

14. Japanese Ink Stylization

Japanese ink drawing of a beautiful female elf with a bow

Again, MJ and SD are at least interesting; DALL-E looks like a 4-year-old’s drawing.

15. Hands

close up photo of a woman’s hands playing piano keyboard

Everyone’s favorite laughing point: hands and AI. MJ finally mastered them in version 6. SD still struggles with counting fingers. DALL-E? Well, you can’t play piano like this, and the “natural” mode version needs to see a doctor about its arthritis.

16. Helmut Newton

A Helmut Newton street photograph. A beautiful young woman standing on the sidewalk in New York street, looking right into the camera.

I mean, OK, beauty is in the eye of the beholder, but MJ and SD are still much better at capturing the style.

17. Monet and Penguins

A painting in the style of Monet. Three penguins skating by the Notre dam de Paris in Paris.

Take your pick.

18. Motorboat

A motorboat (speedboat) driving in the blue ocean, overlooking Angle

19. Jewelry

exquisite platinum ring with large sapphire stone and lots of small yellow Diamonds, exquisite jewelry design, very detailed, close up photo

20. Underwater Swimming

woman is swimming underwater, underwater photography, rays of light, realistic, photo, ultra detail

21. Vintage Commercial

vintage poster with old man (white hair and mustache) on it promoting his can of beans, photo realistic, old vinatge letters, text on poster reading gottfriends sauce

22. Chef

realistic shot of a Culinary chef cooking in a kitchen,soft lighting, dark background, still life photography, minimalist style, closeup shots, commercial advertising, high-definition images, rich details, exquisite textures, high saturation, symmetrical composition in the style of minimalism.

23. RPG Map

medieval fantasy rpg map, very detailed, mountains, rivers, sea in the west, three big towns, lots of villages, beautiful detail

Generating art for games is a popular GenAI application. SD shows promise here, especially considering that you can fine-tune it (we’ll discuss how in an upcoming article).

24. RPG User Interface

flat, minimal, simple mobile game UI, character and six weapon slots around the character on top half part of screen, weapon inventory on bottom half of screen, tab bar menu on bottom of screen

Out of the box, nothing beats MJ here.

25. Fantasy Landscape

Imagine a fantastical night sky adorned with countless twinkling stars and a magnificent, luminous full moon. The moon’s ethereal glow bathes the landscape in a soft, magical light, casting gentle shadows and highlighting the serene beauty of the surroundings. The stars sparkle like diamonds, creating a breathtaking celestial tapestry that evokes a sense of wonder and awe. This enchanting scene captures the serene and mystical essence of a night under the cosmos, where the moon and stars come together to create a moment of pure, otherworldly beauty

26. Children’s Book Illustration

by Children’s picture book illustration, an ancient Chinese man, holding a white duck’s neck in his right hand, is happily walking on the streets of ancient China, cartoon style

27. Fantasy Skunk

In a fantastical setting, a highly detailed furry humanoid skunk with piercing eyes confidently poses in a medium shot, wearing an animal hide jacket. The artist has masterfully rendered the character in digital art, capturing the intricate details of fur and clothing texture

Another prompt from the DALL-E 3 paper, where MJ is simply next-level in detail.

28. Another Illustration

A illustration from a graphic novel. A bustling city street under the shine of a full moon. The sidewalks bustling with pedestrians enjoying the nightlife. At the corner stall, a young woman with fiery red hair, dressed in a signature velvet cloak, is haggling with the grumpy old vendor. the grumpy vendor, a tall, sophisticated man is wearing a sharp suit, sports a noteworthy moustache is animatedly conversing on his steampunk telephone.

Another DALL-E 3 paper prompt.

29. As Promised, Grumpy Cat

grumpy cat in Scream painting by Edvard Munch

That’s it for today! Judge for yourself, but in our view, while DALL-E 3 performs well in some scenarios, Stable Diffusion is the safer API choice, and Midjourney remains the undisputed leader.

If you like this work, subscribe to our updates, or join Integrail to start building your own AI Agents and experimenting with dozens of other GenAI models!



How to scale startups and do AI and functional programming. Building Integrail.ai: pragmatic AGI platform. Built Veeam from 0 to 1B in revenue in under 10 years