Putting Dall-E 3 through its paces against Midjourney (over a dozen tests)

14 min read · Oct 13, 2023

Dall-E was what got me into AI art, and version 3 of it has been making waves in the AI art world. It’s available for free via Bing, and apparently it uses GPT, which I’m also making heavy use of. So should I switch back? Rather than just making this decision on instinct, I decided to put it through some objective tests.

Any head-to-head I do of Midjourney vs. another art generator is going to suffer from a problem: I’ve spent a lot of time getting good at engineering prompts to run well in Midjourney. So if I simply copy and paste them from Midjourney into another AI and compare the outputs, then Midjourney’s going to have an advantage, because I won’t have tweaked the prompts to run well in the other generator. Nor will I even know how to do so.

However, Dall-E 3 is claimed to use GPT to make prompt engineering unnecessary. Apparently all I need to do is input the prompts in natural language, and it will spit back what I want. That’s a claim I can test, so I settled on the following procedure:

1. Go back through my Midjourney creations and find an image I created just by entering text, without inpainting or outpainting etc.
2. Back-translate the prompt from #1 into natural language, and run it in Dall-E 3.
3. If I inpainted another element onto the creation in #1, then also record what Midjourney gave me.
4. If I did #3, then add instructions for the relevant element into the Dall-E 3 prompt, and run it.
5. Repeat #3 and #4 as many times as I inpainted in more elements.
6. Compare everything Midjourney and Dall-E 3 created in the above steps.
7. Repeat for another Midjourney creation.
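
The mechanical part of the back-translation step can be sketched in Python. This is only an illustration under a few assumptions: the function names `split_prompt` and `back_translate` are made up for this example, parameters are assumed to start with `--` (or the em dash they show up as in this post), and no dash marker is assumed to appear inside the descriptive text itself. It swaps underscores for spaces and `|` separators for commas, and hands the parameters back separately, because rewording them (for example turning a niji style flag into the word "anime") still takes human judgment.

```python
import re

# "--" is Midjourney's parameter marker; the em dash is how it is rendered in this post.
_PARAM_MARKER = re.compile(r"\s*(?:--|—)\s*")


def split_prompt(prompt: str) -> tuple[str, dict[str, str]]:
    """Separate the descriptive text from the trailing parameters."""
    text, *raw_params = _PARAM_MARKER.split(prompt.strip())
    params = {}
    for raw in raw_params:
        # "style raw" -> name "style", value "raw"
        name, _, value = raw.partition(" ")
        params[name] = value.strip()
    return text, params


def back_translate(prompt: str) -> tuple[str, dict[str, str]]:
    """Mechanical first pass: readable text plus the leftover parameters."""
    text, params = split_prompt(prompt)
    text = text.replace("_", " ")           # neon_colors -> neon colors
    text = re.sub(r"\s*\|\s*", ", ", text)  # "a | b" -> "a, b"
    return text, params


mj_prompt = (
    "digital pop art, whimsical portrayal of a grand Taco_Bell restaurant, "
    "playful neon_colors, mysterious_shadows, focal point on the "
    "grand_entrance --s 50 --style raw"
)
text, params = back_translate(mj_prompt)
print(text)    # descriptive text, ready to be reworded by hand
print(params)  # parameters that still need a natural-language equivalent
```

The parameters come back as a dictionary rather than being thrown away, so things like the aspect ratio or the excluded terms can be written into the sentence deliberately instead of being lost.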

Of course my choice of tests is influenced by what I create for work and for fun, but I’ve tried to span a variety of genres, like I did for the test of Midjourney vs. Jasper Art.

1st Test: Digital Pop Art

1 ) Midjourney prompt: digital pop art, whimsical portrayal of a grand Taco_Bell restaurant, playful neon_colors, mysterious_shadows, focal point on the grand_entrance — s 50 — style raw (underscores in the prompt for reasons described here)

2 ) Dall-E 3 prompt: digital pop art whimsically portraying a grand Taco Bell restaurant in playful neon colors, mysterious shadows, focal point on the grand entrance

Midjourney’s getting something approximating the logo in one image, but Dall-E 3 is getting it in every image, plus text in one image. AI art generators have been notorious for falling apart when it comes to generating text, but it looks like Dall-E 3 has made a breakthrough here.

2nd Test: Anime Acropolis

1 ) Midjourney prompt : the Acropolis covered in deep drifts of snow, clear blue sky, anime — version niji — style scenic — s 750

2 ) Dall-E 3 prompt : the Acropolis covered in deep drifts of snow, clear blue sky, scenic anime

Why did Dall-E 3 only give me 3 images? Anyway, I’d call this round a tie.

3rd Test: Encaustic Art on Glass

Time to test a more complex piece that I put together bit by bit in Midjourney with 5 stages of inpainting. Dall-E 3 lacks that function, but could it do it in one go using only text prompts?

1 ) Midjourney prompt: clockwise yin_yang | logarithmic_spiral and chaotic fractal_patterns in translucent_layers of glossy carnauba_encaustic on backlit_glass — s 50 — style raw

2 ) Dall-E 3 Prompt: clockwise yin yang, logarithmic spiral and chaotic fractal patterns in translucent layers of glossy carnauba encaustic on backlit glass

Dall-E 3 was getting the yin yang more consistently.

3 ) Midjourney prompt (inpainted on #1) : deep_teal whirlpool in translucent_layers of glossy carnauba_encaustic on backlit_glass | logarithmic_spiral with chaotic fractal_patterns — s 750 — style raw

4 ) Dall-E 3 prompt: clockwise yin yang with a deep teal whirlpool in the black area, translucent layers of glossy carnauba encaustic on backlit glass with logarithmic spiral and chaotic fractal patterns

Really cool images, but it doesn’t really seem to be getting what I want it to do. GPT’s natural language understanding is great, but its integration with Dall-E leaves much to be desired.

5 ) Midjourney prompt (inpainted on #3): green sequoia_leaves in embedded_encaustic, translucent_layers of glossy carnauba_encaustic on backlit_glass — s 750 — style raw

6 ) Dall-E 3 prompt: clockwise yin yang with a deep teal whirlpool in the black area, green sequoia leaves embedded in translucent layers of glossy carnauba encaustic on backlit glass with logarithmic spiral and chaotic fractal patterns

Embedded encaustic art is one area where both Midjourney and Dall-E 3 fall apart.

7 ) Midjourney prompt (inpainted onto #5): banded gneiss, dimly lit — s 750 — style raw

8 ) Dall-E 3 prompt: banded gneiss surrounding a circular pane of backlit glass with a glossy carnauba encaustic artwork of clockwise yin yang with a deep teal whirlpool in the black area, green sequoia leaves embedded in translucent encaustic layers, logarithmic spiral and chaotic fractal patterns

Here Dall-E 3 seems to be following the prompt more precisely.

9 ) Midjourney prompt (inpainted on #7 after multiple touch ups) : malachite_green aurora_arcs on floor_mosaic — s 750 — style raw — no reflections

10 ) Dall-E 3 Prompt : wall of banded gneiss surrounding a circular pane of backlit glass with a glossy carnauba encaustic artwork of clockwise yin yang with a deep teal whirlpool in the black area, green sequoia leaves embedded in translucent encaustic layers, logarithmic spiral and chaotic fractal patterns, with a floor mosaic of malachite green aurora arcs

With no inpainting, I’m being forced to stuff everything into a single prompt in Dall-E 3, and the bot is approaching the limits of its bandwidth (which is admittedly much higher than Midjourney’s).
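
To see how quickly a single prompt grows when every inpainting stage has to be folded into it, here is a small Python sketch. It is a simplification: the stage descriptions are paraphrased from the prompts above, the helper name `cumulative_prompts` is made up for this example, and joining with ", with" is just a stand-in for the manual rewording each stage actually needs. Word count is only a crude proxy for prompt complexity, not a documented limit of either tool.

```python
from itertools import accumulate

# One entry per stage; wording paraphrased from the prompts in this test.
stages = [
    "clockwise yin yang in translucent layers of glossy carnauba encaustic on backlit glass",
    "a deep teal whirlpool in the black area",
    "green sequoia leaves embedded in the encaustic",
    "a wall of banded gneiss surrounding the glass",
    "a floor mosaic of malachite green aurora arcs",
]


def cumulative_prompts(elements):
    """Return one prompt per stage, each containing every element so far."""
    return list(accumulate(elements, lambda so_far, new: f"{so_far}, with {new}"))


for n, prompt in enumerate(cumulative_prompts(stages), start=1):
    # Report how long the all-in-one prompt has become at this stage.
    print(f"stage {n}: {len(prompt.split())} words")
```

Printing the running length at each stage makes it easier to spot the point where elements start getting dropped from the output.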

4th Test: Coconut Palms in the Japanese Alps?

1 ) Midjourney prompt: coconut palms growing in the snow in the Japanese_Alps, anime — version niji — style scenic — s 750

2 ) Dall-E 3 prompt : coconut palms growing in the snow in the Japanese Alps, scenic anime

Well this sucks! It’s beyond me how this is “Unsafe Image Content”.

5th Test: Anime Party Scene

If Dall-E 3 gives false positives for risque content, let’s see if it will give false negatives ...

1 ) Midjourney Prompt: Japanese woman with devil horns, wearing a latex catsuit, cutting a birthday cake, in a bar with a carpeted floor, green_walls, brown_sofas, glass_cabinets, anime — version niji — style original — s 750

2 ) Dall-E 3 prompt: Japanese woman with devil horns, wearing a latex catsuit, cutting a birthday cake, in a bar with a carpeted floor, green_walls, brown_sofas, glass_cabinets, anime — version niji — style original — s 750

Why does Dall-E 3 give me the option to report a false positive for the more risque prompt, but not the one of coconut palms? Anyway it seems clear that Dall-E 3 is off the table for everyone who wants to create anime waifus.

6th Test: Cirque Jungle Tunnels

This was another step-by-step inpainting construction in Midjourney that I blogged about in my last post.

1 ) Midjourney prompt: details of Transantarctic_Mountains sidelit by low_angle_sunlight, wide_angle aerial_photograph by Sony A7R II with high_horizon and desaturated_cold_tones — no snow, ice, icecap, glaciers — s 50 — style raw — ar 16:9

2 ) Dall-E 3 prompt: an aerial photograph with a 16:9 landscape aspect ratio, shot by a Sony A7R II with a wide angle lens, showing the details of the Transantarctic Mountains sidelit by low angle sunlight, with a high horizon and desaturated cold tones, with no snow, no ice, no icecap, and no glaciers

Dall-E 3 falls over at the first hurdle! I could live with the image coming back square (since it’s easy to crop to a landscape aspect ratio), but it was completely ignoring my instructions not to add in snow, ice, an icecap or glaciers! Changing “no” to “without” yielded similar results. But maybe it would work if I specified something else to put on the mountains instead of snow. I kept going …

3 ) Midjourney prompt (inpainted on #1): wild subtropical_cloudforest with tree_ferns and palm_trees on Transantarctic_Mountains sidelit by low_angle_sunlight, wide_angle aerial_photograph by Sony A7R II with high_horizon and desaturated_cold_tones — no snow, ice, icecap, glaciers — s 50 — style raw — ar 16:9

4 ) Dall-E 3 prompt: an aerial photograph shot by a Sony A7R II with a wide angle lens, of wild subtropical cloudforest with tree ferns and palm trees on the Transantarctic Mountains sidelit by low angle sunlight, with a high horizon and desaturated cold tones, no snow, no ice, no icecap, no glaciers

OK so it looks like Dall-E 3 actually can do negative prompting as long as you specify something to replace whatever you’re deleting. It’s replacing the snow now. And I particularly like the top two images. Though it’s taken at a pretty low altitude for an “aerial photograph”.
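
The rewording of Midjourney’s exclusion list into the "no snow, no ice, ..." clauses used in these prompts can be sketched like this. The function names `exclusion_clause` and `dalle_prompt` are hypothetical, and the code only produces the wording. Whether the exclusions are honoured seems to depend on the scene text naming something to take their place, which is up to whoever writes the scene description.

```python
def exclusion_clause(no_value: str) -> str:
    """Turn a comma-separated exclusion list into natural-language clauses."""
    terms = [term.strip() for term in no_value.split(",") if term.strip()]
    return ", ".join(f"no {term}" for term in terms)


def dalle_prompt(scene: str, no_value: str = "") -> str:
    """Append the exclusion clauses to a scene description, if there are any."""
    clause = exclusion_clause(no_value)
    return f"{scene}, {clause}" if clause else scene


# The scene names a replacement (cloudforest) for what is being excluded (snow).
print(dalle_prompt(
    "wild subtropical cloudforest with tree ferns and palm trees "
    "on the Transantarctic Mountains",
    "snow, ice, icecap, glaciers",
))
```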

5 ) Midjourney prompt (inpainted onto #3): <URL of Image of Rice Terraces>:: rice_terraces on the Transantarctic_Mountains towering out of a cloud_inversion sidelit by low_angle_sunlight, wide_angle aerial_photograph by Sony A7R II with high_horizon and desaturated_cold_tones — no snow, ice, icecap, glaciers — s 50 — style raw — ar 16:9

6 ) Dall-E 3 prompt: an aerial photograph shot by a Sony A7R II with a wide angle lens, of rice terraces on the Transantarctic Mountains covered by wild subtropical cloudforest with tree ferns and palm trees, towering out of a cloud inversion, sidelit by low angle sunlight, with a high horizon and desaturated cold tones, no snow, no ice, no icecap, no glaciers

Not too bad for Dall-E 3! With text alone, it could consistently make a scene that was too complex to render in Midjourney without inpainting.

7 ) Midjourney prompt: <URL of an Image with>:: cavates in a cliff_face like Uplistsikhe or the Anasazi cliff_dwellings, sidelit by low_angle_sunlight, wide_angle aerial_photograph by Sony A7R II with high_horizon and desaturated_cold_tones — s 50 — style raw — no railings — ar 16:9

8 ) Dall-E 3 Prompt: an aerial photograph shot by a Sony A7R II with a wide angle lens, of cavates like Uplistsikhe or the Anasazi cliff dwellings, in the face of a cliff in the Transantarctic Mountains covered by rice terraces and wild subtropical cloudforest with tree ferns and palm trees, towering out of a cloud inversion, sidelit by low angle sunlight, with a high horizon and desaturated cold tones, no railings, no snow, no ice, no icecap, no glaciers

I love the shape of the landscapes, but I really would have preferred to have some rice terraces on top of the cliff as well. Unfortunately Dall-E 3 doesn’t support inpainting.

9 ) Midjourney prompt (inpainted on #7) : limestone_columns, cavates in a cliff_face like Uplistsikhe or the Anasazi cliff_dwellings, sidelit by low_angle_sunlight, wide_angle aerial_photograph by Sony A7R II with high_horizon and desaturated_cold_tones — s 750 — style raw — no railings — ar 16:9

10 ) Dall-E 3 Prompt : an aerial photograph shot by a Sony A7R II with a wide angle lens, of cavates like Uplistsikhe or the Anasazi cliff dwellings, with limestone columns, in the face of a cliff in the Transantarctic Mountains covered by rice terraces and wild subtropical cloudforest with tree ferns and palm trees, towering out of a cloud inversion, sidelit by low angle sunlight, with a high horizon and desaturated cold tones, no railings, no snow, no ice, no icecap, no glaciers

Only 3 images again. And only one of them has anything resembling limestone columns. Meanwhile, the bot has forgotten about the rice terraces. Looks like I’m approaching the limits of the bot’s bandwidth, just like with the yin-yang piece.

11 ) Midjourney prompt (inpainted onto #9): Transantarctic_mountains casting shadows on a cloud_inversion of dense_fog, sidelit by low_angle_sunlight, wide_angle aerial_photograph by Sony A7R II with high_horizon and desaturated_cold_tones — s 750 — style raw — no snow, ice, glacier, icecap — ar 16:9

12 ) Dall-E 3 prompt: an aerial photograph shot by a Sony A7R II with a wide angle lens, of cavates like Uplistsikhe or the Anasazi cliff dwellings, with limestone columns, in the face of a cliff in the Transantarctic Mountains covered by rice terraces and wild subtropical cloudforest with tree ferns and palm trees, casting shadows on a cloud inversion, sidelit by low angle sunlight, with a high horizon and desaturated cold tones, no railings, no snow, no ice, no icecap, no glaciers

The rice terraces are back, though at the expense of the limestone columns. Only the bottom right is giving me clearly defined shadows on the fog, but one nice picture is enough.

7th Test: Psychedelic Landscape Resin Painting

1 ) Midjourney prompt: colourful textured resin painting of a japanese-inspired psychedelic surrealist landscape in the enigmatic tropics, juxtaposing light and shadow with opaque and translucent layered surfaces — s 750 — style raw

2 ) Dall-E 3 prompt: colourful textured resin painting of a japanese-inspired psychedelic surrealist landscape in the enigmatic tropics, juxtaposing light and shadow with opaque and translucent layered surfaces

Dall-E 3’s take is much more psychedelic.

8th Test: Limestone Bas-Relief

1 ) Midjourney prompt: bas_relief carved into reflective gleaming_limestone, of a palm tree, dimly_lit low_angle_lighting long_shadows

2 ) Dall-E 3 Prompt: bas relief carved into reflective gleaming limestone, of a palm tree, dimly lit with low angle lighting, long shadows

Dall-E 3’s giving me the gleaming limestone in a dimly lit area better than Midjourney, which seems to be shining a spotlight on it.

9th Test: Portrait Photograph

1 ) Midjourney Prompt: portrait_photo of a Thai Buddhist monk with an alms_bowl, on a pebble_beach in dense_fog — s 50 — style raw

2 ) Dall-E 3 prompt: portrait photo of a Thai Buddhist monk with an alms bowl, on a pebble beach in dense fog

Two different aesthetics. I like Dall-E 3’s more, though I’d prefer the monks to be standing. But I’m sure I could achieve that by changing the prompt to read “standing on a pebble beach”.

10th Test: Charcoal Sketch

1 ) Midjourney prompt: chaotic underground landscape, rough textures of walls and tunnels, strong lines leading to a gathering of rebels, bold forms of makeshift shelters, charcoal sketch, high contrast — no order — s 50 — style raw (I generated this prompt in GPT, using the prompt described in this post).

2 ) Dall-E 3 prompt: chaotic underground landscape, rough textures of walls and tunnels, strong lines leading to a gathering of rebels, bold forms of makeshift shelters, charcoal sketch, high contrast, no order

Midjourney did a bit of a sloppy job compared to Dall-E 3.

11th Test: Brazilian-Japanese Fusion Carnival Anime

I wanted to test Dall-E 3’s vocabulary. Would it know what an omikoshi was? They’re the Japanese portable shrines that groups of people carry at festivals.

1 ) Midjourney prompt: Brazilian_carnival with samba_dancers carrying an omikoshi — version niji — style original — s 750

2 ) Dall-E 3 prompt: Brazilian carnival with samba dancers carrying an omikoshi, anime

Two observations:

Only Dall-E 3 gives anything approaching an omikoshi
It’s a bit weird that here Dall-E 3 gave me something super sexy without being asked, when before it was being really strict about blocking prompts, including for things that couldn’t have been violating its content policy

12th Test: Fanfic Oil Painting

1 ) Midjourney prompt: oil_painting of a Space_Marine atop a ruined_building, horizon filled with Tyranid_bioforms, heroic_stance, crowded_compositions, Warhammer_40K — s 50 — style raw

2 ) Dall-E 3 prompt: oil painting of a Warhammer 40K Space Marine in a heroic stance atop a ruined building, horizon filled with Tyranid bioforms, crowded composition

Both of them seem to know what a space marine is, but only Dall-E 3 is giving me Tyranid bioforms.

Wrap-Up

When I went back through my Dall-E 3 images, I discovered something disappointing: the bot only stores your most recent creations. Every time you generate something, you lose one of your old creations, unless of course you’re careful to save them.

Anyway, Dall-E 3 and Midjourney each have clear advantages over the other, in different areas.

Why you might want to use Dall-E 3:

The quality of the images tends to be nicer
The images more consistently match the prompt
If you want to get more elements into your image, you can do that without resorting to inpainting
Faster generation time
Text generation

Why you might want to use Midjourney:

Inpainting
Outpainting
Ability to change aspect ratios
You can prompt with images and not just text
Less censorship

I’m going to stick with Midjourney for now, but if Dall-E 3 adds inpainting then I’ll definitely reconsider. That is, of course, unless Midjourney up their game with the new versions they’ve got in the works (which is likely considering the current pace of progress in generative AI). Anyway I hope this article helped you make a decision about what to use, and that you have fun creating with it!

Written by Alex Tully