Aug 30, 2022

July 2022 brought us public betas of two impressive AI services for generating original images from text: OpenAI’s DALL·E 2 and Midjourney.

Both AIs offer a similar “tell me what to paint” (text-to-image) service. Both use diffusion algorithms to turn initial noise, step by step, into an image matching your prompt, and both rely on huge models with billions of parameters.
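To make the "noise to image" idea a bit more tangible, here is a toy sketch of a deterministic, DDIM-style denoising loop on five made-up pixel values. Everything in it is an assumption for illustration: the schedule `signal_kept`, the step count, and above all `oracle_noise`, which cheats by knowing the target picture. A real service uses a trained neural network that has to guess the noise from the noisy image and your text prompt, and neither DALL·E's nor Midjourney's actual sampler is shown here.

```python
import math
import random

STEPS = 50


def signal_kept(t):
    """Share of the clean image still present at step t (linear schedule, assumed)."""
    return 1.0 - 0.99 * t / STEPS


def oracle_noise(x_t, target, t):
    """Stand-in for the trained network: returns the noise mixed into x_t.

    It cheats by looking at the target; a real model must estimate this
    from the noisy image and the text prompt alone.
    """
    a = signal_kept(t)
    return [(x - math.sqrt(a) * y) / math.sqrt(1.0 - a) for x, y in zip(x_t, target)]


def denoise(target, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure noise
    for t in range(STEPS, 0, -1):
        a_now, a_next = signal_kept(t), signal_kept(t - 1)
        eps = oracle_noise(x, target, t)
        # Estimate the clean image, then re-mix it with a little less noise.
        clean = [(xi - math.sqrt(1.0 - a_now) * e) / math.sqrt(a_now)
                 for xi, e in zip(x, eps)]
        x = [math.sqrt(a_next) * c + math.sqrt(1.0 - a_next) * e
             for c, e in zip(clean, eps)]
    return x


tiny_image = [0.0, 0.25, 0.5, 0.75, 1.0]  # five made-up "pixel" values
result = denoise(tiny_image)
```

Because the oracle is perfect, the loop lands on `tiny_image` almost exactly. With a learned noise predictor the result is only as good as the model's guesses, which is where all the creativity (and the messed-up fingers) comes from.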

Let’s explore their features, strengths and weaknesses.
Note: when sharing an AI-generated image, always include your prompt.

Prompt: beach with people, asteroid hitting earth, 4k, realistic (Midjourney V3)

Introduction to DALL·E 2

The second version of OpenAI’s DALL·E, which was first introduced in January 2021, is perhaps the most talked-about image-generating AI. The original DALL·E was built on a version of OpenAI’s GPT-3 model; DALL·E 2 pairs OpenAI’s CLIP text-and-image model with a diffusion decoder, giving it a brilliant understanding of natural language and its meanings and concepts. The model can generate images in various styles, from pencil drawings and oil paintings to photorealistic images.

A big feature of DALL·E is the ability to make realistic edits to a selected area of an existing picture, matching the original image’s style, reflections, textures etc.

Prompt: Add a Great Dane dog sitting (original image via Decoarted, edits by DALL·E)

DALL·E produces 4 images at 1-megapixel resolution (1024×1024), with options to generate variations or edit them further.

Introduction to Midjourney

Midjourney’s generative algorithms are developed by an independent research lab of the same name, which runs its service as a Discord bot.

In comparison, Midjourney’s language processing lags behind DALL·E’s, picking up on keywords rather than the full meaning of a caption. However, Midjourney seems to allow itself a little more creative freedom along the way. The resulting images are far more artistic, although less faithful to your prompt.

Midjourney also supports a huge variety of styles, although it’s not as versatile as DALL·E. For better or worse, Midjourney has its own artistic signature.

Diffusion algorithm in action. Prompt: great dane dog sitting in the bathroom

A big advantage of Midjourney over DALL·E is the additional control you get via prompt parameters. You can choose the version of the creative algorithm (including test previews), the aspect ratio, the level of chaos, the result quality (proportional to cost) etc. You can also generate more images from the same seed for creative continuity, instead of generating new images from scratch.
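In practice these parameters are typed at the end of the prompt as `--flag value` pairs. Here is a small sketch of how such a prompt could be assembled. `build_prompt` is a hypothetical helper of mine, not part of any Midjourney tooling, and the flag spellings (`--v`, `--ar`, `--chaos`, `--q`, `--seed`) are as I recall them from Midjourney's documentation, so treat them as assumptions and check the current docs before relying on them.

```python
def build_prompt(text, version=None, aspect=None, chaos=None, quality=None, seed=None):
    """Append Midjourney-style parameters to a text prompt; unset options are skipped."""
    parts = [text.strip()]
    # Flag names are assumptions recalled from Midjourney's docs, not verified here.
    options = [
        ("--v", version),      # algorithm version
        ("--ar", aspect),      # aspect ratio, e.g. "16:9"
        ("--chaos", chaos),    # how varied the results should be
        ("--q", quality),      # quality, proportional to cost
        ("--seed", seed),      # reuse a seed for creative continuity
    ]
    for flag, value in options:
        if value is not None:
            parts.append(f"{flag} {value}")
    return " ".join(parts)


prompt = build_prompt(
    "great dane dog sitting in the bathroom",
    version=3, aspect="16:9", chaos=20, seed=1234,
)
print(prompt)
# great dane dog sitting in the bathroom --v 3 --ar 16:9 --chaos 20 --seed 1234
```

Reusing the same `seed` with a slightly tweaked prompt is the handy part: the bot starts from the same noise, so the results stay related instead of being a completely fresh roll of the dice.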

Midjourney produces 1-4 low-res images, depending on the algorithm and parameters used. You can then generate further variations or upscale a single image to as much as 3 megapixels.

Fight! 🥊

Note that both AIs are works in progress; their features and output quality may change almost daily.

Round 1. Flying Cat with Dragon Wings

This round turned out to be a good example of DALL·E’s better understanding of concepts, while Midjourney produces better-looking pictures.

Prompt: photo of a flying black cat with dragon wings, dramatic, dark skies, cinematic, 4k

Round 2. Architecture

Here we see the difference in the approach of the two AIs. Midjourney went for creative concept art, and it wasn’t afraid to mix the house into the background mountains. For the same prompt, DALL·E gave us some very realistic real estate pictures.

Prompt: detailed, futuristic, house above a lake, mountains in the background

Round 3. Glass Sculpture of a Duck

Just look at the details of Midjourney’s result. I’m amazed, as this would have been a very hard task for a conventional 3D renderer.

Prompt: transparent glass sculpture of a duck

Round 4. Van Gogh Selfie

Vincent van Gogh can rightly be considered the inventor of the selfie, having painted dozens of self-portraits.

Midjourney captures this pretty well: Van Gogh is easily recognisable as a person, and the imitation of his style is rather close as well. DALL·E sticks more closely to the prompt, but Vincent looks more like Socrates, and the style is more that of a Van Gogh-influenced kid.

Prompt: Van Gogh painting his selfie with a brush in Van Gogh style

Round 5. Realistic Faces

This is always a tricky thing for a generative AI. Both AIs stick to the prompt very well. Notice how DALL·E produced a more lifelike face, even from an angle, but messed up the fingers and the bottle stopper.

Prompt: (photo of) native american woman holding green potion, good looking, rithual, jungle

Bonus Round: picture of yourself

I’m always curious about how AIs see themselves. Midjourney sees itself as a young white brunette, while DALL·E thinks of itself as ¾ Asian girl and ¼ cat.

Clearly, neither AI understood this prompt. The biggest actual difference is in the interpretation of the word “portrait.” I’m a bit surprised by DALL·E’s result: OpenAI’s GPT-3 models are pretty good at imitating consciousness, so I would have expected to see something like a robot.

Prompt: portrait of Midjourney / DALL·E (respectively)


Both AIs are plainly amazing. Just a few years ago, it was hard to imagine computer algorithms producing original, let alone creative, content of this quality.

It’s clear, especially from the Bonus Round, that Midjourney creates better art, while DALL·E is more useful as a stock-image generator.

In my human opinion, DALL·E has better potential for commercial images, but Midjourney is far more interesting for its creativity.


Midjourney

✅ artistic signature ✅ more controls ✅ creates art ❌ less versatile
Final judgement: 🧑‍🎨 better artist


DALL·E

✅ better NLP ✅ image editing feature ✅ more versatile ❌ less artsy
Final judgement: 📸 better soldier

