Comparing Stable Image Core/Ultra, Stable Diffusion 3/3-turbo/XL/1.6 and Dall-E 2/3 Image Generation Models

Noah Youngs
10 min read · Jun 12, 2024

This post is a high-level comparison of pricing and performance for the text-to-image models available through the Stability AI and OpenAI APIs.

At the end of the article, I share a link to the code I wrote to generate the comparison images, in case you would like to run the comparison with your own prompts.

Models & Pricing

Prices shown are for generating standard 1024x1024 images.

View the latest Stability pricing here: https://platform.stability.ai/pricing

View the latest OpenAI pricing here: https://openai.com/api/pricing/

Stability

  • Stable Image Ultra ($0.08/generation)
  • Stable Image Core ($0.03/generation)
  • Stable Diffusion 3 Large ($0.065/generation)
  • Stable Diffusion 3 Large Turbo ($0.04/generation)
  • Stable Diffusion XL ($0.002–0.006/generation)
  • Stable Diffusion 1.6 ($0.002–0.010/generation)

OpenAI

  • Dall-E 3 ($0.04/generation)
  • Dall-E 2 ($0.02/generation)
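
As a rough sketch of what these prices mean for a batch of images, the snippet below multiplies each per-image price by a batch size. The prices are copied from the lists above; the `PRICES` table, the `batch_cost` helper, and the batch size of 100 are my own choices for illustration, so check the pricing pages for current numbers.

```python
# Per-image prices in USD for 1024x1024 output, copied from the lists above.
# Each entry is a (low, high) pair; fixed prices repeat the same value.
PRICES = {
    "Stable Image Ultra": (0.08, 0.08),
    "Stable Image Core": (0.03, 0.03),
    "Stable Diffusion 3 Large": (0.065, 0.065),
    "Stable Diffusion 3 Large Turbo": (0.04, 0.04),
    "Stable Diffusion XL": (0.002, 0.006),
    "Stable Diffusion 1.6": (0.002, 0.010),
    "Dall-E 3": (0.04, 0.04),
    "Dall-E 2": (0.02, 0.02),
}


def batch_cost(model: str, n_images: int) -> tuple[float, float]:
    """Return the (low, high) cost in USD of generating n_images with model."""
    low, high = PRICES[model]
    # Round to 4 decimals to hide floating-point noise in the products.
    return round(low * n_images, 4), round(high * n_images, 4)


if __name__ == "__main__":
    # 100 images is an arbitrary example batch size.
    for model in PRICES:
        low, high = batch_cost(model, 100)
        label = f"${low:.2f}" if low == high else f"${low:.2f} to ${high:.2f}"
        print(f"{model}: {label} per 100 images")
```

For example, `batch_cost("Stable Image Core", 100)` gives 3 dollars, while the same batch on Stable Diffusion XL lands somewhere between 20 and 60 cents.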

Performance

I tested ten prompts with varying themes on every model. You can see the results for each prompt below. For each prompt, I will comment on the results and pick my favorite model. At the end, I will tally my favorites and name the model that performed best in my opinion.

I won’t comment much on Dall-E 2 because, to be honest, I only included it to complete the 2x4 grid.

To be fair to every model, I didn’t use any prompting techniques or prompt engineering. Each prompt simply describes what I want to see, and none of them is very detailed, so a lot is left up to each model.
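
To illustrate what "the same plain prompt for every model" looks like in code, here is a minimal sketch of a comparison loop. It is not the code from the notebook linked at the end of the article: `run_comparison`, `make_stub`, the stub generators, and the example prompt are all hypothetical placeholders standing in for real API calls.

```python
from typing import Callable, Dict, List

# A generator takes a prompt and returns image bytes. Real generators would
# call the Stability AI or OpenAI API; here they are stubs (an assumption).
Generator = Callable[[str], bytes]

MODELS = [
    "Stable Image Ultra", "Stable Image Core", "SD3", "SD3 Turbo",
    "SDXL", "SD1.6", "Dall-E 3", "Dall-E 2",
]


def make_stub(name: str) -> Generator:
    """Build a placeholder generator that returns fake 'image' bytes."""
    def generate(prompt: str) -> bytes:
        return f"{name}|{prompt}".encode()
    return generate


def run_comparison(
    prompts: List[str], generators: Dict[str, Generator]
) -> Dict[str, Dict[str, bytes]]:
    """Send every prompt, unmodified, to every model and collect the results."""
    return {
        prompt: {name: generate(prompt) for name, generate in generators.items()}
        for prompt in prompts
    }


generators = {name: make_stub(name) for name in MODELS}
# "a lighthouse" is a made-up example prompt, not the one used in this article.
results = run_comparison(["a lighthouse"], generators)
```

The point of the design is that the prompt string is passed through untouched, so any difference between the eight images comes from the models rather than from per-model prompt tuning.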

Prompt 1: Lighthouse

This is probably the least descriptive prompt of them all. A lot is left up to the image model.

I think Dall-E 3’s creation really stands out because it’s dreamy and fantastical, though not very realistic. This turns out to be a recurring theme with Dall-E 3.

I prefer Stable Image Core’s result over Ultra’s because it seems much more realistic. Ultra’s result looks overly saturated, like the default screensaver on a TV. This high saturation also recurs with Stable Image Ultra in later prompts.

SD3 Turbo’s image has far too much contrast, especially around the rocks and the water, which I think makes the image look bad. This high-contrast effect also recurs with this model, and it usually gives the image a comic book or illustration style.

Of the remaining Stable Diffusion models, I think SD3 does well, but SDXL is very impressive given that it’s more than 10 times cheaper than SD3.

My personal favorite here is Stable Image Core, with Dall-E 3 being a close second.

Prompt 2: Cozy Cabin

This prompt was a little more descriptive than the first one, but I didn’t specify whether it was an interior or exterior shot, which created some variation, especially with ‘with a fireplace’ being in the prompt.

I thought about leaving this prompt out of the comparison, but I decided to include it because it shows how each model handles the tension between a cozy cabin (which implies an exterior shot) and a fireplace (which implies an interior shot).

As a human, I would expect an interior shot of the cozy cabin with the fireplace inside. Stable Image Core attempted this and captured the cozy cabin, but the fireplace was cut out.

Stable Image Ultra, SD3 Turbo, and Dall-E 2 decided to place the fireplace on the exterior of the house. I think Stable Image Ultra did very well aesthetically, but the scene is not very realistic. Dall-E 2’s result was much more realistic, but the cabin didn’t seem very cozy.

It also seems that SD3, SDXL, SD1.6 and Dall-E 3 confused the fireplace with a campfire or fire pit, and SDXL, SD1.6 and arguably Dall-E 3 even depicted the entire cabin as being on fire.

We can also see some of the same themes from earlier: SD3 Turbo’s image again has too much contrast, with a comic book style that looks terrible in my opinion. Dall-E 3’s image is again dreamy and fantastical.

My personal favorite here is Stable Image Ultra but this was a tough call.

Prompt 3: Highway Sign

I included this prompt to compare how well each model does with text.

It seems that only Stable Image Ultra and Dall-E 3 managed not to mess up the text.

Of these, my favorite is Stable Image Ultra’s result, because Dall-E 3’s image has strange nails sticking out of the sign and the letters look embossed or raised, which makes the image less aesthetically pleasing.

Prompt 4: Night Club

I included this prompt to test each model’s ability to depict human faces realistically.

I think Stable Image Ultra did okay, but there is a disfigured face at the right edge of the image, and the arm of the girl on the left looks strange.

Stable Image Core seemed to do the best, but I found it unnatural that two of the girls had the exact same face (I guess they could be twins), and even the third girl had quite a similar face.

SD3’s result is quite terrible with all of the people looking like they are morphed into each other.

With SD3 Turbo, no matter how many times I regenerated the image, it always came out blurred, which I think is a nudity-censoring feature that is not working as it should.

I think SDXL did pretty well, except for minor issues with the faces and one girl’s fingers looking strange.

The people in SD1.6’s and Dall-E 3’s images look weird, like Pixar characters, and I’m not going to comment on Dall-E 2.

My favorite here is Stable Image Core.

Prompt 5: Fashion Model

The results of this prompt, when compared to those of the previous prompt, show that image models are much better at depicting one person at a time.

I should also mention that Stable Image Ultra, SD3 and SD3 Turbo gave me the same blurred image issue that SD3 Turbo had on the previous prompt, but here I was able to get proper images after a few tries.

The results here are quite subjective, but I think that realism is what we're really going for here. It seems like the top row depicts the most realistic models (except for the fact that the girl’s coat in SD3’s image is messed up).

It’s also worth mentioning that, in my opinion, Dall-E 3 did best with the background.

My favorite is Stable Image Core, with Stable Image Ultra being a close second.

Prompt 6: Futuristic City

Many of the images look quite similar here, with SD3 Turbo’s and Dall-E 3’s images standing out.

I think this prompt resulted in the best SD3 Turbo image so far, with its usual comic book style working pretty well. The flying car in its image is also done pretty well.

Dall-E 3’s fantastical and dreamy style also works very well. I think the flying cars in its image look the most like flying cars.

It seems like the other models decided to include UFOs instead of flying cars, but I like the aesthetics of Stable Image Ultra’s image.

My favorite here is Dall-E 3’s image, with Stable Image Ultra coming in second.

Prompt 7: Floating Cube

I didn’t come up with this prompt myself; it was written by GitHub Copilot while I was writing the code that creates these comparison images. Again, I’ll share this code at the end of the article if you’d like to run comparisons with your own prompts.

I think that all of these images look great. SD3 Turbo’s image clearly looks like an illustration, but it’s a good illustration.

Many of the models have attempted to make the cube reflective, with some being more successful than others at making the reflections realistic.

My personal favorite is SD3’s creation but I like SDXL’s creation as well.

Prompt 8: Wolf in Forest

I don’t have too much to add here, just a few things I noticed:

  • In Stable Image Ultra’s image, the wolf’s front paw is messed up.
  • In SD3’s image, the wolf has five legs.
  • SD3 Turbo’s image again has a high-contrast comic book/illustration style, which I don’t like.
  • Dall-E 3’s image looks dreamy and fantastical again, which in my opinion doesn’t fit the scene.

My favorite here is Stable Image Core, but I think SDXL is almost as good.

Prompt 9: Living Room with View

Here I decided to use an interior scene. Most of the models had trouble with this prompt.

In Stable Image Ultra’s image, there is strange paneling behind the couch and an unidentifiable blob under the fake tree, and both the chair at the far right of the image and the right edge of the couch are messed up.

Stable Image Core focused too heavily on the window and the view and missed the living room entirely.

SD3’s image has some strange effects around the coffee table and its reflection, parts of the carpet and the floor, parts of the couch, and in the window bezels.

SD3 Turbo’s image didn’t have deformities, which is great, but the realism is poor: the room looks like a 3D rendering and the buildings outside look like an illustration.

SDXL and SD1.6 had many deformities in the furniture (as expected), and parts of the view in the background look decrepit or apocalyptic (although my guess is that they are depicting housing projects in New York City with the city skyline behind them, which is somewhat realistic).

The only issue I see with the Dall-E 3 image is the bezels in the window. This is my personal favorite: even though it’s not hyperrealistic, it’s aesthetically pleasing and believable.

Prompt 10: Time Traveler

I found the results of this prompt to be the most interesting, because it seems that most of the image models had trouble grasping the “time traveler” part of the prompt.

Stable Image Ultra had, in my opinion, the most aesthetically pleasing image of Ancient Egypt, but there is no sign of a time traveler.

Stable Image Core created what seems to be a photorealistic image of Ancient Egypt, which looks strange in my opinion and there is again no concept of time travel.

Stable Diffusion 3 sort of grasps the idea by depicting a modern-looking man with a pharaoh, but the modern man is wearing Ancient Egyptian clothes, so the distinction is unclear.

Stable Diffusion 3 Turbo just creates an image of Ancient Egypt in its usual comic book illustration style, and SDXL and SD1.6 depict Ancient Egypt with deformed statues.

My favorite by far is Dall-E 3, which is the only model that perfectly grasped the idea of the time traveler. You can clearly see the modern woman wearing time travel gear with her robot assistant amongst the ancient people. I think this image really blows the other attempts out of the water.

Bottom Line

The final winner for performance is Stable Image Core, which I picked as my favorite on four of the ten prompts, more than any other model. This, combined with its relatively low price point of 3 cents per generation, makes it my top recommendation of all the models in this comparison.
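
For anyone who wants to check the tally, here is a small sketch that counts the favorites I named in each prompt section above. The `FAVORITES` dictionary is just my own way of writing the picks down; runner-up mentions are not counted.

```python
from collections import Counter

# Favorite model per prompt, as stated in each prompt section above.
FAVORITES = {
    "Lighthouse": "Stable Image Core",
    "Cozy Cabin": "Stable Image Ultra",
    "Highway Sign": "Stable Image Ultra",
    "Night Club": "Stable Image Core",
    "Fashion Model": "Stable Image Core",
    "Futuristic City": "Dall-E 3",
    "Floating Cube": "SD3",
    "Wolf in Forest": "Stable Image Core",
    "Living Room with View": "Dall-E 3",
    "Time Traveler": "Dall-E 3",
}

# Count how many prompts each model won, highest first.
tally = Counter(FAVORITES.values())
for model, wins in tally.most_common():
    print(f"{model}: {wins}")
```

This gives Stable Image Core four wins, Dall-E 3 three, Stable Image Ultra two, and SD3 one.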

However, it’s clear that each model has its strengths and weaknesses, so here are my findings on which situations each model is particularly suited for.

  • Stable Image Ultra: Creates highly saturated images that often look great. Also had the best performance in the one test involving text.
  • Stable Image Core: Most versatile image model and my favorite overall. However, it has an issue with leaving parts cut out in room interior images.
  • Stable Diffusion 3: Not great at detailed features like faces and extremities, but good for landscape scenes.
  • Stable Diffusion 3 Turbo: Has an illustrative/comic book effect which can be good in certain situations like the futuristic city prompt, but performs poorly in general.
  • SDXL: Perhaps the best value for dollar given its low price point. However, much like Stable Diffusion 3, it’s not great with details like fingers and furniture.
  • SD1.6: Similar to SDXL, with slightly poorer performance (as expected).
  • Dall-E 3: Creates very stylistic images with a dreamy, fantastical style. I think it’s great for fictional/fantasy scenes that don’t exist in real life (like the time traveler in Ancient Egypt), but it’s not great for photorealism. However, it also seems to work well for interior images.
  • Dall-E 2: Very poor performance, as expected. I can’t think of any reason to still use this over SDXL.

It’s important to note that this experiment was a high-level overview. It completely skips over prompt design and engineering techniques and the style presets supported by Stable Image Core, SDXL, and SD1.6, and it fails to account for the variation in image results across multiple tries with the same prompt.

However, I hope that this article can give you a general idea of how well each model performs in different situations.

Colab Notebook — Try comparing your own prompts

I’d like to share the code I wrote to make the comparison images, in case you want to try your own prompts.

You can access it here: https://colab.research.google.com/drive/1hGNfbhNqVM6-nmPVpAyIHq5f7_xQNgfv?usp=sharing

Note that you will need both Stability AI and OpenAI API keys to use the notebook.
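
As a hedged sketch of one way to check that both keys are in place before running anything, the snippet below looks for them in environment variables. This is not taken from the notebook: the `missing_keys` helper is hypothetical, and `STABILITY_API_KEY` is a variable name I am assuming for illustration (`OPENAI_API_KEY` is the name the official OpenAI Python client reads by default).

```python
import os

# Assumed environment variable names; adjust them to match your own setup.
REQUIRED_KEYS = ("STABILITY_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None):
    """Return the names of required API keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("Both API keys are set.")
```

Passing a plain dictionary to `missing_keys` instead of relying on `os.environ` makes the check easy to try out without touching your real environment.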

Noah Youngs

Solopreneur, AI Native Applications. AI-assisted media and content creation.