Google Imagen vs OpenAI DALL·E 2

Published in

Augmented AI

May 30, 2022

Oh My God! It is NOT a great time for OpenAI right now. It's been just over a month since DALL·E 2 was unveiled, and just a few days ago Google decided to enter the ring with Imagen. Imagen is a slap in the face for DALL·E 2, mainly because, according to Google's own evaluations, it outperforms DALL·E 2 in image quality and in how precisely its images match the text prompt.

OpenAI DALL·E 2 vs Google's Imagen: Text to Image Generator. Some results of Imagen. How amazing are these images!

If by now you are wondering WTF Imagen and DALL·E 2 are, how this technology works in simple lingo, and what makes Google's Imagen so superior, then take a walk with me through this article and we shall discover it together, my amigo.

Image Generation

So, in simple terms, both technologies allow you to generate images from text. For this to happen, the AI needs a deep understanding of language… similar to Ultron… just kidding :P. OpenAI led the recent wave of this research with its first iteration, DALL·E, a name coined by combining the artist Salvador Dalí and the robot WALL·E from the Pixar movie.

DALL·E 2

OpenAI then introduced DALL·E's successor, DALL·E 2, a more versatile and efficient generative system capable of producing higher-resolution images. Whereas DALL·E has 12 billion parameters, DALL·E 2 works on a 3.5-billion-parameter model, plus another 1.5-billion-parameter model that enhances the resolution of its images. That's a Kak-Lot of parameters!

To sum it up, DALL·E could generate images from text, but DALL·E 2 could do it better… Waaay better. If you are something of a scientist yourself… or not… and want a deeper understanding of DALL·E 2's architecture, check out one of the other articles we wrote on Augmented Startups.

DALL·E 2 Architecture [Source: Augmented Startups]

The beauty of DALL·E 2 was that it could create original, realistic images and art from a text description… like I've said so many times before. Just check out some of the sample images. It's unbelievable to think that an AI could generate such works of art and real multiverse-type images.

Enter Imagen

So Imagen is built on T5, a Transformer-based language model from Google that Imagen uses to understand your text prompt.

Transformer for OpenAI DALL·E 2 vs Google's Imagen: Text to Image Generator

No, not that type of Transformer.

T5 itself only reads the text. Imagen's base image generator, a diffusion model guided by the T5 text embeddings, produces 64 x 64 images, which are super small and not very useful on their own. The magic comes from super-resolution models that scale up and enhance these images (first to 256 x 256, then to 1024 x 1024) so that they are pleasant to look at. Just imagine how TVs evolved from Standard Definition (SD) to 4K. Generating a tiny image first and then upscaling it is far cheaper than generating a full-resolution image in one go, which helps keep generation times practical.

Google's Imagen: Text to Image Generator
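To make the cascade idea concrete, here is a minimal Python sketch of a pipeline shaped like Imagen's: encode the prompt, generate a 64 x 64 image, then upscale it twice by 4x to reach 1024 x 1024. Everything here is a hypothetical stand-in: `encode_text`, `base_model`, and `super_resolve` are made-up names, the "text encoder" just counts the letters in each word, and the "super-resolution" step is plain nearest-neighbour upscaling rather than the diffusion models Imagen actually uses. It only shows how the stages fit together and how the image grows at each step.

```python
# Toy sketch of a cascaded text-to-image pipeline shaped like Imagen's:
# prompt -> text encoder -> 64x64 base image -> 256x256 -> 1024x1024.
# All three stages are hypothetical stand-ins, NOT Imagen's real models.

Image = list[list[int]]  # grayscale image: rows of pixel values in 0..255


def encode_text(prompt: str) -> list[int]:
    """Hypothetical stand-in for the T5 text encoder: one number per word."""
    return [len(word) for word in prompt.split()]


def base_model(embedding: list[int], size: int = 64) -> Image:
    """Hypothetical stand-in for the 64x64 base generator (a flat gray image)."""
    shade = (sum(embedding) * 7) % 256  # derive a gray level from the "embedding"
    return [[shade] * size for _ in range(size)]


def super_resolve(image: Image, factor: int = 4) -> Image:
    """Stand-in for a super-resolution model: nearest-neighbour upscaling."""
    upscaled: Image = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat the widened row `factor` times, as separate list copies.
        upscaled.extend(list(wide_row) for _ in range(factor))
    return upscaled


def generate(prompt: str) -> Image:
    image = base_model(encode_text(prompt))  # 64 x 64
    image = super_resolve(image)             # 256 x 256
    image = super_resolve(image)             # 1024 x 1024
    return image


if __name__ == "__main__":
    picture = generate("a corgi riding a bike in Times Square")
    print(f"{len(picture)} x {len(picture[0])}")  # prints "1024 x 1024"
```

Note how little the base model has to produce: 64 x 64 is 4,096 pixels, while 1024 x 1024 is 1,048,576, which is 256 times more. Letting the base model work small and handing the fine detail to dedicated upscalers is the whole point of the cascade.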

DrawBench

To assess where Imagen stands compared to other text-to-image models, the Google team introduced a benchmark called DrawBench. Human raters compared the results of VQ-GAN+CLIP, Latent Diffusion Models, DALL·E 2, and Google's Imagen.

As you can see from this pretty diagram, Imagen leaves all the other models in the dust, in terms of both sample quality and image-text alignment!
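For the curious, here is a rough Python sketch of how side-by-side human ratings like these can be tallied. As far as we understand, DrawBench shows raters images from two models for the same prompt and asks which is better for sample quality and which better matches the text; we assume raters can also say they have no preference (a "tie"). The `preference_rates` function and the votes below are hypothetical and made up for illustration. They are not actual DrawBench data.

```python
# Hypothetical sketch: tallying side-by-side human preference votes between
# two models ("A" and "B") for two criteria. The votes are invented for
# illustration and are NOT real DrawBench results.
from collections import Counter

OPTIONS = ("A", "B", "tie")  # assumed: raters may also say they have no preference


def preference_rates(votes: list[str]) -> dict[str, float]:
    """Return the fraction of votes that went to each option."""
    if not votes:
        raise ValueError("need at least one vote")
    unknown = set(votes) - set(OPTIONS)
    if unknown:
        raise ValueError(f"unexpected vote labels: {sorted(unknown)}")
    counts = Counter(votes)
    return {option: counts[option] / len(votes) for option in OPTIONS}


if __name__ == "__main__":
    # Made-up votes comparing model A against model B on the same prompts.
    votes = {
        "sample quality": ["A", "A", "B", "tie"],
        "image-text alignment": ["A", "B", "A", "A"],
    }
    for criterion, ballots in votes.items():
        print(criterion, preference_rates(ballots))
```

With real ratings you would run this per model pair (and per prompt category); a model that wins the higher share on both criteria is the one raters preferred.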

Tech Misuse

Now, if you are getting fidgety and ready to dive into playing around with Imagen, I have some bad news, my friend. Neither DALL·E 2 nor Imagen is openly available to us commoners, due to concerns about misuse. Think about it: if this technology fell into the wrong hands, only God knows what chaos could break loose! Just imagine if someone generated an image of me eating a pizza with pineapple on it… my family would disown me!

Misuse of AI: Please do not generate an image of me eating a Pineapple Pizza!

So for now, it looks like we'll just have to wait for this technology to get pirated like MP3s back in the day, or for an open-source alternative to arrive. I don't expect either to happen anytime soon.

Jeff Dean, who leads Google AI, says he “sees AI as having the potential to foster creativity in human-computer collaboration.” That's great and all, but it is of no use if we cannot get our hands on it.

I'm not sure how they would safeguard the world from this technology being used for unethical purposes, but it could be very powerful if applied to the right industries.

Going Forward

So what would you generate if you had access to Imagen or DALL·E 2? Write your ideas in the comments below :). I know I would generate an image of people liking and following my posts, with a million followers on my profile. Wuahahahaha… cough… anyway.

If you would like to learn about AI and Computer Vision, check out my Courses and Projects on AugmentedStartups.com