Google’s Imagen vs OpenAI’s DALLE-2

Vishal Rajput
5 min read · May 31, 2022


Text-to-image generation models are the new fad in AI. The latest entrant in this race is Google’s Imagen, an answer to OpenAI’s DALLE-2. DALLE-2 made great progress over DALLE-1, but in reality it is less a successor of DALLE-1 than of GLIDE (another paper from OpenAI). DALLE-1 was an autoregressive transformer based on GPT-3, whereas DALLE-2 is built on diffusion models. Both DALLE-2 and Google’s Imagen use diffusion models, with slight differences, and both are extremely good at generating images from text prompts. Comparing their actual performance is difficult, though, as neither has been open-sourced yet.

Images generated by Imagen (Image Src-

So, without further ado, let’s get into the nitty-gritty of Imagen and see how it differs from OpenAI’s DALLE-2.

Let’s first see what diffusion models are.

Diffusion Model

A diffusion model is a generative model that takes an input image and gradually adds more noise to it at each timestep. The model adds noise till it starts looking…
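To make the noising step described above concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the code behind Imagen or DALLE-2: the function names, the linear noise schedule, the 1,000 timesteps, and the toy 64-pixel "image" are all hypothetical choices made for this example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances that grow linearly from beta_start to beta_end.

    The schedule shape and its endpoints are assumptions for this sketch.
    """
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_diffusion(pixels, betas, rng):
    """Return the noisy version of `pixels` after every timestep."""
    x = list(pixels)
    trajectory = []
    for beta in betas:
        keep = math.sqrt(1.0 - beta)   # how much of the current image survives
        spread = math.sqrt(beta)       # how much fresh Gaussian noise is mixed in
        x = [keep * value + spread * rng.gauss(0.0, 1.0) for value in x]
        trajectory.append(x)
    return trajectory


rng = random.Random(0)                 # fixed seed so runs are repeatable
pixels = [1.0] * 64                    # toy "image": 64 identical pixel values
betas = linear_beta_schedule(1000)
trajectory = forward_diffusion(pixels, betas, rng)

# After one step the image has barely changed; after the last step
# almost nothing of the original signal is left.
first_step_drift = max(abs(v - 1.0) for v in trajectory[0])
final_mean = sum(trajectory[-1]) / len(trajectory[-1])
print(first_step_drift, final_mean)
```

Each timestep shrinks the current image slightly and mixes in a little Gaussian noise, so early steps stay close to the input while late steps are dominated by noise. A generative diffusion model is then trained to undo these steps.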

