Google’s Imagen vs OpenAI’s DALLE-2

Vishal Rajput
Published in AIGuys
5 min read · May 31, 2022


Text-to-image generation models are the latest trend in AI. The newest entrant in this race is Google’s Imagen, an answer to OpenAI’s DALLE-2. DALLE-2 made great progress over DALLE-1, but in reality it is a successor not of DALLE-1 but of GLIDE (another paper from OpenAI). DALLE-1 was built on a GPT-3-style autoregressive transformer, whereas DALLE-2 uses diffusion models. Both DALLE-2 and Google’s Imagen are diffusion-based, with slight differences, and both are extremely good at generating images from text prompts. Comparing their actual performance is tough, though, because neither model has been open-sourced yet.

Images generated by Imagen (Image source: https://arxiv.org/pdf/2205.11487.pdf)

So, without further ado, let’s get into the nitty-gritty of Imagen and see how it differs from OpenAI’s DALLE-2.

Let’s first look at what diffusion models are.

Diffusion Model

A diffusion model is a generative model that takes an input image and gradually adds more noise to it at each timestep. The model adds noise till it starts looking…
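To make the noising step concrete, here is a minimal sketch of that forward process in plain Python. It assumes the linear noise schedule and closed-form sampling popularized by the DDPM paper; the article does not say which schedule Imagen or DALLE-2 actually use, and the function names, the 1000-step count, and the four-pixel toy "image" are all illustrative choices, not anything taken from either model.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_signal(betas):
    """alpha_bar[t]: product of (1 - beta) up to step t, i.e. how much of the
    original image's variance is still present at timestep t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, t, alpha_bars, rng):
    """Sample the noised image at timestep t directly from the clean pixels:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * gaussian_noise."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_signal(betas)

image = [0.5, -0.25, 1.0, -1.0]  # toy "image": four pixel values in [-1, 1]
slightly_noisy = add_noise(image, 10, alpha_bars, rng)
heavily_noisy = add_noise(image, 999, alpha_bars, rng)
```

Under these assumptions, `alpha_bars` shrinks at every timestep, so an early sample such as `slightly_noisy` stays close to the original pixels, while by the last timestep very little of the original signal remains. A trained diffusion model learns to run this process in reverse.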
