Imagen: Google’s Answer to DALL-E 2

Alessandro Lamberti
3 min read · May 27, 2022


Image from Imagen project website

DALL-E 2 was released recently and marks a big step up from last year’s DALL-E: it generates more photorealistic images from text, at four times the resolution of its predecessor.

As you might notice, though, DALL-E 2’s images still lack realism. That is exactly the weakness Google Brain’s team went after, and the problem Imagen sets out to solve.

An overview

Before diving into the diffusion model, which is the core of this kind of algorithm, let’s look at how the input text is handled.

Google Brain used a huge language model, similar in spirit to GPT-3, to understand and extract information from the text (the paper uses the T5-XXL encoder). Instead of training a text model alongside the image generation model, as earlier approaches did, they took a large pre-trained model and froze it, so that its weights did not change while the image generation model was trained.
According to their study, this led to much better results, and the model seemed to “understand” text better.
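
To make the idea of a frozen text encoder concrete, here is a minimal sketch in plain Python. It is a toy, not Imagen’s actual code: ToyTextEncoder, ToyImageHead and train are hypothetical names invented for this illustration, the “pre-trained” weights are just random numbers, and a single linear layer stands in for the image generation model. The only point it shows is that training updates the image side while the text encoder’s weights stay untouched.

```python
import copy
import random


class ToyTextEncoder:
    """Hypothetical stand-in for a large pre-trained text encoder."""

    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Pretend these embeddings came from pre-training on lots of text.
        self.embeddings = {
            token: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for token in vocab
        }

    def encode(self, text):
        # Mean-pool the embeddings of the known tokens into one vector.
        vectors = [
            self.embeddings[t] for t in text.lower().split() if t in self.embeddings
        ]
        if not vectors:
            return [0.0] * self.dim
        return [sum(column) / len(vectors) for column in zip(*vectors)]


class ToyImageHead:
    """Hypothetical stand-in for the image model: one trainable linear layer."""

    def __init__(self, dim):
        self.weights = [0.0] * dim

    def predict(self, encoding):
        return sum(w * x for w, x in zip(self.weights, encoding))


def train(encoder, head, data, lr=0.1, epochs=200):
    """Train only the head; the encoder is used for forward passes only."""
    for _ in range(epochs):
        for text, target in data:
            encoding = encoder.encode(text)
            error = head.predict(encoding) - target
            # Gradient step of the squared error, applied to the head alone.
            head.weights = [w - lr * error * x for w, x in zip(head.weights, encoding)]


vocab = ["a", "red", "apple", "blue", "sky"]
encoder = ToyTextEncoder(vocab, dim=4)
head = ToyImageHead(dim=4)
data = [("a red apple", 1.0), ("a blue sky", -1.0)]

weights_before = copy.deepcopy(encoder.embeddings)
train(encoder, head, data)

# True: training never wrote to the encoder's weights.
print(encoder.embeddings == weights_before)
```

In a real deep learning framework the same effect is usually achieved by excluding the encoder’s parameters from gradient updates, rather than by simply never writing to them as this toy does.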

Image from the paper

Once text encodings are obtained, the diffusion model comes into play.

