Imagen: Google's Answer to DALLE-2

Published in

Artificialis

3 min read · May 27, 2022

DALLE-2 was released recently and is a big improvement over last year's DALLE: it generates more photorealistic images from text, at four times the resolution of its predecessor.

Even so, DALLE-2's images can still lack realism. That is the weakness Google Brain's team went after, and the problem Imagen sets out to solve.

An overview

Before diving into the diffusion model, which is the core of this kind of algorithm, let's look at how the input text is handled.

Google Brain used a huge pre-trained language model (T5-XXL in the paper), comparable in scale to models like GPT-3, to understand and extract information from the text. So, instead of training a text model together with the image generation model, as earlier approaches did, they took a big pre-trained model and froze it, so that its weights did not change while the image generation model was being trained.
According to their study, this led to much better results, and the model seemed to "understand" text better.
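
To make the idea of a frozen text encoder concrete, here is a minimal sketch in plain Python. It is not Imagen's code: the tiny vocabulary, the random embedding table standing in for the pre-trained language model, and the names `encode_text`, `generate`, and `train_step` are all assumptions made for illustration, and the "image generator" is just a linear map that outputs a single number. The only point is that training updates the generator's weights while the encoder stays untouched.

```python
import random

random.seed(0)  # reproducible toy weights

# Hypothetical toy vocabulary; a real model uses a learned tokenizer.
VOCAB = {"a": 0, "photo": 1, "of": 2, "corgi": 3, "cat": 4}
DIM = 4  # size of each text encoding

# Stand-in for the big pre-trained text encoder: a fixed embedding table.
# Tuples are immutable, so these weights cannot change during training.
FROZEN_EMBEDDINGS = tuple(
    tuple(random.uniform(-1.0, 1.0) for _ in range(DIM)) for _ in VOCAB
)


def encode_text(prompt):
    """Mean-pool the frozen token embeddings into one text encoding."""
    token_ids = [VOCAB[word] for word in prompt.lower().split()]
    return [
        sum(FROZEN_EMBEDDINGS[i][d] for i in token_ids) / len(token_ids)
        for d in range(DIM)
    ]


# Trainable part: a toy "generator" that maps a text encoding to one number.
generator_weights = [0.0] * DIM


def generate(encoding):
    """Linear map from the text encoding to a single output value."""
    return sum(w * e for w, e in zip(generator_weights, encoding))


def train_step(prompt, target, lr=0.5):
    """One gradient step on 0.5 * error**2; only the generator is updated."""
    encoding = encode_text(prompt)  # the encoder is used but never modified
    error = generate(encoding) - target
    for d in range(DIM):
        generator_weights[d] -= lr * error * encoding[d]
    return 0.5 * error ** 2


losses = [train_step("a photo of a corgi", target=1.0) for _ in range(20)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

In a deep learning framework the same effect is usually achieved by turning off gradient computation for the encoder's parameters and passing only the generator's parameters to the optimizer.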

Once text encodings are obtained, the diffusion model comes into play.

Written by Alessandro Lamberti