Simple Review: High-Resolution Image Synthesis with Latent Diffusion Models

Jeongwon

4 min read · Apr 7, 2023

Most images in this post are from the High-Resolution Image Synthesis with Latent Diffusion Models paper.

There are four main steps (text taken from [2]):

The first step is to extract a more compact representation of the image using the encoder E located in the upper left corner of the figure above. Unlike other methods, latent diffusion works in the latent space defined by the encoder rather than in pixel space.
Next, Gaussian noise is added to the latent representation z (not to the pixel image) in the upper middle part of the figure. This is the diffusion process that goes from z to z_T when T steps of noise addition are applied.
The z_T representation is then passed through a U-Net located in the middle part at the bottom of the figure. The U-Net has the role of predicting z_{T-1}, and this denoising step is repeated until we arrive back at z, which is then returned from latent space to pixel space via the decoder D.
Finally, the approach allows for arbitrary conditioning on various input modalities such as semantic maps or text. This is achieved by first transforming the input y with a dedicated encoder τθ and then mapping it to the intermediate layers of the U-Net with the same kind of cross-attention layer used in the Transformer architecture.
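The noise-addition step described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the paper summary: a linear beta schedule with commonly used DDPM-style endpoints, T = 1000, a random array standing in for the latent z, and the hypothetical helper names `make_alpha_bar` and `add_noise`. It relies on the standard closed form that lets us jump from z directly to any noisy z_t.

```python
import numpy as np

def make_alpha_bar(T, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (an assumption)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def add_noise(z0, t, alpha_bar, rng):
    """Sample z_t given z_0 in closed form. t is 1-indexed (1..T)."""
    eps = rng.standard_normal(z0.shape)          # Gaussian noise
    a = alpha_bar[t - 1]
    z_t = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps
    return z_t, eps

rng = np.random.default_rng(0)
T = 1000
alpha_bar = make_alpha_bar(T)

# Stand-in for a latent z = E(x); a real latent would come from the encoder.
z0 = rng.standard_normal((4, 32, 32))

z_early, _ = add_noise(z0, 1, alpha_bar, rng)    # almost identical to z0
z_T, _ = add_noise(z0, T, alpha_bar, rng)        # almost pure noise
```

With this schedule the signal weight shrinks steadily with t, so z_T carries almost no information about z. That is why sampling can start from pure Gaussian noise and let the U-Net denoise step by step.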

========================================================

To compress the image into the latent representation z, they used an autoencoder based on VQGAN [4].

Here is how the latent vector is quantized with the codebook in VQGAN:
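As a rough illustration of the codebook lookup, the sketch below replaces every encoder output vector with its nearest codebook entry. The function name `quantize`, the tiny 3-entry codebook, and the 2×2 latent grid are all made up for this example. A real VQGAN learns the codebook and passes gradients through the lookup with a straight-through estimator, which is not shown here.

```python
import numpy as np

def quantize(z_e, codebook):
    """Replace each encoder vector with its nearest codebook entry.

    z_e:      (H, W, D) encoder output
    codebook: (K, D) embedding vectors
    returns:  (H, W) integer indices and (H, W, D) quantized latents
    """
    flat = z_e.reshape(-1, z_e.shape[-1])                          # (H*W, D)
    # Squared Euclidean distance between every vector and every code.
    dist = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (H*W, K)
    idx = dist.argmin(axis=1)                                      # nearest code per vector
    z_q = codebook[idx].reshape(z_e.shape)
    return idx.reshape(z_e.shape[:-1]), z_q

# Toy codebook with K = 3 entries of dimension D = 2 (illustrative values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
z_e = np.array([[[0.1, -0.1], [0.9, 1.2]],
                [[-0.8, 0.7], [0.4, 0.4]]])

indices, z_q = quantize(z_e, codebook)
print(indices)   # [[0 1]
                 #  [2 0]]
```

The image is then represented by the grid of indices (or the corresponding codebook vectors), which is much smaller than the pixel image.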

========================================================

Diffusion model: a model that gradually adds noise (going from x to x_t) and learns to remove it again (going from x_t back to x).

Instead of running diffusion on the full image in pixel space, they ran it on the latent representation, and they used a U-Net as the denoising network.

Inside the denoising process, they added conditioning information using cross-attention layers. The conditioning input y is encoded by \tau_\theta. y can be a semantic map, text, another representation, or an image. At each step t, the encoded condition \tau_\theta(y) is used by the denoising network as shown in Equation (3).
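To make the cross-attention conditioning more concrete, here is a minimal single-head sketch in NumPy. The queries come from the (flattened) U-Net features and the keys and values come from the encoded condition, written τθ(y) in the steps above. All shapes, the random projection matrices, and the function name `cross_attention` are assumptions for illustration. The real model uses learned weights and multiple heads.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(phi, cond, W_q, W_k, W_v):
    """Single-head cross-attention.

    phi:  (N, d_phi) flattened intermediate U-Net features
    cond: (M, d_tau) encoded condition, e.g. M text tokens
    """
    Q = phi @ W_q                                        # (N, d) queries from the U-Net
    K = cond @ W_k                                       # (M, d) keys from the condition
    V = cond @ W_v                                       # (M, d) values from the condition
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))    # (N, M) attention weights
    return weights @ V, weights

rng = np.random.default_rng(0)
N, M, d_phi, d_tau, d = 16, 5, 8, 12, 4                  # illustrative sizes
phi = rng.standard_normal((N, d_phi))
cond = rng.standard_normal((M, d_tau))
W_q = rng.standard_normal((d_phi, d))
W_k = rng.standard_normal((d_tau, d))
W_v = rng.standard_normal((d_tau, d))

out, weights = cross_attention(phi, cond, W_q, W_k, W_v)
```

Each of the N spatial positions ends up with a weighted mix of the M condition tokens, which is how text or another modality can steer the denoising at every layer where such a block is inserted.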

========================================================

Diffusion Model [5]:
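For reference, the training objective of a diffusion model in latent space can be written as below. This is the standard noise-prediction formulation as I understand it, with notation chosen to match the latent z used in this post. It is not copied from [5], and the schedule symbols \beta_t and \bar\alpha_t are assumptions of this sketch.

```latex
% Noisy latent at step t, obtained from the clean latent z_0 = \mathcal{E}(x)
z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad \bar\alpha_t = \prod_{s=1}^{t} (1 - \beta_s)

% The U-Net \epsilon_\theta is trained to predict the noise that was added
L_{LDM} = \mathbb{E}_{\mathcal{E}(x),\, \epsilon \sim \mathcal{N}(0, I),\, t}
\left[ \lVert \epsilon - \epsilon_\theta(z_t, t) \rVert_2^2 \right]
```

In the conditional case the network additionally receives the encoded condition, so \epsilon_\theta(z_t, t) becomes \epsilon_\theta(z_t, t, \tau_\theta(y)).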

========================================================

I wrote this post to summarize the paper for my own understanding.

If anything in this post is wrong, please let me know and explain what should be corrected.

References:

[1]

[2]

Paper Explained — High-Resolution Image Synthesis with Latent Diffusion Models


towardsdatascience.com

[3]

GitHub — CompVis/latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models


github.com

[4]

Taming Transformers for High-Resolution Image Synthesis


arxiv.org

[5]
