Computer Vision Paper ~ Denoising Diffusion Implicit Models

Christian Lin
5 min read · Apr 8, 2023


Denoising diffusion probabilistic models (DDPMs) can generate high-quality images without adversarial training, but producing a sample requires simulating a Markov chain for many steps. To speed up this process, the authors introduce denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. DDPMs define the generative process as the reverse of a specific Markovian diffusion process; the authors generalize DDPMs via non-Markovian diffusion processes that lead to the same training objective. These processes can correspond to deterministic generative processes that produce high-quality samples much faster. They show empirically that DDIMs produce high-quality samples 10–50× faster than DDPMs, enabling a trade-off between computation and sample quality. Additionally, DDIMs can interpolate directly in the latent space and reconstruct observations with very low error.

If you want to know more about DDPM, you can refer to the article I wrote at the following link.

Methodology

Graphical models for diffusion (left) and non-Markovian (right) inference models

In order to reduce the number of iterations required by the generative model, the key observation is that the DDPM objective only depends on the marginals q(x_t | x_0), not directly on the joint q(x_{1:T} | x_0). The authors therefore consider non-Markovian inference processes, like the one in the diagram above, which nevertheless lead to the same objective function used in DDPM.

Non-Markovian Forward Process

First, we consider a family Q of inference distributions, indexed by a real vector σ:
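In the paper's notation, where α_t denotes the cumulative product of the noise schedule (DDPM's ᾱ_t), the family is defined as

$$ q_\sigma(x_{1:T} \mid x_0) := q_\sigma(x_T \mid x_0) \prod_{t=2}^{T} q_\sigma(x_{t-1} \mid x_t, x_0), $$

where q_σ(x_T | x_0) = N(√α_T x_0, (1 − α_T)I) and, for all t ≥ 2,

$$ q_\sigma(x_{t-1} \mid x_t, x_0) = \mathcal{N}\!\left(\sqrt{\alpha_{t-1}}\, x_0 + \sqrt{1-\alpha_{t-1}-\sigma_t^2}\cdot\frac{x_t - \sqrt{\alpha_t}\, x_0}{\sqrt{1-\alpha_t}},\ \sigma_t^2 I\right). $$

The mean is chosen so that q_σ(x_t | x_0) = N(√α_t x_0, (1 − α_t)I) for all t, matching the DDPM marginals.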

The forward process can then be derived via Bayes' rule:
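$$ q_\sigma(x_t \mid x_{t-1}, x_0) = \frac{q_\sigma(x_{t-1} \mid x_t, x_0)\, q_\sigma(x_t \mid x_0)}{q_\sigma(x_{t-1} \mid x_0)} $$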

Unlike the diffusion process described in the DDPM paper, the transformed forward process is no longer Markovian, since x_t can depend not only on x_{t−1} but also on x_0.

Generative Process and Unified Variational Inference Objective

The paper proposes a trainable generative process pθ(x_{0:T}) in which each p^(t)_θ(x_{t−1} | x_t) leverages knowledge of q_σ(x_{t−1} | x_t, x_0): given a noisy observation x_t, we first predict x_0, then use that prediction to obtain x_{t−1} through the reverse conditional distribution defined in the previous section.
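Formally, the generative process is

$$ p_\theta^{(t)}(x_{t-1} \mid x_t) = \begin{cases} \mathcal{N}\big(f_\theta^{(1)}(x_1),\ \sigma_1^2 I\big) & \text{if } t = 1, \\ q_\sigma\big(x_{t-1} \mid x_t, f_\theta^{(t)}(x_t)\big) & \text{otherwise,} \end{cases} $$

where f_θ^{(t)} is the prediction of x_0 defined next.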

The denoised observation is a prediction of x_0 given x_t:
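$$ f_\theta^{(t)}(x_t) := \frac{x_t - \sqrt{1-\alpha_t}\,\epsilon_\theta^{(t)}(x_t)}{\sqrt{\alpha_t}} $$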

We then optimize θ with the following variational inference objective:
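$$ J_\sigma(\epsilon_\theta) := \mathbb{E}_{x_{0:T} \sim q_\sigma}\big[\log q_\sigma(x_{1:T} \mid x_0) - \log p_\theta(x_{0:T})\big] $$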

Sampling from Generalized Generative Processes

Graphical model for accelerated generation, where τ = [1, 3]

By using L_1 as the objective, we are learning both a generative process for the Markovian inference process in DDPM and generative processes for the many non-Markovian forward processes described by σ. As a result, we can use pretrained DDPM models as solutions to the new objectives and concentrate on finding a generative process that produces better samples for our needs by altering σ.
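This is justified by Theorem 1 of the paper: for every σ there exist weights γ ∈ ℝ^T_{>0} and a constant C such that

$$ J_\sigma = L_\gamma + C, \qquad L_\gamma(\epsilon_\theta) := \sum_{t=1}^{T} \gamma_t\, \mathbb{E}_{x_0,\, \epsilon_t}\Big[\big\lVert \epsilon_\theta^{(t)}\big(\sqrt{\alpha_t}\, x_0 + \sqrt{1-\alpha_t}\,\epsilon_t\big) - \epsilon_t \big\rVert_2^2\Big]. $$

Because the optimum of L_γ does not depend on γ when ε_θ^{(t)} shares no parameters across different t, the simple DDPM objective L_1 serves as a surrogate for every J_σ.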

Denoising Diffusion Implicit Models

We can generate a sample x_{t-1} from a sample x_t according to:
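$$ x_{t-1} = \sqrt{\alpha_{t-1}} \underbrace{\left(\frac{x_t - \sqrt{1-\alpha_t}\,\epsilon_\theta^{(t)}(x_t)}{\sqrt{\alpha_t}}\right)}_{\text{predicted } x_0} + \underbrace{\sqrt{1-\alpha_{t-1}-\sigma_t^2}\,\epsilon_\theta^{(t)}(x_t)}_{\text{direction pointing to } x_t} + \underbrace{\sigma_t \epsilon_t}_{\text{random noise}} $$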

where ε_t is standard Gaussian noise independent of x_t. Different magnitudes of σ yield different generative processes, but they all use the same model ε_θ, so we do not need to re-train the model. When the condition below is satisfied, the forward process becomes Markovian and the generative process becomes a DDPM:
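$$ \sigma_t = \sqrt{\frac{1-\alpha_{t-1}}{1-\alpha_t}} \sqrt{1-\frac{\alpha_t}{\alpha_{t-1}}} $$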

The other extreme case is σ_t = 0 for all t: the forward process becomes deterministic given x_{t−1} and x_0, except for t = 1, and the resulting model generates samples from fixed latent variables through a fixed procedure (from x_T to x_0). It is named the denoising diffusion implicit model (DDIM) because it is an implicit probabilistic model trained with the DDPM objective, even though the forward process is no longer a diffusion.
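To make the update concrete, here is a minimal PyTorch sketch of one generative step. The names eps_model (a pretrained noise-prediction network) and alphas_cumprod (the cumulative α schedule) are assumptions, not the paper's code; eta interpolates between DDIM (eta = 0) and DDPM (eta = 1).

```python
import torch

def ddim_step(x_t, t, t_prev, eps_model, alphas_cumprod, eta=0.0):
    """One generative step x_t -> x_{t_prev} (timestep indices are 0-based).

    eta = 0 gives the deterministic DDIM update; eta = 1 recovers the
    stochastic DDPM update. eps_model and alphas_cumprod are assumed to
    come from a pretrained DDPM (hypothetical names).
    """
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t_prev] if t_prev >= 0 else torch.tensor(1.0)

    eps = eps_model(x_t, t)                                 # predicted noise
    x0_pred = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted x_0

    # sigma_t interpolates between DDIM (eta = 0) and DDPM (eta = 1)
    sigma = eta * ((1 - a_prev) / (1 - a_t)).sqrt() * (1 - a_t / a_prev).sqrt()

    # "direction pointing to x_t" term plus optional fresh noise
    dir_xt = (1 - a_prev - sigma**2).sqrt() * eps
    noise = sigma * torch.randn_like(x_t) if eta > 0 else 0.0
    return a_prev.sqrt() * x0_pred + dir_xt + noise
```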

Accelerated Generation Processes

We know that the forward process requires T steps to finish, and so does the reverse (generative) process. However, thanks to the transformation above, we can consider forward processes with length smaller than T, which shortens the generative procedure accordingly.

Concretely, we define the forward process over τ, an increasing sub-sequence of [1, …, T]. The authors call the process of sampling latent variables following reversed(τ) the "sampling trajectory". If the length of this trajectory is much shorter than T, the iterative sampling process becomes far more computationally efficient, as the sketch below illustrates.
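A sketch of the accelerated trajectory, reusing the hypothetical ddim_step, eps_model, and alphas_cumprod from the previous sketch:

```python
import torch

T, S = 1000, 50                            # full length vs. shortened trajectory
tau = torch.linspace(0, T - 1, S).long()   # increasing sub-sequence of timesteps

x = torch.randn(1, 3, 32, 32)              # x_T: start from pure Gaussian noise
for i in reversed(range(S)):               # iterate over reversed(tau)
    t = tau[i].item()
    t_prev = tau[i - 1].item() if i > 0 else -1   # -1 denotes x_0 (alpha = 1)
    x = ddim_step(x, t, t_prev, eps_model, alphas_cumprod, eta=0.0)
# x now approximates a sample x_0 after only S network evaluations instead of T
```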

In this article, I briefly shared my viewpoints on the paper. I hope it helps you learn more about the work. I also offer a video link about the paper; I hope you like it!

If you like the article, please give me some 👏, share it, and follow me to learn more about the world of multi-agent reinforcement learning. You can also contact me on LinkedIn, Instagram, Facebook, and GitHub.


Christian Lin

A CS master's student who used to work at ShangShing as an iOS full-stack developer. Now, I am diving into the AI field, especially multi-agent RL and bio-inspired intelligence.