Diffusion models

Diffusion probabilistic models are parameterized Markov chains trained to gradually denoise data.

Jan 25, 2022

Diffusion probabilistic models are latent variable models capable of synthesizing high-quality images. Their reported performance is superior, in most cases, to that of other state-of-the-art generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). We first review the seminal paper by Ho, Jain and Abbeel (2020), then walk through a quick coding example, and finally mention some recent developments.

Denoising Diffusion Probabilistic Models

In a 2015 paper, Sohl-Dickstein et al. introduced diffusion probabilistic models (also called diffusion models for brevity). Diffusion models sample from a distribution by reversing a gradual noising process: sampling starts with noise x_T, and then progressively less noisy samples x_{T−1}, x_{T−2}, … are produced until reaching a final sample x₀. Each timestep t corresponds to a certain noise level, and xₜ can be thought of as a mixture of a signal x₀ and some noise 𝜀.

Therefore, a diffusion model is essentially a Markov chain trained to produce samples matching the original data after a finite number of steps. Each transition in the chain learns to reverse one step of a diffusion process, which is a Markov chain that gradually adds noise to the data, in the opposite direction of sampling, until the signal is destroyed (see the figure below).

A diffusion model learns to produce a slightly less noisy xₜ₋₁ from xₜ. In practice, the model is a function 𝜀(xₜ, t) that predicts the noise component of xₜ. Training these models involves randomly drawing a data sample x₀, a timestep t and noise 𝜀, which together give rise to a noised sample xₜ. The training objective is then

$$\| \epsilon(x_t, t) - \epsilon \|^2$$

that is, the simple mean-squared error loss between the true noise and the predicted noise.
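
As a rough illustration of that objective, here is a dependency-free sketch in plain Python (lists instead of PyTorch tensors). The names `noise_sample`, `mse`, `training_loss`, the argument `signal_weight` and the stand-in predictor `toy_eps_model` are hypothetical and not taken from any library. A real model would be a neural network, and the mixing weight would come from the noise schedule discussed in the mathematical description.

```python
import math
import random

def noise_sample(x0, signal_weight, eps):
    """Mix a clean sample with noise: x_t = sqrt(w) * x0 + sqrt(1 - w) * eps."""
    a = math.sqrt(signal_weight)
    b = math.sqrt(1.0 - signal_weight)
    return [a * x + b * e for x, e in zip(x0, eps)]

def mse(pred, target):
    """Mean-squared error between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)

def toy_eps_model(x_t, t):
    """Hypothetical stand-in for the network eps(x_t, t): always predicts zero noise."""
    return [0.0 for _ in x_t]

def training_loss(x0, t, signal_weight, rng):
    """Draw noise, build the noised sample x_t, and score the noise prediction."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]      # true noise
    x_t = noise_sample(x0, signal_weight, eps)   # noised sample
    return mse(toy_eps_model(x_t, t), eps), eps

rng = random.Random(0)
x0 = [0.5, -0.2, 0.1, 0.9]                       # toy "data sample"
loss, eps = training_loss(x0, t=10, signal_weight=0.8, rng=rng)
```

In an actual training loop the loss would be averaged over a batch and minimized by gradient descent on the network parameters; the toy predictor here has nothing to learn.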

Mathematical description

A diffusion model is defined by a (noising) forward process q that gradually destroys data x₀ ∼ q(x₀) over the course of T timesteps by adding Gaussian noise at time t with variance βₜ ∈ (0,1)

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$$

and a (denoising) parametrized reverse process p such that

$$p(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu(x_t, t),\ \Sigma(x_t, t)\right)$$

Here x₁, x₂, …, x_T are latent variables of the same dimensionality as x₀, and 𝒩 denotes the multivariate Gaussian distribution. The reverse step p(xₜ₋₁ | xₜ) is a neural network approximation of q(xₜ₋₁ | xₜ).
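
To make the forward process concrete, the following sketch applies it step by step to a small vector. It is only an illustration in plain Python: the function names are hypothetical, and the linear schedule from 1e-4 to 0.02 over T = 1000 steps is the choice reported by Ho et al. (2020), assumed here rather than required by the definition.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Variances beta_1..beta_T, linearly spaced between the two endpoints."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def forward_step(x_prev, beta_t, rng):
    """Draw x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    scale = math.sqrt(1.0 - beta_t)
    sigma = math.sqrt(beta_t)
    return [scale * x + sigma * rng.gauss(0.0, 1.0) for x in x_prev]

def forward_chain(x0, betas, rng):
    """Apply the forward process for len(betas) steps; returns [x_0, ..., x_T]."""
    xs = [list(x0)]
    for beta_t in betas:
        xs.append(forward_step(xs[-1], beta_t, rng))
    return xs

betas = linear_beta_schedule(T=1000)
trajectory = forward_chain([1.0, -1.0, 0.5], betas, random.Random(0))

# Factor by which the original signal x_0 has been scaled down in x_T
signal_scale = math.prod(math.sqrt(1.0 - b) for b in betas)
```

With this schedule `signal_scale` ends up well below 0.01, so x_T is almost pure Gaussian noise, which is what allows sampling to start from noise.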

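We do not need to apply q repeatedly to sample xₜ ∼ q(xₜ | x₀). In fact, q(xₜ | x₀) can be expressed as a Gaussian distribution

$$q(x_t \mid x_0) = \mathcal{N}\left(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t) I\right)$$

where

$$\alpha_t = 1 - \beta_t, \qquad \bar\alpha_t = \prod_{s=1}^{t} \alpha_s$$

so one can write

$$x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)$$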

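(the proof is quite straightforward; you can find it in Appendix A of this paper). The processes p and q together form a variational autoencoder, so a first approach is to train by optimizing the usual variational bound on the negative log-likelihood

$$L = \mathbb{E}_q\left[-\log \frac{p(x_{0:T})}{q(x_{1:T} \mid x_0)}\right] \ \ge\ \mathbb{E}\left[-\log p(x_0)\right]$$

This loss can be expressed using the KL divergence (denoted by 𝓓) as

$$L = \mathbb{E}_q\left[L_0 + \sum_{t=2}^{T} L_{t-1} + L_T\right]$$

where

$$L_0 = -\log p(x_0 \mid x_1), \qquad L_{t-1} = \mathcal{D}\left(q(x_{t-1} \mid x_t, x_0)\ \|\ p(x_{t-1} \mid x_t)\right), \qquad L_T = \mathcal{D}\left(q(x_T \mid x_0)\ \|\ p(x_T)\right)$$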

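Using Bayes' theorem, one can calculate q(xₜ₋₁ | xₜ, x₀) in terms of the following parameters

$$\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\, \beta_t, \qquad \tilde\mu_t(x_t, x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\, \beta_t}{1-\bar\alpha_t}\, x_0 + \frac{\sqrt{\alpha_t}\, (1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\, x_t$$

giving

$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\left(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t I\right)$$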
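
The posterior q(xₜ₋₁ | xₜ, x₀) is Gaussian, and its mean and variance can be computed directly from the noise schedule. The sketch below does this for scalar inputs, assuming αₜ = 1 − βₜ and ᾱₜ equal to the product of α₁, …, αₜ. The function name `posterior_params` and the three-step schedule are made up for the example.

```python
import math

def posterior_params(betas, t, x_t, x0):
    """Mean and variance of q(x_{t-1} | x_t, x_0) for scalars; t is 1-indexed, t >= 2."""
    alphas = [1.0 - b for b in betas]
    alpha_bar_t = math.prod(alphas[:t])          # product of alpha_1..alpha_t
    alpha_bar_prev = math.prod(alphas[:t - 1])   # product of alpha_1..alpha_{t-1}
    beta_t = betas[t - 1]
    alpha_t = alphas[t - 1]

    # Posterior variance (beta tilde)
    var = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * beta_t
    # Posterior mean (mu tilde) is a weighted combination of x0 and x_t
    coef_x0 = math.sqrt(alpha_bar_prev) * beta_t / (1.0 - alpha_bar_t)
    coef_xt = math.sqrt(alpha_t) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)
    return coef_x0 * x0 + coef_xt * x_t, var

toy_betas = [0.1, 0.2, 0.3]   # illustrative schedule, not a recommended one
mean, var = posterior_params(toy_betas, t=2, x_t=0.5, x0=1.0)
```

A useful sanity check is that the posterior precision 1/var equals αₜ/βₜ + 1/(1 − ᾱₜ₋₁), which is what multiplying the two Gaussians in Bayes' theorem gives.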

However, to achieve higher image quality, it is preferable to optimize a simplified loss function instead of the variational bound. This simpler loss function is the following:

$$L_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon}\left[\| \epsilon - \epsilon(x_t, t) \|^2\right]$$

where 𝜀 ∼ 𝒩(0, I) and t ∼ 𝓤({1,2,…,T}). Note that the simplified loss provides no learning signal for the variance term Σ(xₜ, t). Ho et al. (2020) achieved their best results by fixing the variance to σₜ²I. However, Nichol and Dhariwal (2021) later showed that learning the variance can be beneficial.
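
Once 𝜀(xₜ, t) has been trained, samples are drawn by running the reverse chain from pure noise. The following is a minimal sketch of that loop in plain Python, following the sampling procedure of Ho et al. (2020) with the variance fixed to σₜ² = βₜ (one of the two fixed choices they consider). The names `reverse_step`, `sample` and `zero_model` are hypothetical, and `zero_model` is only a placeholder for a trained network.

```python
import math
import random

def reverse_step(x_t, t, betas, eps_model, rng):
    """One sampling step x_t -> x_{t-1} with fixed variance sigma_t^2 = beta_t."""
    alphas = [1.0 - b for b in betas]
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    alpha_bar_t = math.prod(alphas[:t])
    eps_hat = eps_model(x_t, t)                  # predicted noise
    coef = beta_t / math.sqrt(1.0 - alpha_bar_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_hat)]
    if t == 1:
        return mean                              # no noise is added on the last step
    sigma_t = math.sqrt(beta_t)
    return [m + sigma_t * rng.gauss(0.0, 1.0) for m in mean]

def sample(dim, betas, eps_model, rng):
    """Start from x_T ~ N(0, I) and apply reverse_step for t = T, ..., 1."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(betas), 0, -1):
        x = reverse_step(x, t, betas, eps_model, rng)
    return x

def zero_model(x_t, t):
    """Placeholder for a trained network: always predicts zero noise."""
    return [0.0 for _ in x_t]

short_betas = [1e-4 + i * (0.02 - 1e-4) / 49 for i in range(50)]  # short toy schedule
x_generated = sample(3, short_betas, zero_model, random.Random(0))
```

With a placeholder predictor the output is of course just noise. The point of the sketch is the structure of the update: the predicted noise is subtracted from xₜ, the result is rescaled, and fresh Gaussian noise with the fixed variance is added at every step except the last.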

Coding example

The following PyTorch example uses code from Phil Wang's (“lucidrains”) repository. You can install the package using pip.

pip install denoising_diffusion_pytorch

After that, create a folder named images in your working directory and put some 32×32 images inside it (you can even test with a single image). Then execute the following code.

# Jupyter magic: render plots inline (remove this line outside a notebook)
%matplotlib inline

import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

# Keyword arguments below follow the package version this post was written
# against; newer releases may have renamed or removed some of them.
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,   # number of diffusion steps
    loss_type = 'l1'    # 'l1' or 'l2'
)

trainer = Trainer(
    diffusion,
    './images',                      # folder with the training images
    train_batch_size = 4,
    train_lr = 2e-5,
    train_num_steps = 700000,        # total training steps
    gradient_accumulate_every = 2,   # gradient accumulation steps
    ema_decay = 0.995,               # exponential moving average decay
    fp16 = False                     # turn on mixed precision training with apex
)

trainer.train()

sampled_images = diffusion.sample(batch_size = 4)
sampled_images.shape # (4, 3, 128, 128)

Try lowering timesteps and train_num_steps to speed up the computation, at the cost of lower image quality. You can also visualize some of the results.

import matplotlib.pyplot as plt
import numpy as np

def show(img):
    # Move the tensor to the CPU and convert (C, H, W) to (H, W, C) for imshow
    npimg = img.detach().cpu().numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest')

show(sampled_images[0])

Further developments

Several ways to improve performance were suggested by Dhariwal and Nichol (2021). Their paper Diffusion Models Beat GANs on Image Synthesis shows how to achieve image sample quality superior to that of the then state-of-the-art generative models. They found a better architecture through a series of ablations and further improved image quality with classifier guidance. In practice, a classifier

$$p(y \mid x_t, t)$$

is trained on noisy images xₜ and the purpose of the gradients

$$\nabla_{x_t} \log p(y \mid x_t, t)$$

is to guide the diffusion sampling process towards an arbitrary class label y.
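
A toy one-dimensional sketch can show how such a gradient enters a sampling step. It assumes the update used in Dhariwal and Nichol's guided sampling algorithm, where the mean of the reverse step is shifted by the gradient scaled by the step variance and a guidance scale. The logistic classifier, its parameters `w` and `b`, and all function names are hypothetical, and the gradient is written analytically instead of being obtained by automatic differentiation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_log_p_y1(x, w=2.0, b=0.0):
    """d/dx log p(y=1 | x) for a toy logistic classifier p(y=1 | x) = sigmoid(w*x + b)."""
    return w * (1.0 - sigmoid(w * x + b))

def guided_mean(mu, variance, x_t, scale=1.0):
    """Shift the reverse-step mean towards class y=1: mu + scale * variance * gradient."""
    return mu + scale * variance * grad_log_p_y1(x_t)

unguided = 0.0
guided = guided_mean(mu=unguided, variance=0.5, x_t=0.0, scale=1.0)
```

Here the classifier assigns higher probability to y = 1 for larger x, so the guided mean moves to the right of the unguided one. A larger `scale` pushes samples more strongly towards the target class.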

Further results about diffusion models appear in the article Cascaded Diffusion Models for High Fidelity Image Generation by Ho et al. (2021). Cascaded Diffusion Models (CDM) are pipelines of diffusion models that generate high-fidelity images of increasing resolution. The samples are of state-of-the-art quality, and these results are achieved using pure generative models without any classifier. Moreover, the authors introduce conditioning augmentation, a data augmentation technique that they found critical to achieving high sample fidelity.

Originally posted on m0nads.


Useful links

Denoising Diffusion Probabilistic Models
J. Ho, A. Jain, P. Abbeel
arXiv:2006.11239 [cs.LG], 2020

Improved Denoising Diffusion Probabilistic Models
A. Nichol, P. Dhariwal
arXiv:2102.09672 [cs.LG], 2021

Deep Unsupervised Learning using Nonequilibrium Thermodynamics
J. Sohl-Dickstein, E. A. Weiss, N. Maheswaranathan, S. Ganguli
arXiv:1503.03585 [cs.LG], 2015

Diffusion Models Beat GANs on Image Synthesis
P. Dhariwal, A. Nichol
arXiv:2105.05233 [cs.LG], 2021

Cascaded Diffusion Models for High Fidelity Image Generation
J. Ho, C. Saharia, W. Chan, D. J. Fleet, M. Norouzi, T. Salimans
arXiv:2106.15282 [cs.CV], 2021

On Fast Sampling of Diffusion Probabilistic Models
Z. Kong, W. Ping
arXiv:2106.00132 [cs.LG], 2021

PyTorch implementation (repo).