Diffusion models
Diffusion probabilistic models are parameterized Markov chains trained to gradually denoise data.
Diffusion probabilistic models are latent variable models capable of synthesizing high-quality images. Their reported performance is superior, in most cases, to that of other recent generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs). We first look at the seminal paper by Ho, Jain and Abbeel (2020), then walk through a quick coding example, and finally mention some recent developments.
Denoising Diffusion Probabilistic Models
In a 2015 paper, Sohl-Dickstein et al. introduced diffusion probabilistic models (also called diffusion models for brevity). Diffusion models sample from a distribution by reversing a gradual noising process: sampling starts with noise x_T, and then progressively less noisy samples x_{T−1}, x_{T−2}, … are produced until reaching a final sample x₀. Each timestep t corresponds to a certain noise level, and xₜ can be thought of as a mixture of a signal x₀ and some noise 𝜀.
Therefore, a diffusion model is essentially a Markov chain trained to produce samples matching the original data after finite time. Each transition in the chain learns to reverse one step of a diffusion process, which is itself a Markov chain that gradually adds noise to the data, in the opposite direction of sampling, until the signal is destroyed (see the figure below).
A diffusion model learns to produce a slightly more denoised xₜ₋₁ from xₜ. In practice, the model is a function 𝜀_θ(xₜ, t), with parameters θ, which predicts the noise component of xₜ. Training these models involves randomly drawing a data sample x₀, a timestep t and noise 𝜀, which together give rise to a noised sample xₜ. The training objective is then
$$ \lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2, $$
that is, the simple mean-squared error loss between the true noise and the predicted noise.
Mathematical description
A diffusion model is defined by a (noising) forward process q that gradually destroys data x₀ ∼ q(x₀) over the course of T timesteps by adding Gaussian noise at time t with variance βₜ ∈ (0,1),
$$ q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\big), $$
and a (denoising) parametrized reverse process p_θ such that
$$ p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big). $$
Here x₁, x₂, …, x_T are latent variables of the same dimensionality as x₀ and 𝒩 denotes the multivariate Gaussian distribution. The reverse step p_θ(xₜ₋₁ | xₜ) is a neural network approximation of q(xₜ₋₁ | xₜ).
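As a rough illustration of the forward process, the sketch below applies the noising step q(xₜ | xₜ₋₁) repeatedly to a tiny list of pixel values, using only the Python standard library. The linear schedule from 1e-4 to 0.02 over T = 1000 steps is the one reported by Ho et al. (2020); the function names and the four-pixel toy "image" are assumptions made for this example.

```python
import math
import random

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced variances beta_1..beta_T (requires T >= 2)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def forward_step(x_prev, beta, rng):
    """Draw x_t ~ N(sqrt(1 - beta) * x_prev, beta * I), element by element."""
    scale = math.sqrt(1.0 - beta)
    sigma = math.sqrt(beta)
    return [scale * v + sigma * rng.gauss(0.0, 1.0) for v in x_prev]

def forward_chain(x0, betas, rng):
    """Apply the forward step once per beta and return the final x_T."""
    x = list(x0)
    for beta in betas:
        x = forward_step(x, beta, rng)
    return x

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]   # toy "image" with four pixels (illustrative only)
xT = forward_chain(x0, linear_betas(1000), rng)
print(xT)  # looks like a draw from N(0, I): the signal is essentially gone
```

Each step shrinks the previous value slightly and adds a little fresh noise, which is why the original signal fades gradually rather than all at once.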
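We do not need to apply q repeatedly to sample xₜ ∼ q(xₜ | x₀). In fact, q(xₜ | x₀) can be expressed as a Gaussian distribution
$$ q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t) I\big), $$
where
$$ \alpha_t = 1 - \beta_t, \qquad \bar\alpha_t = \prod_{s=1}^{t} \alpha_s, $$
so one can write
$$ x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) $$
(the proof is quite straightforward, you can find it in Appendix A of this paper). The processes p and q form a variational autoencoder, so initially we can try to train the model by optimizing the usual variational bound on the negative log-likelihood.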
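The closed form for q(xₜ | x₀) can also be checked numerically. The sketch below, for a scalar x₀, chains the mean and variance of the single-step Gaussians and compares them with √ᾱₜ·x₀ and 1 − ᾱₜ. The helper names and the linear schedule are assumptions made for this illustration.

```python
import math
from itertools import accumulate
from operator import mul

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    return list(accumulate((1.0 - b for b in betas), mul))

def iterated_moments(x0, betas):
    """Mean and variance of q(x_t | x_0) obtained by chaining single steps."""
    mean, var = x0, 0.0
    for b in betas:
        mean = math.sqrt(1.0 - b) * mean   # each step scales the mean
        var = (1.0 - b) * var + b          # scales old variance, adds beta
    return mean, var

T = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
abar = alpha_bars(betas)

x0 = 2.0
mean_iter, var_iter = iterated_moments(x0, betas)
mean_closed = math.sqrt(abar[-1]) * x0
var_closed = 1.0 - abar[-1]
print(mean_iter, mean_closed)  # the two means agree up to rounding
print(var_iter, var_closed)    # the two variances agree up to rounding
```

Because ᾱ_T is very small for this schedule, x_T is close to pure standard Gaussian noise whatever x₀ was.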
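This bound can be expressed using the KL divergence (denoted by 𝓓) as
$$ L_{\mathrm{vlb}} = L_0 + L_1 + \dots + L_{T-1} + L_T, $$
where
$$ L_0 = -\log p_\theta(x_0 \mid x_1), \qquad L_{t-1} = \mathcal{D}_{KL}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big), \qquad L_T = \mathcal{D}_{KL}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big). $$
Using Bayes' theorem, one can calculate q(xₜ₋₁ | xₜ, x₀) in terms of the following parameters (recall that αₜ = 1 − βₜ and ᾱₜ = α₁α₂⋯αₜ)
$$ \tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t, \qquad \tilde\mu_t(x_t, x_0) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\,x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\,x_t, $$
giving
$$ q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\big(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t I\big). $$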
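To make the posterior q(xₜ₋₁ | xₜ, x₀) more concrete, here is a small scalar sketch that computes its mean and variance from a list of βₜ values. It assumes the standard DDPM expressions for the posterior parameters; the function name, the three-step schedule and the input values are invented for the example.

```python
import math

def posterior_params(t, x_t, x_0, betas):
    """Mean and variance of q(x_{t-1} | x_t, x_0) for scalars; t is 1-indexed."""
    alphas = [1.0 - b for b in betas]
    abar = [1.0]                       # alpha_bar_0 = 1 (empty product)
    for a in alphas:
        abar.append(abar[-1] * a)      # abar[t] holds alpha_bar_t
    beta_t, alpha_t = betas[t - 1], alphas[t - 1]
    beta_tilde = (1.0 - abar[t - 1]) / (1.0 - abar[t]) * beta_t
    coef_x0 = math.sqrt(abar[t - 1]) * beta_t / (1.0 - abar[t])
    coef_xt = math.sqrt(alpha_t) * (1.0 - abar[t - 1]) / (1.0 - abar[t])
    return coef_x0 * x_0 + coef_xt * x_t, beta_tilde

betas = [0.1, 0.2, 0.3]                # toy schedule, illustrative only
mu1, var1 = posterior_params(1, x_t=0.7, x_0=1.5, betas=betas)
mu3, var3 = posterior_params(3, x_t=0.7, x_0=1.5, betas=betas)
print(mu1, var1)  # at t = 1 the posterior collapses onto x_0 with zero variance
print(mu3, var3)  # for t > 1 the variance is positive but smaller than beta_t
```

The posterior mean is a weighted average of x₀ and xₜ, which is why the reverse process can be learned by predicting either the clean sample or the noise.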
However, to achieve higher image quality, it is preferable to optimize a simplified loss function instead of the variational bound. This simpler loss function is
$$ L_{\mathrm{simple}} = \mathbb{E}_{t, x_0, \epsilon}\big[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\big], $$
where 𝜀_θ(xₜ, t) is the model's noise prediction, 𝜀 ∼ 𝒩(0, I) and t ∼ 𝓤({1, 2, …, T}). Note that L_simple provides no learning signal for the variance term Σ(xₜ, t). Ho et al. (2020) achieved their best results by fixing the variance to σₜ²I. However, later experiments by Nichol and Dhariwal (2021) suggest that learning the variance can be beneficial.
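One Monte Carlo sample of the simplified loss can be sketched as follows, assuming the closed-form noising xₜ = √ᾱₜ·x₀ + √(1 − ᾱₜ)·𝜀. The callable predict_noise stands in for the neural network and is purely hypothetical; a real implementation would use a U-Net and batched tensors.

```python
import math
import random

def simple_loss(x0, alpha_bar, predict_noise, rng):
    """One sample of the simplified loss for a single data point.

    alpha_bar[t - 1] holds alpha_bar_t; predict_noise(x_t, t) is a
    hypothetical stand-in for the network, not a real library API.
    """
    T = len(alpha_bar)
    t = rng.randint(1, T)                       # t ~ U({1, ..., T})
    eps = [rng.gauss(0.0, 1.0) for _ in x0]     # eps ~ N(0, I)
    a = alpha_bar[t - 1]
    x_t = [math.sqrt(a) * v + math.sqrt(1.0 - a) * e for v, e in zip(x0, eps)]
    eps_hat = predict_noise(x_t, t)
    # squared error averaged over dimensions, as an MSE loss would do
    return sum((e - p) ** 2 for e, p in zip(eps, eps_hat)) / len(x0)

def zero_model(x_t, t):
    """Untrained stand-in that always predicts zero noise."""
    return [0.0] * len(x_t)

T = 1000
betas = [1e-4 + i * (0.02 - 1e-4) / (T - 1) for i in range(T)]
alpha_bar, prod = [], 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bar.append(prod)

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]                          # toy data point
print(simple_loss(x0, alpha_bar, zero_model, rng))
```

In actual training this quantity would be averaged over a mini-batch and minimized by gradient descent on the network parameters.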
Coding example
The following PyTorch example uses code from Phil Wang's (“lucidrains”) repository. You can install the package using pip.
pip install denoising_diffusion_pytorch
After that, create a folder named images in the folder containing the denoising_diffusion_pytorch folder (the code below looks for ./images relative to the working directory) and put some 32×32 images inside it (you can even test with a single image). Then execute the following code.
%matplotlib inline
import torch
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

# Argument names follow the package version used when this post was written;
# newer releases may have renamed or removed some of them.
model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
)

trainer = Trainer(
    diffusion,
    './images',
    train_batch_size = 4,
    train_lr = 2e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    fp16 = False                      # turn on mixed precision training with apex
)

trainer.train()

sampled_images = diffusion.sample(batch_size = 4)
sampled_images.shape # (4, 3, 128, 128)
Try lowering timesteps and train_num_steps to speed up the computation (at the cost of lower image quality). You can also visualize some of the results.
import matplotlib.pyplot as plt
import numpy as np

def show(img):
    # move the tensor to the CPU and reorder (C, H, W) -> (H, W, C) for imshow
    npimg = img.detach().cpu().numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest')

show(sampled_images[0])
Further developments
Several ways to improve performance were suggested by Dhariwal and Nichol (2021). Their paper Diffusion Models Beat GANs on Image Synthesis shows how to achieve image sample quality superior to that of the then state-of-the-art generative models. They found a better architecture through a series of ablations and improved image quality through classifier guidance. In practice, a classifier p_φ(y | xₜ, t) is trained on noisy images xₜ, and the gradients ∇_{xₜ} log p_φ(y | xₜ, t) are used to guide the diffusion sampling process towards an arbitrary class label y.
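A toy one-dimensional sketch may help to see what guiding by classifier gradients means. It assumes the form used in Dhariwal and Nichol's sampling algorithm, where the mean of the reverse step is shifted by the step variance times the gradient of log p(y | xₜ), multiplied by a guidance scale. The logistic classifier, its weight w and all numeric values are made up for illustration, and the classifier's dependence on t is omitted for brevity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_log_p_y1(x, w=2.0):
    """d/dx log sigmoid(w * x) for a toy 1-D classifier p(y=1 | x)."""
    return w * (1.0 - sigmoid(w * x))

def guided_mean(mu, sigma2, x_t, scale=1.0):
    """Shift the reverse-step mean by scale * variance * classifier gradient."""
    return mu + scale * sigma2 * grad_log_p_y1(x_t)

mu, sigma2, x_t = 0.0, 0.5, 0.0
print(guided_mean(mu, sigma2, x_t))             # 0.5
print(guided_mean(mu, sigma2, x_t, scale=3.0))  # 1.5
```

A larger scale pushes samples more strongly towards regions the classifier assigns to the target class, trading diversity for fidelity.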
Further results on diffusion models appear in the article Cascaded Diffusion Models for High Fidelity Image Generation by Ho et al. (2021). Cascaded Diffusion Models (CDM) are pipelines of diffusion models that generate high-fidelity images of increasing resolution. The samples are of state-of-the-art quality, and these results are achieved using pure generative models without any classifier. Moreover, the authors introduce conditioning augmentation, a data augmentation technique that they found critical to achieving high sample fidelity.
Originally posted on m0nads.
Useful links
Denoising Diffusion Probabilistic Models
J. Ho, A. Jain, P. Abbeel
arXiv:2006.11239 [cs.LG], 2020
Improved Denoising Diffusion Probabilistic Models
A. Nichol, P. Dhariwal
arXiv:2102.09672 [cs.LG], 2021
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
J. Sohl-Dickstein, E. A. Weiss, N. Maheswaranathan, S. Ganguli
arXiv:1503.03585 [cs.LG], 2015
Diffusion Models Beat GANs on Image Synthesis
P. Dhariwal, A. Nichol
arXiv:2105.05233 [cs.LG], 2021
Cascaded Diffusion Models for High Fidelity Image Generation
J. Ho, C. Saharia, W. Chan, D. J. Fleet, M. Norouzi, T. Salimans
arXiv:2106.15282 [cs.CV], 2021
On Fast Sampling of Diffusion Probabilistic Models
Z. Kong, W. Ping
arXiv:2106.00132 [cs.LG], 2021
PyTorch implementation (repo).