Computer Vision Paper ~ Denoising Diffusion Probabilistic Models

Christian Lin
6 min read · Apr 7, 2023


Generative models are an essential area of research in artificial intelligence, focused on producing synthetic data that mimics the characteristics of real-world data. Among recent approaches, diffusion models have become one of the most popular and successful, attracting widespread attention for their ability to synthesize high-quality data even in complex scenarios. Their impressive performance comes from modeling complex distributions with a simple, gradual diffusion process.

The Denoising Diffusion Probabilistic Model (DDPM) is a recent, state-of-the-art diffusion model that significantly extended the capabilities of this family, and diffusion models have since been applied widely to image generation, speech synthesis, and natural language processing. DDPM generates data by learning to denoise: it models the noise distribution and the data distribution simultaneously, progressively removing noise until a clean sample emerges, and it has demonstrated excellent performance in a variety of scenarios. Overall, diffusion models, and DDPM in particular, represent a significant step forward in generative modeling and hold tremendous potential for a wide range of applications. Below are some generative results, from my own experiments and from others.

prompt: a city built out of shipping containers. frog perspective. night. digital art, retrofuturism and cyberpunk:: — ar 2:1 — v 5 — s 750
prompt: A vibrant and lively cityscape with towering skyscrapers, bustling streets, and colorful lights
prompt: Create a 1920x1080 slide with a white background. Position a centered image of a laptop displaying a graph, surrounded by a plant and a coffee mug, in the middle of the slide. The image should be scaled to fit the slide.

Background

A diffusion model (short for diffusion probabilistic model) is a machine learning technique that trains a parameterized Markov chain using variational inference to produce samples matching the real-world data distribution after a finite number of steps. The model achieves this by learning transitions that reverse a diffusion process: a Markov chain that gradually adds noise to the data, in the opposite direction of sampling, until the signal is destroyed. In simple terms, sampling from a diffusion model starts from pure noise and gradually removes it until the result resembles real-world data.

To make this process effective, the transitions of the sampling chain must be carefully selected. When the diffusion process involves small amounts of Gaussian noise, it is sufficient to set the transitions of the sampling chain to conditional Gaussians. This allows for a simple neural network parameterization that can effectively model the data distribution. By carefully selecting these transitions and using variational inference, diffusion models can generate high-quality synthetic data that closely matches the real-world data distribution.

Overall, diffusion models are a promising technique for generative modeling, with potential applications in a wide range of fields, including image and speech synthesis, natural language processing, and more. Their ability to model complex distributions using a simple diffusion process makes them an exciting area of research for the future of machine learning. The overall diagram is shown below.

The directed graphical model considered in this paper

Methodology

Reverse Process

Diffusion models are latent variable models of the form shown below, where x_1, ..., x_T are latent variables with the same dimensionality as the data x_0. The joint distribution p_θ(x_{0:T}) is called the "reverse process", and it is defined as a Markov chain with learned Gaussian transitions starting at p(x_T) = N(x_T; 0, I):
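
$$p_\theta(x_0) := \int p_\theta(x_{0:T})\, dx_{1:T}, \qquad p_\theta(x_{0:T}) := p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t),$$

$$p_\theta(x_{t-1} \mid x_t) := \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big).$$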

Forward Process (Diffusion Process)

The forward process (also called the diffusion process) runs in the opposite direction of the reverse process. It is the approximate posterior q(x_{1:T} | x_0), fixed to a Markov chain that gradually adds Gaussian noise to the data according to a variance schedule β_1, ..., β_T:
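
$$q(x_{1:T} \mid x_0) := \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \qquad q(x_t \mid x_{t-1}) := \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\big).$$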

Training Objective

Training is performed by optimizing the usual variational bound on negative log likelihood:
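
$$\mathbb{E}\big[-\log p_\theta(x_0)\big] \le \mathbb{E}_q\Big[-\log p(x_T) - \sum_{t \ge 1} \log \frac{p_\theta(x_{t-1} \mid x_t)}{q(x_t \mid x_{t-1})}\Big] =: L.$$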

The variances β_t can be learned via the reparameterization trick or held constant as hyperparameters. A notable property of the forward process is that it admits sampling x_t at an arbitrary timestep t in closed form:
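
Writing $\alpha_t := 1 - \beta_t$ and $\bar\alpha_t := \prod_{s=1}^{t} \alpha_s$:

$$q(x_t \mid x_0) = \mathcal{N}\big(x_t;\ \sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t)\, I\big).$$

To make this concrete, here is a minimal PyTorch sketch (not the authors' code) of closed-form forward sampling; the linear schedule values mirror the constants reported in the paper, but treat the exact settings here as illustrative assumptions:

```python
import torch

# Illustrative linear variance schedule: beta_t from 1e-4 to 0.02 over T = 1000 steps.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod of alpha_s for s <= t

def q_sample(x0: torch.Tensor, t: torch.Tensor, eps: torch.Tensor) -> torch.Tensor:
    """Draw x_t ~ q(x_t | x_0) in closed form: sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over batch dims
    return ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps

# Usage: jump straight to any timestep without simulating the whole chain.
x0 = torch.randn(8, 3, 32, 32)   # stand-in batch of images
t = torch.randint(0, T, (8,))    # one random timestep per sample
x_t = q_sample(x0, t, torch.randn_like(x0))
```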

Efficient training is therefore possible by optimizing random terms of L with stochastic gradient descent. Performance improves further with a variance-reduction technique based on rewriting L as:
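
$$L = \mathbb{E}_q\Big[\underbrace{D_{\mathrm{KL}}\big(q(x_T \mid x_0)\,\|\,p(x_T)\big)}_{L_T} + \sum_{t>1} \underbrace{D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\,\|\,p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}} \underbrace{-\ \log p_\theta(x_0 \mid x_1)}_{L_0}\Big].$$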

The equation above uses KL divergence to compare p_θ(x_{t-1} | x_t) directly against the forward-process posterior, which is tractable during training once conditioned on x_0:
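
$$q(x_{t-1} \mid x_t, x_0) = \mathcal{N}\big(x_{t-1};\ \tilde\mu_t(x_t, x_0),\ \tilde\beta_t I\big),$$

where

$$\tilde\mu_t(x_t, x_0) := \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\, x_0 + \frac{\sqrt{\alpha_t}\,(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\, x_t, \qquad \tilde\beta_t := \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\,\beta_t.$$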

Consequently, all of the KL divergences are comparisons between Gaussians, so they can be calculated in a Rao-Blackwellized fashion with closed-form expressions instead of high-variance Monte Carlo estimates.
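
For intuition, when two Gaussians share an isotropic covariance $\sigma^2 I$, their KL divergence reduces to a scaled squared distance between means, which is exactly what makes the mean-matching loss below emerge:

$$D_{\mathrm{KL}}\big(\mathcal{N}(\mu_1, \sigma^2 I)\,\|\,\mathcal{N}(\mu_2, \sigma^2 I)\big) = \frac{1}{2\sigma^2}\,\|\mu_1 - \mu_2\|^2.$$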

Forward Process and L_T

In this paper, the authors set aside the fact that the forward-process variances β_t are learnable by reparameterization and instead fix them to constants. With this choice, the approximate posterior q has no learnable parameters, so L_T is a constant during training and can be ignored.

Reverse Process and L_{1:T-1}

Recall that each reverse-process transition p_θ(x_{t-1} | x_t) is a Gaussian; the authors fix its covariance to untrained, time-dependent constants σ_t² I, so only the mean μ_θ(x_t, t) must be learned. To represent it, they propose a specific parameterization motivated by the following analysis of L_{t-1}. Using the reparameterization:
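
$$x_t(x_0, \epsilon) = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$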

We can write:
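
$$L_{t-1} = \mathbb{E}_q\Big[\frac{1}{2\sigma_t^2}\,\big\|\tilde\mu_t(x_t, x_0) - \mu_\theta(x_t, t)\big\|^2\Big] + C,$$

where $C$ is a constant that does not depend on $\theta$.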

So we see that the most straightforward parameterization of μ_θ is a model that predicts μ̃_t, the forward-process posterior mean.

In this paper, instead of predicting the posterior mean directly, the authors choose to predict the added noise ε, with a motivation similar to the analysis above. For the full derivation, please take the time to read the original paper. The final objective used in the paper is the simplified loss:
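
$$L_{\text{simple}}(\theta) := \mathbb{E}_{t,\, x_0,\, \epsilon}\Big[\big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\big)\big\|^2\Big],$$

where $t$ is sampled uniformly from $\{1, \dots, T\}$ and $\epsilon_\theta$ is a network trained to predict the noise from the noised input.

As a minimal sketch of one training step under this objective (assuming a hypothetical noise-prediction network eps_model(x_t, t) and reusing the alpha_bars schedule from the earlier snippet):

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0: torch.Tensor, alpha_bars: torch.Tensor) -> torch.Tensor:
    """L_simple: regress the noise injected at a random timestep (illustrative sketch)."""
    b = x0.shape[0]
    t = torch.randint(0, alpha_bars.shape[0], (b,), device=x0.device)  # t ~ Uniform
    eps = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps   # closed-form forward sample
    return F.mse_loss(eps_model(x_t, t), eps)        # ||eps - eps_theta(x_t, t)||^2
```

In practice, this loss is simply backpropagated through the noise-prediction network at each minibatch step of standard training.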

Conclusion

This article covered only some of the core ideas of the paper, namely the forward process and the reverse process; the full mechanism behind diffusion models is considerably more involved. I hope this overview helps you grasp the big picture of diffusion models.

In this article, I have briefly shared my viewpoints on the paper, and I hope it encourages you to dig deeper. I have also included a video link about the paper; I hope you like it!

If you like the article, please give me some 👏, share it, and follow me to learn more about the world of multi-agent reinforcement learning. You can also contact me on LinkedIn, Instagram, Facebook, and GitHub.


Christian Lin

A master's student in computer science who used to work at ShangShing as an iOS full-stack developer. Now I am diving into the AI field, especially multi-agent RL and bio-inspired intelligence.