Diffusion Models In a Nutshell — Part 1: A Non-technical Teaser

M. Baddar
Published in BetaFlow
4 min read · Mar 26, 2024
Two images of an Audi car generated by Stable Diffusion: https://stablediffusionweb.com/app/image-generator

Table of Contents

  1. What are Diffusion Models, and how do they work?
  2. Applications of Diffusion Models

What are Diffusion Models?

Like the images you have just seen, you can generate one yourself: go to the Stable Diffusion web page, type a few words about anything on your mind, and cool images come out. Piece of cake, isn’t it?

Actually, it is not that easy! Generating images from text is easy for the user, but developing the software that performs this magic is not easy at all. In this article, we are going to shed some light on the backbone technology that makes this miraculous image generation possible. The two keywords are Generative Modeling and Diffusion Models.

Generative Models are the type of models that understand the “essence” of the data. The essence here means, in technical terms, the multivariate joint distribution that generates the images, text, and all other forms of data we see today.

For example, think about images: each image can be considered a multidimensional random variable, Y, with dimension d. A sample (realization) from this random variable can be written as y[1:d], with joint probability p_Y(y). To make things more visual, let’s consider the following set of MNIST images:

Figure 1 — Set of MNIST image realizations for the digits 0 to 9. Source: https://www.researchgate.net/figure/Example-images-from-the-MNIST-dataset_fig1_306056875

Looking at the images, each one is 28 × 28 pixels. Accordingly, each image can be considered a multidimensional variable y with d = 28 × 28 = 784.

Each pixel, y[i], takes values between 0 and 255: 0 is black and 255 is white. So there is a joint, multidimensional distribution over all y[i] that generates a specific image of, say, the digit 3 or 9. Even for the same digit there are different realizations, i.e., two different samples from the same joint distribution. You can see this clearly in each column of Figure 1.
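To make the idea concrete, here is a minimal sketch (not part of the original article) of treating a grayscale image as one realization y[1:d] of a d-dimensional random variable. A random synthetic "image" stands in for a real MNIST download so the snippet stays self-contained:

```python
import numpy as np

# Synthetic stand-in for one MNIST image: 28x28 pixels, values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# One realization y[1:d] of the random variable Y is just the flattened image.
y = image.flatten()
d = y.size

print(d)  # 784 = 28 * 28, the dimensionality of the joint distribution p_Y
print(int(y.min()), int(y.max()))  # each pixel stays within [0, 255]
```

A real generative model would be trained on many such vectors y, one per training image, to approximate p_Y(y).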

Based on this, the task of Generative Image Models, like Diffusion Models, is to generate random images whose joint distribution is close to that of a set of reference, or training, images.

The same concept applies to Generative Text Models, like Large Language Models, with some modifications of course. However, that is beyond the scope of this article.

Figure 2: Simple illustration of the high-level concept of diffusion models. Digit image source: https://machinelearningmastery.com/how-to-develop-a-convolutional-neural-network-from-scratch-for-mnist-handwritten-digit-classification/
Figure 3: Simple illustration of the diffusion process in physics. Source & author: By JrPol — Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=4586487

As mentioned above, the task of Generative Image Models is to generate “random” image samples that are similar to a set of reference, or training, images. The approach followed by Diffusion Models is to learn a forward and a reverse mapping between a sample from a Gaussian noise distribution, x[1:d] ~ N(0, I), and a corresponding data sample, y[1:d] ~ p_Y. The mapping from y to x (data to noise) is called the forward, or diffusion, process, and the mapping from x to y (noise to data) is called the reverse, or generative, process. See Fig. 2 for an illustration.
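The forward mapping from a data sample y to near-Gaussian noise x can be sketched in a few lines. The sketch below follows the common DDPM-style formulation from the literature (the names `betas` and `alpha_bar` are conventions of that formulation, not from this article), and uses a random vector as a stand-in for a real image:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 784                               # dimensionality of one flattened image
y = rng.random(d)                     # stand-in for a data sample y ~ p_Y
T = 1000                              # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)    # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)   # fraction of the original signal kept at each step

# Forward process: jump directly from y to the noised sample x_t at step t
# by mixing the data with Gaussian noise eps ~ N(0, I).
t = T - 1
eps = rng.standard_normal(d)
x_t = np.sqrt(alpha_bar[t]) * y + np.sqrt(1.0 - alpha_bar[t]) * eps

print(x_t.shape)  # (784,)
print(bool(alpha_bar[t] < 1e-4))  # by the last step almost no signal remains
```

By step T the sample x_T is essentially pure Gaussian noise; the model is then trained to run this mapping in reverse, turning noise back into an image.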

This is similar to the concept of diffusion in physics, which models the net movement of particles from regions of high concentration to regions of low concentration. In this analogy, the high-concentration regions correspond to the target distribution (for example, any MNIST image has concentrated regions of bright pixels while the rest of the image is darker), and the low-concentration state resembles a white-noise image, where dark and bright pixels are spread uniformly.

Applications of Diffusion Models

In the case of computer vision, diffusion models can be applied to a variety of tasks, including:

  • Image generation: Diffusion Models use either random noise or a specific seed noise to generate viable images. More details can be found in this paper.
  • Image noise reduction: Several techniques exist to reduce noise by relying on hand-crafted priors that control the structured noise.
  • Super-resolution: Diffusion Models capture the essence of the images during the diffusion process, then generate a higher-resolution image based on the encoded noise model. A survey of existing techniques can be found here. One of the most famous techniques is SR3, which has proved extremely successful.

If you need support regarding Generative Models (like Diffusion Models and Large Language Models) and how they can support your business, or for customized LLM and Generative Model solutions, send us an email.

Also, follow us on Twitter for more Generative AI content.




AI/ML Engineer with a focus on Generative Modeling. The mission: enabling individuals and SMEs to apply this technology to solve real-life problems.