What are Diffusion Models? The Technology Behind AI-Generated Images
AI-generated images have been creating buzz across social media platforms and even at prestigious award shows, and AI art has become genuinely trendy. Since around 2020, a series of research papers has drawn machine learning practitioners' attention to the technology and signalled how it could dominate image generation in the coming years. As tech giants took interest in building these models, diffusion models have swept the globe: since the release of OpenAI's DALL-E 2, Google's Imagen, Stable Diffusion, and Midjourney, they have spurred innovation and pushed the limits of machine learning.
Isn't it strange and wonderful how these image-generation models can create an almost unlimited variety of images from text prompts like "The Demogorgon from Stranger Things holding a basketball"? From text alone, they can produce the photorealistic, the fantastical, the futuristic, and, of course, the adorable.
In this article, I will explain, in simple terms, the technology behind these recent advances in image generation: diffusion models.
What are Diffusion Models?
First, let me explain how diffusion itself works. In thermodynamics, diffusion is the process by which particles move from high-density to low-density areas, or from high-energy to low-energy states, as shown in the following animation.
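To make this physical picture concrete, here is a minimal sketch in plain Python (the function and parameter names are illustrative, not from any particular source): a cloud of particles starts concentrated at a single point and, under random motion, spreads out into the surrounding low-density region.

```python
import random
import statistics

def diffuse(num_particles=1000, num_steps=200, seed=0):
    """Simulate particles that all start at one point (a high-density
    region) and take random Gaussian steps; the cloud steadily spreads
    into the surrounding low-density space."""
    rng = random.Random(seed)
    positions = [0.0] * num_particles  # every particle starts together
    spreads = []
    for _ in range(num_steps):
        positions = [x + rng.gauss(0, 1) for x in positions]
        # Track how wide the particle cloud has become.
        spreads.append(statistics.pstdev(positions))
    return spreads

spreads = diffuse()
print(f"spread after 10 steps: {spreads[9]:.2f}, after 200 steps: {spreads[-1]:.2f}")
```

The spread keeps growing with time (roughly like the square root of the number of steps), which is exactly the high-to-low-density flow the thermodynamic analogy describes.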
Diffusion Models:
Machine learning models that can create new data from training data are referred to as generative models. Other generative models include flow-based models, variational autoencoders, and generative adversarial networks (GANs). Each can generate images of excellent quality, but they all have drawbacks that make them less effective than diffusion models.
Since diffusion models are generative models, they can be used to create data comparable to the data they were trained on. A diffusion model works by gradually destroying its training data, adding Gaussian noise one step at a time, and then learning to recover the data by reversing this noising process. After training, we can generate new data simply by applying the learned denoising procedure to randomly sampled noise.
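The noise-adding half of this process can be sketched in a few lines. The toy 1-D "image", the linear noise schedule, and the step count below are illustrative assumptions (the schedule values echo the ones commonly used in the diffusion literature), not a definitive implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 1-D "image": any data vector works for illustration.
x0 = np.sin(np.linspace(0, 2 * np.pi, 64))

# A simple linear noise schedule beta_t (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)

x = x0.copy()
for beta in betas:
    # Each step keeps most of the signal and mixes in a little noise:
    # x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps
    x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)

# After many steps, the result is essentially uncorrelated with the original.
corr = np.corrcoef(x0, x)[0, 1]
print(f"correlation with original after {T} steps: {corr:.3f}")
```

Running the chain to the end leaves almost nothing of the original signal, which is the point: the endpoint is (approximately) pure Gaussian noise, so a model that learns to undo these steps can start from random noise at generation time.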
To be more precise, a diffusion model is a latent-variable model that maps to the latent space using a fixed Markov chain. This chain gradually introduces noise into the data to obtain the approximate posterior, factorising as q(x0, x1, …, xN) = q(x0) q(x1 | x0) ⋯ q(xN | xN−1), where x1, …, xN are latent variables with the same dimensionality as x0. The figure below shows such a Markov chain applied to image data.
Asymptotically, the image is transformed into pure Gaussian noise. A diffusion model's training objective is to learn the reverse process, that is, to train pθ(xt−1 | xt). New data can then be produced by traversing this chain backwards.
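A simplified sketch of this training objective follows. It uses the standard closed-form shortcut for the forward process (jumping from x0 straight to xt) and the common noise-prediction loss; `toy_model` is a hypothetical stand-in for the learned network pθ, which in practice would be a neural network:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # illustrative noise schedule
alphas_bar = np.cumprod(1.0 - betas)        # cumulative product ("alpha-bar_t")

def q_sample(x0, t, eps):
    """Closed-form forward process: jump directly from x_0 to x_t.
    x_t = sqrt(alpha-bar_t) * x_0 + sqrt(1 - alpha-bar_t) * eps"""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

def toy_model(x_t, t):
    """Hypothetical stand-in for the learned network that predicts the
    noise added at step t; a real model would be a trained neural net."""
    return np.zeros_like(x_t)

# One simplified training step: pick a random timestep and noise sample,
# form the noisy x_t, and score the model with a mean-squared error on
# the noise it should have predicted.
x0 = np.sin(np.linspace(0, 2 * np.pi, 64))
t = rng.integers(T)
eps = rng.standard_normal(x0.shape)
x_t = q_sample(x0, t, eps)
loss = np.mean((eps - toy_model(x_t, t)) ** 2)
print(f"noise-prediction loss at step t={t}: {loss:.3f}")
```

Minimising this loss over many random timesteps is what teaches the network to denoise; at generation time, the trained network is applied step by step to walk the chain backwards from pure noise.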
Usage of Diffusion Models:
Diffusion models can perform a range of tasks, including inpainting, outpainting, bit diffusion, image generation, and image denoising.
Popular diffusion models include DALL-E 2 from OpenAI, Imagen from Google, and Stable Diffusion from Stability AI.
- DALL-E 2: Unveiled in April 2022, DALL-E 2 produces much more realistic images at higher resolutions than its predecessor. As of September 28, 2022, it is accessible to the general public on the OpenAI website, with a number of free images and additional images available for purchase.
- Imagen: Imagen is a text-to-image diffusion model developed by Google in May 2022; it is not accessible to the general public.
- Stable Diffusion: In August 2022, Stability AI unveiled Stable Diffusion, an open-source diffusion model comparable to DALL-E 2 and Imagen. Stability AI released both the model weights and the source code, making the model available to the whole AI community. Stable Diffusion was trained on the 2-billion-image English-labelled subset of LAION-5B, an open dataset of CLIP-filtered image-text pairs compiled by the German nonprofit LAION from a general sweep of the internet.
- Midjourney: Midjourney is another popular diffusion model, introduced in July 2022 and accessible via an API and a Discord bot.
To put it simply, diffusion models are generative tools that let users create nearly any kind of image they can imagine, and their popularity has skyrocketed lately. Inspired by non-equilibrium thermodynamics, diffusion models currently produce state-of-the-art image quality.
Beyond cutting-edge image quality, diffusion models offer a variety of other advantages, such as not requiring adversarial training. The drawbacks of adversarial training are well known, so it is generally preferable to choose non-adversarial alternatives with comparable performance and training efficiency. In terms of training efficiency, diffusion models also have the advantages of scalability and parallelizability.
GANs vs. Diffusion Models:
In recent years, the machine learning community has seen growing adoption of diffusion models, and this trend is likely to continue. One reason is diffusion models' promising performance across a range of applications, including generative modelling, natural language processing, and image and speech processing; they have also been shown to be more robust to adversarial attacks than other generative models.
GANs (Generative Adversarial Networks) and diffusion models are two popular and effective approaches to generative modelling. While they share some similarities, there are also key differences. One difference lies in how they are trained: a GAN pits a generator against a discriminator in an adversarial game, while a diffusion model learns to reverse a fixed noising process.
Another difference is that GANs can generate high-quality images with sharp edges and fine details, while Diffusion Models tend to produce images with a softer, more blurred appearance.
Both GANs and Diffusion Models have advantages and disadvantages, and which one should be used depends on the application at hand and the objectives of the model.
Thanks for reading!
Follow for more content.
Find me on LinkedIn: Madan Lal