A basic introduction to Image Generation methods

Aug 6, 2023

AI image generation uses algorithms and deep learning models to create realistic, novel images, either from scratch or based on given input data.

Some of the main families of image generation models are:

Variational Autoencoders (VAEs)
Generative Adversarial Networks (GANs)
Autoregressive Models
Diffusion Models (the most popular at the moment)

Let’s dive into these models:

Variational Autoencoders (VAEs)

VAEs are a type of generative model in deep learning that learn to encode and decode data, which enables them to generate new, realistic data.

The data flow:

Encoding: The input data is passed through an encoder neural network, which learns to map it to a lower-dimensional representation known as the latent space.

The latent space captures the essential features of the input data.

Sampling: Random vectors are sampled from the latent space to generate new data points.
Decoding: The sampled vectors from the latent space are passed to the decoder neural network, which maps them back into the data space.

During training, the aim is to minimise the reconstruction error between the decoder's output and the original input.
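To make the encode, sample, decode flow a bit more concrete, here is a minimal NumPy sketch. It is only an illustration under stated assumptions: the "networks" are single untrained random weight matrices, the sizes `input_dim` and `latent_dim` are made up, and the loss adds the usual KL regularisation term to the reconstruction error. A real VAE would use trained neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, latent_dim = 16, 2   # illustrative sizes, not from the article

# Hypothetical single-layer "networks": untrained random weights.
W_mu = rng.normal(scale=0.1, size=(input_dim, latent_dim))
W_logvar = rng.normal(scale=0.1, size=(input_dim, latent_dim))
W_dec = rng.normal(scale=0.1, size=(latent_dim, input_dim))


def encode(x):
    """Map inputs to the mean and log-variance of a Gaussian in latent space."""
    return x @ W_mu, x @ W_logvar


def sample_latent(mu, logvar):
    """Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z):
    """Map latent vectors back to data space (sigmoid keeps values in (0, 1))."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))


def vae_loss(x, x_hat, mu, logvar):
    """Reconstruction error plus KL divergence to the N(0, I) prior."""
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1))
    return recon + kl


x = rng.random((4, input_dim))            # a batch of 4 fake "images"
mu, logvar = encode(x)
z = sample_latent(mu, logvar)
x_hat = decode(z)
loss = vae_loss(x, x_hat, mu, logvar)

# Generating new data: skip the encoder and decode samples from the prior.
new_samples = decode(rng.standard_normal((3, latent_dim)))
```

Once a VAE is trained, generation only needs the last line: sample a latent vector and decode it.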

Generative Adversarial Networks (GANs)

The idea behind Generative Adversarial Networks (GANs) is to make two neural networks compete with each other.

One, the generator, produces images similar to the training data.
The other, the discriminator, classifies images as either generated or taken from the training data.

As the two improve against each other, the generated images become more realistic.

In this workflow:

The discriminator classifies each image as real or generated.
The discriminator loss penalises the discriminator when it fails to tell real images from generated ones.
The generator loss penalises the generator when the discriminator correctly identifies its images as generated.
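The two losses can be sketched with plain binary cross-entropy. This is a minimal illustration, not a full GAN: the discriminator scores below are made-up numbers, and the generator loss shown is the commonly used "non-saturating" variant, which is an assumption on my part since the article does not name a specific formulation.

```python
import numpy as np


def bce(pred, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


def discriminator_loss(d_real, d_fake):
    """Low when real images score near 1 and generated images score near 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))


def generator_loss(d_fake):
    """Low when the discriminator scores generated images near 1 (is fooled)."""
    return bce(d_fake, np.ones_like(d_fake))


# Hypothetical discriminator outputs: probability that each image is real.
d_real = np.array([0.9, 0.8, 0.95])   # scores for training images
d_fake = np.array([0.1, 0.2, 0.05])   # scores for generated images

d_loss = discriminator_loss(d_real, d_fake)   # small: discriminator is winning
g_loss = generator_loss(d_fake)               # large: generator is being caught
```

In training, the two networks take turns: the discriminator is updated to lower `d_loss`, then the generator is updated to lower `g_loss`.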

Autoregressive Models

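Autoregressive models generate an image one element at a time, with each new element conditioned on the ones generated before it. They can also be conditioned on latent vectors, such as those produced by a Variational Autoencoder (VAE).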

Figure: Autoregressive model for RNN language modeling

These generate images by treating an image as a sequence of pixels.
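Here is a toy sketch of what "an image as a sequence of pixels" looks like in code. The function `pixel_on_probability` is a hypothetical stand-in for a trained network: it just makes each pixel likely to copy the previous one, so the result is not a meaningful image, only a demonstration of the sampling loop.

```python
import numpy as np


def pixel_on_probability(previous_pixels):
    """Hypothetical stand-in for a trained network: P(next pixel = 1 | previous pixels).

    Here the next pixel simply tends to copy the pixel before it.
    """
    if len(previous_pixels) == 0:
        return 0.5
    return 0.9 if previous_pixels[-1] == 1 else 0.1


def sample_image(height, width, rng):
    """Generate a binary image one pixel at a time in raster-scan order."""
    pixels = []
    for _ in range(height * width):
        p = pixel_on_probability(pixels)      # condition on everything so far
        pixels.append(int(rng.random() < p))  # sample the next pixel
    return np.array(pixels).reshape(height, width)


image = sample_image(8, 8, np.random.default_rng(0))
```

Because every pixel waits for all the pixels before it, generation is sequential, which is one reason these models tend to be slow at sampling time.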

Diffusion Models

A diffusion model is a probabilistic generative model that uses noise injection and learnable transformations to generate realistic images from random noise vectors.

We add noise to an image and then learn to remove it, as in Denoising Diffusion Probabilistic Models (DDPM).

Working:

Noise is added to the image gradually, by repeating a small noising step many times:

X0 is the real image, XT is the fully noised image.
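The iterative noising from X0 to XT can be sketched in a few lines of NumPy. The number of steps `T` and the linear `betas` schedule are assumptions chosen for illustration (they are typical DDPM-style values, not something the article specifies), and the "image" is just random numbers.

```python
import numpy as np

T = 1000                                  # number of noising steps (assumed)
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule


def forward_diffusion(x0, betas, rng):
    """Return [x_0, x_1, ..., x_T], adding a little Gaussian noise at each step."""
    xs = [x0]
    x = x0
    for beta in betas:
        noise = rng.standard_normal(x.shape)
        # Shrink the signal slightly, then mix in noise with variance beta.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        xs.append(x)
    return xs


rng = np.random.default_rng(0)
x0 = rng.random((32, 32))                 # stand-in for a real image in [0, 1]
trajectory = forward_diffusion(x0, betas, rng)
xT = trajectory[-1]                       # close to pure Gaussian noise
```

Each step destroys only a little information, but after enough steps almost nothing of the original image is left.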

Denoising works the same way in reverse, removing a little noise at each step:

XT is the fully noised image, X0 is the denoised image.
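The reverse loop from XT back to X0 can be sketched as below, following the DDPM-style sampling rule. The big assumption here is the noise predictor: in a real diffusion model it is a trained neural network, whereas `oracle_noise` is a hypothetical stand-in that cheats by knowing the clean image, so that the loop can be run without any training. The schedule values are again assumed.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # total signal kept after t steps


def reverse_diffusion(xT, predict_noise, rng):
    """Walk from X_T back to X_0, removing the predicted noise at each step."""
    x = xT
    for t in range(T - 1, -1, -1):
        eps_hat = predict_noise(x, t)
        # Mean of the denoising step, as in the DDPM sampling rule.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:                         # no fresh noise on the final step
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x


rng = np.random.default_rng(0)
x0_true = rng.random((8, 8))              # stand-in for a clean image


def oracle_noise(x_t, t):
    """Hypothetical 'perfect' predictor that cheats by knowing the clean image.

    A real model would be a neural network trained to estimate this noise.
    """
    return (x_t - np.sqrt(alpha_bars[t]) * x0_true) / np.sqrt(1.0 - alpha_bars[t])


xT = rng.standard_normal((8, 8))          # start from pure noise
x0_recovered = reverse_diffusion(xT, oracle_noise, rng)
```

With a perfect predictor the loop lands back on the clean image; training a diffusion model is about getting a network to approximate that predictor without access to the answer.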

Diffusion Model Use Cases

Diffusion models are used in two settings:

Unconditioned generation: The model generates an image without any external input, relying only on what it learned during training. For example: human face synthesis, super-resolution.

Conditioned generation: The model generates an image guided by an external input. For example: text-to-image, image inpainting, text-guided image-to-image.

Diffusion models are inspired by physics, specifically thermodynamics.

Figure: An aerial image of a UFO in a field surrounded by people

Q & A

1) What are some challenges of diffusion models?

Ans)

They can generate images that are not realistic.
They can be computationally expensive to train.
They can be difficult to control.

2) Which process involves a model learning to remove noise from images?

Ans) Reverse diffusion

3) What is the goal of diffusion models?

Ans) To learn the latent structure of a dataset by modeling the way in which data points diffuse through the latent space.

4) What is the name of the model family that draws inspiration from physics and thermodynamics?

Ans) Diffusion models

5) What is the process of forward diffusion?

Ans) Start with a clean image and add noise iteratively.

Tadaa!! That’s it, guys. This was an introduction to some image generation methods, and you should now have a basic understanding of them. All the best for your future learning, thank you ^^