Using AI to generate Images: A blog on Stable Diffusion

Aditya Deodeshmukh
Published in CodeChef-VIT
5 min read · Mar 24, 2023

Is AI going to take over the world and become our supreme overlord? Well, that reality is a long way away, however, what it can do is draw a horse with a frog head captaining the Titanic in the Sahara Desert. So, let me show you how you can turn the absurd ideas in your head into images using Stable Diffusion.

Introduction

Ever since I was a child, I have been fascinated by images. Taking and editing photos has been, and remains, one of my main hobbies, and this fascination carried over into my life as a Computer Science student. I started out with basic image processing before eventually moving on to deep learning with CNNs. Although I found tasks such as classification and regression on images interesting, the coolness factor was still missing for me.

After extensive research of 30 minutes, I came across the world of image generation. I started out with GANs but soon found that they were extremely finicky: even slight changes to the architecture caused them to collapse completely. I gave up on GANs and moved on to other projects. One day, however, I came across a YouTube video by Computerphile introducing Stable Diffusion. The video provided a concise explanation of Stable Diffusion along with the code. Finally! I could bring my brainchild into this world, all thanks to Stable Diffusion.

Astronaut on a horse and A Cheese Mouse

Well, what is Stable Diffusion?

Stable Diffusion is classified as a generative diffusion model. As the name suggests, generative models create new content when given certain initial conditions and a certain procedure. Stable Diffusion uses the concept of reverse diffusion: it progressively removes noise from an image to generate new images.

How does stable diffusion work?

Stable Diffusion is, at its core, a denoising procedure. All it does is take an image and attempt to remove all the noise from it.

To turn this into a step-by-step approach you do the following:

  1. Take an image that is filled with noise.
  2. Feed this image into your model. It will give you its best prediction of the noise present in the image; you then subtract that noise from the image. On the first pass, the result will still be very noisy.
  3. Take this new image and feed it back into the model.
  4. Repeat the process until you get a proper image.
Figure 1: Working of Stable Diffusion
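The loop above can be sketched in a few lines of plain Python. This is only a toy illustration: the noise "predictor" here is a hand-written stand-in that estimates half of the remaining noise at each step, where a real diffusion model uses a trained neural network.

```python
# Toy sketch of the iterative denoising loop described above.
# Images are represented as plain lists of pixel values.

def predict_noise(noisy, target):
    # Stand-in noise predictor: estimates half of the remaining noise.
    # A real diffusion model would predict this with a neural network.
    return [0.5 * (n - t) for n, t in zip(noisy, target)]

def denoise(noisy, target, steps=10):
    image = noisy[:]
    for _ in range(steps):                          # step 4: repeat
        noise = predict_noise(image, target)        # step 2: predict noise
        image = [p - n for p, n in zip(image, noise)]  # subtract it
    return image                                    # step 3 feeds it back

clean = [0.2, 0.7, 0.5]   # the "proper image" we hope to recover
noisy = [0.9, 0.1, 0.8]   # step 1: start from a noisy image
result = denoise(noisy, clean)
# result is now within ~0.001 of clean: each step halves the error
```

Because each pass removes half of the remaining noise, ten passes shrink the error by a factor of about 1000, which mirrors how repeated denoising steps converge on a clean image.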

What can stable diffusion be used for?

Text to Image: Stable Diffusion lets you generate images from a given prompt. For example, if you want to generate a “Superhero Cat”, text to image lets you do that.

SuperCat

Image to Image: Stable Diffusion can also generate images that are similar to pre-existing images or share their structure, so you can make targeted changes to an image, such as changing its season.

Castle in Winter

Transition: With Stable Diffusion, you can create specialized pipelines that can allow you to transition from one generated image to another. This can be in the form of a video or a series of images.

Transition From one Castle To Another
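One simple way to build such a transition is to chain the image-to-image pipeline shown later in this post, feeding each frame back in while prompting for the target scene. This is a naive sketch, not the only approach; the file names and step count here are hypothetical, and an NVIDIA GPU is assumed.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Naive transition sketch: repeatedly feed each output back as the next
# input while prompting for the target scene, so successive frames
# drift from the starting image toward the target.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

frame = Image.open("castle_summer.jpg").convert("RGB")  # hypothetical file
for i in range(8):
    # low strength keeps each frame close to the previous one
    frame = pipe("a castle in winter", image=frame, strength=0.3).images[0]
    frame.save(f"frame_{i}.png")
```

Saving the frames in order gives a series of images (or, stitched together, a short video) that morphs from one scene to the other.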

Why Stable Diffusion?

Although alternatives such as OpenAI’s DALL-E 2 and Google’s Imagen are popular, there are 2 factors in which Stable Diffusion stands out:

  1. Open Source: Stable Diffusion is open-source software, meaning you can download the code and experiment with it. This lets you build customized apps and pipelines on top of the core Stable Diffusion model, whereas DALL-E 2 is limited to an API maintained by OpenAI.
  2. Hugging Face APIs: The Hugging Face Python APIs are an extremely convenient way to generate images with Stable Diffusion. Anyone capable of installing the prerequisites for the Python package can start generating images in a matter of minutes. The APIs also support techniques for generating images on lower-end hardware.
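For instance, the Diffusers package exposes memory-saving switches (shown here with the same v1-5 checkpoint used in the code later in this post); this sketch assumes an NVIDIA GPU:

```python
import torch
from diffusers import StableDiffusionPipeline

# Loading the weights in half precision roughly halves GPU memory use.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Attention slicing computes attention in chunks, trading a little
# speed for a smaller peak memory footprint.
pipe.enable_attention_slicing()
```

With these two switches, the model fits on consumer GPUs with far less VRAM than a full-precision load would need.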

How do you generate your own images?

Well, now that we have the basics out of the way, let’s see how to generate our candy dragon! For this, we will use the Diffusers package from Hugging Face.

Prerequisites: Before writing the image-generation code, you will need to install some Python packages and drivers:

  1. PyTorch
  2. CUDA/CuDnn
  3. diffusers
  4. Pillow (imported as PIL)

All of these can be easily installed with pip.
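A typical install looks like the following; exact package names can vary with your setup, and recent PyTorch wheels bundle the CUDA libraries, so a separate CUDA/cuDNN install is often unnecessary (`accelerate` and `transformers` are pulled in by the Stable Diffusion pipelines):

```shell
pip install torch diffusers transformers accelerate Pillow
```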

(Disclaimer: You will require an NVIDIA GPU for this method)

Text to Image:

First, we import all the libraries we need

import torch
from diffusers import StableDiffusionPipeline

Now, we load the Stable Diffusion pipeline into memory and then move it onto our GPU using CUDA:

model = "runwayml/stable-diffusion-v1-5"
dtype = torch.float16  # half precision saves GPU memory
pipe = StableDiffusionPipeline.from_pretrained(model, torch_dtype=dtype)
pipe = pipe.to("cuda")

We then define our prompt and generate an image using the pipeline we have created:

prompt = "a candy dragon"
image = pipe(prompt).images[0]
image.save("img1.png")

The Consolidated Code:

import torch
from diffusers import StableDiffusionPipeline

model = "runwayml/stable-diffusion-v1-5"
dtype = torch.float16  # half precision saves GPU memory
pipe = StableDiffusionPipeline.from_pretrained(model, torch_dtype=dtype)
pipe = pipe.to("cuda")

prompt = "a candy dragon"
image = pipe(prompt).images[0]
image.save("img1.png")

The Result:

A Candy Dragon
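The pipeline call also accepts optional parameters worth experimenting with. These are standard Diffusers parameters; the values below are just reasonable starting points, not the ones used for the image above.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# A fixed seed makes the result reproducible; guidance_scale controls
# how strongly the image follows the prompt; num_inference_steps is
# the number of denoising iterations from the loop described earlier.
generator = torch.Generator("cuda").manual_seed(42)
image = pipe(
    "a candy dragon",
    num_inference_steps=30,
    guidance_scale=7.5,
    generator=generator,
).images[0]
image.save("img1_seeded.png")
```

Rerunning with the same seed reproduces the same dragon, which makes it much easier to compare the effect of changing one parameter at a time.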

Image to Image:

The image-to-image pipeline procedure is very similar to the text-to-image pipeline, with some minor additions and changes.

Import an image from your computer:

from PIL import Image
img = Image.open("test.jpg")

Instead of the Stable Diffusion pipeline, you use the Stable Diffusion Img2Img pipeline and pass your image as a parameter when generating.

Consolidated Code:

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

img = Image.open("test.jpg")
model = "runwayml/stable-diffusion-v1-5"
dtype = torch.float16  # half precision saves GPU memory
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model, torch_dtype=dtype)
pipe = pipe.to("cuda")

prompt = "castle with cherry blossoms"
image = pipe(prompt, image=img).images[0]
image.save("img2.png")

The Result:

Castle With Cherry Blossoms
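One image-to-image parameter worth knowing is strength, a standard Diffusers parameter between 0 and 1: low values keep the output close to the input image, while high values let the prompt dominate. A hedged sketch, again assuming a "test.jpg" on disk and an NVIDIA GPU:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

img = Image.open("test.jpg").convert("RGB")
# strength=0.3 stays close to the original castle; try 0.8 for a
# result that follows the prompt much more freely.
image = pipe("castle with cherry blossoms", image=img, strength=0.3).images[0]
image.save("img_low_strength.png")
```

Sweeping strength from low to high is a quick way to find the balance between preserving the original structure and applying the change you asked for.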

Conclusion

The world of Machine Learning has advanced significantly since its origin in 1943, when all one could do was distinguish between circles and rectangles. Diffusion models such as Stable Diffusion are huge leaps in the development of AI. AIs today can write rap verses, evaluate resumes, mimic human voices, generate beautiful images, and much more. Who knows what else is yet to come? Frankly, that is what excites me the most.

GitHub Repository: https://github.com/AdityaDeodeshmukh/StableDiffusionImageGenerator
