Guide to fine-tuning Stable Diffusion with your images

Create personalized images of you, your favorite character, or your cat

Vishnu Subramanian
Jarvislabs.ai
6 min read · Jan 9, 2023


Have you ever dreamt of being Spider-Man or Iron Man? I did while growing up (I want to keep it a secret that I still do 😁). In the last few months, algorithms like DALL-E and Stable Diffusion have been able to generate beautiful images, and we can guide them with simple text.

Though these models generate beautiful images, they struggle to generate images of specific persons, animals, or objects.

Let's take a look at some example outputs from SD: it gets our favorite characters slightly right, but they are not realistic.

These are some popular Indian movie stars.

Let's use DreamBooth, a technique proposed by researchers at Google, which lets us fine-tune these large image generation models to generate images of a specific object/person.

Let's take the famous Indian actor “Mahesh Babu” as an example and walk through how to fine-tune the SD model to generate images like these.

Photos generated by SD and Automatic 1111 on Jarvislabs

Quick steps to fine-tune SD

  • Set up the environment and the required code
  • Get images of your favorite character/object
  • Train the model, and understand the key hyperparameters that help in getting better results
  • Convert the weights to a format compatible with Automatic 1111
  • Start generating images
  • Tips for prompt ideas

Set up the environment

For this post, I am using an Nvidia A6000 GPU instance from Jarvislabs.ai. It comes with Automatic 1111 prebuilt, making our lives easier.

Note: You should be able to do most of the steps on your workstation.

Open the terminal from Jupyter Lab and run the commands below:


wget https://gist.githubusercontent.com/svishnu88/f17d3b4233db69dfc119f77942104a91/raw/fd16254e5e78fe69c22ca06314c2c980c9c5cb61/setup_db_env.sh
bash setup_db_env.sh

The script will take a few minutes to run. While it is being executed, let's try to understand what it does. Understanding it is completely optional 😊.

# Create a Python virtual environment
python3 -m venv env_dream

# Activate the environment
source env_dream/bin/activate

# Install Jupyter support and register the environment as a kernel
pip install ipykernel ipywidgets
python -m ipykernel install --user --name=dreambooth

# Clone the diffusers library and install the libraries
# required to train the Stable Diffusion model
git clone https://github.com/huggingface/diffusers.git
cd /home/diffusers/
pip install -e .
cd /home/diffusers/examples/dreambooth/
pip install -r requirements.txt
pip install datasets

We create a separate Python environment and install all the required libraries in it. If the script runs without throwing any errors, we are ready to train the model.
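If you want a quick sanity check before training, you can verify that the key libraries import and the GPU is visible (optional; this assumes the env_dream environment created by the script above):

# Activate the environment and check imports + GPU visibility
source env_dream/bin/activate
python -c "import torch, diffusers, accelerate; print(torch.cuda.is_available())"

This should print True on a GPU instance.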

Get Images of your favorite character

We need to be careful when picking the images used to train the model. Here are some tips for choosing your input images:

  • Pick a few close-up photos.
  • Include a few photos showing different sides of the face and body.
  • Avoid objects or other people in the background.
  • Choose different locations.
  • Avoid photos with heavy makeup or ornaments.

I took 9 images for this post, all from the internet.

Create a folder called input_imgs under the home directory and upload all your input images.
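You can also create the folder from the terminal; the path here is an assumption based on the Jarvislabs layout used by the scripts above, where the home directory is /home:

# Folder the training script will read input images from
mkdir -p /home/input_imgs

Then upload your photos into it through the Jupyter Lab file browser.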

Train Model

From a terminal, run the commands below to start the model training. The first run can take 15+ minutes.

It will be faster the second time, as it skips steps like downloading the model weights and creating the class images.

source env_dream/bin/activate
wget https://gist.githubusercontent.com/svishnu88/9adbd8ccdfd679a991d7696ef753a175/raw/17990e718487910cd08df3ace48bb231b7d82587/train_dreambooth.sh
bash train_dreambooth.sh

Let's look at what is happening inside the script so that you can tweak it for your needs.

export MODEL_NAME="runwayml/stable-diffusion-v1-5"  # base model to fine-tune
export CLASS_DIR="class_images"                     # class (regularization) images
export INSTANCE_DIR="input_imgs"                    # your training photos
export OUTPUT_DIR="output"                          # where the fine-tuned weights go

accelerate launch diffusers/examples/dreambooth/train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks person" \
  --class_prompt="Photo of Tom Cruise" \
  --resolution=512 \
  --train_batch_size=1 \
  --train_text_encoder \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=1200 \
  --mixed_precision='bf16' \
  --sample_batch_size=3

Some of the key flags you may need to tweak are:

  • instance_prompt
  • class_prompt
  • max_train_steps
  • mixed_precision

Instance prompt: “sks” is the keyword we assign to our character. We will use it to generate images once our model is trained. “person” is the class to which our character/object belongs. If we choose photos of a dog, then we replace “person” with “dog”.

Class prompt: To avoid overfitting, we need to pass images of the class, which in our case is a person. You can upload your own images, or let SD generate them. Here, I am asking SD to generate images of Tom Cruise. If you want to train the model on a female character, choose a popular actress instead of Tom Cruise.

If you are using your own images, create a folder called class_images and upload the relevant images there.
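For example (the source path is a hypothetical placeholder):

# Only needed if you are supplying your own class images
mkdir -p class_images
# cp /path/to/your/class_photos/*.jpg class_images/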

Max train steps: This number plays an important role in the quality of the generated images. Tweak it based on the number of input images: with more images you may need to increase it, and vice versa.

Mixed precision: You can use ‘bf16’ if you are using an Ampere-generation Nvidia GPU, such as the A100, A6000, or A5000. This helps train the model faster. If you are using an older-generation GPU, replace it with ‘fp16’ or remove the flag.
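Putting a couple of these together: if you were training on photos of a dog with an older GPU, only these flags would change from the script above (the values are illustrative, not tuned):

# Illustrative tweaks for a dog on a pre-Ampere GPU
--instance_prompt="a photo of sks dog" \
--class_prompt="a photo of a dog" \
--max_train_steps=800 \
--mixed_precision='fp16'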

Convert model weights to .ckpt and use it with Automatic 1111

Once the model training is completed, let's convert the model weights to a .ckpt file and use it with Automatic 1111, a popular UI for SD.

python diffusers/scripts/convert_diffusers_to_original_stable_diffusion.py --model_path output/ --checkpoint_path my_model.ckpt
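The conversion script uses the same libraries we installed earlier, so run it from a terminal where env_dream is still active. To confirm it worked, check that the checkpoint file was created (a v1.5 checkpoint is typically a few GB):

# Verify the converted checkpoint exists and looks reasonably sized
ls -lh my_model.ckpt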

Once we have our custom weights, let's move them to a place where Automatic 1111 can pick them up.

Run the commands below from a new terminal, without activating the environment we created earlier.

mv my_model.ckpt /home/stable-diffusion-webui/models/Stable-diffusion/
cd stable-diffusion-webui
python launch.py

The above command can take a few minutes. Once done, we can access the Automatic 1111 UI from the Jarvislabs platform.

Time to generate images using our custom model 😀.

Tips to get your prompts

This is where I struggle. I am no artist and have very little idea about the domain of art, so I take help from other AI applications that can generate initial prompts.
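For example, a starting prompt built around our sks token might look like this (purely illustrative; swap in your own style keywords):

portrait photo of sks person wearing an iron man suit, highly detailed, sharp focus, studio lighting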

See you soon

Hope you found the blog helpful; I would love to see what cool things you create. If you have suggestions or need help, let me know and I will be happy to help.

If you are running it on Jarvislabs, you can also talk to the team using the chat window. We would love to help you. Happy New Year!
