Unleash Your Inner Artist: How to Fine-tune Stable Diffusion using LoRA

A Step-by-Step Guide Using Google Colab

Parth Panchal
6 min read · May 14, 2024

Imagine having a master artist’s toolkit at your fingertips: a powerful brush, vibrant paints, and years of honed skill. That’s essentially what a pre-trained machine learning model is like. But what if you want the artist to specialise in a particular style? That’s where fine-tuning comes in, and in this blog, we’ll be doing just that for Stable Diffusion!

Here’s the key: training a brand-new model from scratch is incredibly time-consuming and resource-intensive. Fine-tuning is like giving our existing model a few key reference photos, allowing it to adapt its skills to a new style much faster.

Today, we’ll explore how to fine-tune a base model to produce a LoRA model that can then be used within Stable Diffusion.

Photo by Jennie Razumnaya on Unsplash

LoRA Models

LoRA stands for Low-Rank Adaptation of Large Language Models. It’s a fantastic technique that lets us fine-tune these massive models without needing a complete overhaul. Here’s why it’s so powerful:

Faster and Lighter: Full fine-tuning takes a significant amount of time and resources. LoRA is like working with a smaller, more focused team. It trains much faster and results in a smaller file size (think a few hundred megabytes). This makes LoRA models easy to store, share, and use even on your home computer.

Targeted Adjustments: Instead of retraining everything, LoRA focuses on specific parts of the model responsible for combining the text prompt with the image. It’s like giving our artist specific instructions on how to use those reference photos.
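
To make “low-rank adaptation” concrete, here’s a minimal PyTorch sketch of the core idea: the original weights stay frozen, and two small matrices learn a correction on top of them. This is an illustration of the technique only, not the notebook’s actual implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + (B @ A)x * scale."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # the original weights stay frozen
        # Two small matrices: A maps down to `rank` dimensions, B maps back up.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Original output plus the small learned correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale

layer = LoRALinear(nn.Linear(768, 768), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 6144 trainable values instead of 768 * 768 = 589,824
```

Because B starts at zero, the layer initially behaves exactly like the frozen original, and only the tiny A/B factors need to be saved — which is why LoRA files are so small.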

Now you might be wondering: aren’t there other fine-tuning techniques available? Indeed, methods like textual inversion and Dreambooth exist. However, Dreambooth can be computationally expensive, and textual inversion may not always deliver the best results. LoRA stands out because it’s both efficient and effective, allowing you to achieve stunning image quality in a compact package.

You can’t use a LoRA model on its own. It needs a companion: a base diffusion model. The base model is like the canvas, and your LoRA brush adds the artistic flair.

In this tutorial, we’ll walk through training a LoRA for Stable Diffusion using Google Colab.

Let’s dive in!

This is the latest link to the notebook.

Let’s start by opening the Colab notebook. It is pretty straightforward, as it contains all the relevant information and suggestions on how to fine-tune the model. Ideally, we only need to upload our images, update a few parameters, and run the code.

Let’s go step-by-step through the code and see how to do that.

1. Installing Dependencies

  • Look for the code section labelled install dependencies. This part installs the necessary libraries Kohya needs to run smoothly.
  • Important! Make sure the checkbox for mount drive is selected; this connects your Google Drive to the notebook.
Dependencies section
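
For reference, the mount drive checkbox does the equivalent of this standard Colab snippet:

```python
# What the "mount drive" checkbox effectively runs in a Colab cell:
from google.colab import drive

drive.mount('/content/drive')  # asks for authorization, then exposes your Drive files
```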

2. Downloading the Base Model

  • This is where we choose the pre-trained model that we want to use as a base for our LoRA.
  • Select the model you want to fine-tune. In this case, we’ll choose Stable Diffusion V1.5 as mentioned. The notebook also supports Stable Diffusion 2.1, so you have options!
  • If you have a different model in mind, you can even fine-tune your own model or one you uploaded to Hugging Face. The code section will have instructions for providing the path to the Hugging Face model and your Hugging Face token.
Choose the model
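
If you’re curious what this download step amounts to, here’s a sketch using the huggingface_hub library. The notebook’s own cell handles this for you, so treat this as an illustration:

```python
# A sketch of the download step with huggingface_hub; the notebook's own cell
# handles this for you, so this just shows what it amounts to.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="runwayml/stable-diffusion-v1-5",       # the SD 1.5 base model
    filename="v1-5-pruned-emaonly.safetensors",     # checkpoint file in that repo
    # token="hf_...",                               # only needed for gated/private repos
)
print(ckpt_path)
```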

We’ve got our tools ready, and now it’s time to prepare the paint for your artistic masterpiece: the image data!

3. Uploading Image Dataset

Now comes the fun part: uploading your image collection! In the code section, you’ll see a variable for it:

zip_file_URL: This is where we’ll provide the path to the zipped folder containing the training images.

Preparing Image dataset
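
Under the hood, the dataset cell does something along these lines; the URL and folder below are placeholders:

```python
# Illustrative only: roughly what the dataset cell does with zip_file_URL.
import urllib.request
import zipfile

zip_file_URL = "https://example.com/my_training_images.zip"  # placeholder URL
urllib.request.urlretrieve(zip_file_URL, "/content/dataset.zip")

with zipfile.ZipFile("/content/dataset.zip") as zf:
    zf.extractall("/content/LoRA/train_data")  # placeholder training folder
```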

Here are some additional tips for your image collection:

Quality over Quantity: It’s better to have a few high-quality images that truly represent your desired style than a bunch of blurry or irrelevant ones.

Image Count: The number of images you need depends on what you’re training for. Aim for 5–20 images for objects/subjects and around 100 for styles.

Consistent Format: Make sure all your images have the same file extension (like PNG or JPG). The size and resolution can vary.
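
To double-check that last tip, a few lines of Python can count the extensions in your training folder; the path is a placeholder:

```python
# Quick sanity check that every file in the training folder shares one extension.
from collections import Counter
from pathlib import Path

folder = Path("/content/LoRA/train_data")  # placeholder dataset path
print(Counter(p.suffix.lower() for p in folder.iterdir() if p.is_file()))
# e.g. Counter({'.png': 18}) -- a mixed bag of extensions shows up immediately
```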

4. Image Captioning

If creating captions manually sounds daunting, fear not! The notebook offers some helpful tools in the Data preprocessing section. Look for the cell labelled BLIP captioning and run it as is. This will automatically generate captions for all the images in your training folder, creating a text file next to each image with the same name.

Image captioning using BLIP
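
If you’d like to see what BLIP captioning looks like outside the notebook, here’s a minimal sketch using the transformers library; the folder path is a placeholder, and the notebook’s own cell does the equivalent for you:

```python
# A minimal sketch of BLIP captioning with Hugging Face transformers.
from pathlib import Path
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for img_path in Path("/content/LoRA/train_data").glob("*.png"):  # placeholder folder
    image = Image.open(img_path).convert("RGB")
    inputs = processor(image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    caption = processor.decode(out[0], skip_special_tokens=True)
    img_path.with_suffix(".txt").write_text(caption)  # caption file next to each image
```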

We’ve prepped our artistic tools and gathered our creative fuel (image data). Now it’s time to unleash your inner artist and train your very own LoRA model!

5. Configuring the Training

This is where you configure your training parameters.

  • Project Name: Choose a name to identify your creation.
  • Pre-trained Model Path: Specify the downloaded Stable Diffusion model (e.g., /content/pretrained_models).
  • Output to Drive (Optional): Save progress and images to your Google Drive for easy access.
Preparing config file for training
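
As a rough illustration, the configuration boils down to a handful of fields like these. The exact field names and defaults vary between notebook versions, so treat this as a sketch rather than the notebook’s real code:

```python
# Hypothetical illustration of what the config cell collects; field names
# and values here are examples, not the notebook's actual variables.
config = {
    "project_name": "my_first_lora",                       # identifies your creation
    "pretrained_model_name_or_path": "/content/pretrained_models/sd-v1-5.safetensors",
    "train_data_dir": "/content/LoRA/train_data",          # your captioned images
    "output_dir": "/content/drive/MyDrive/LoRA/output",    # optional: save to Drive
    "learning_rate": 1e-4,
    "max_train_steps": 1000,
    "network_dim": 16,                                     # the LoRA rank
}
```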

The provided notebook offers various training parameters, but for a first-time LoRA creation, it’s recommended to stick with the default settings and run all the remaining cells as is.

However, if you’re curious about tweaking things later, here are some pointers:

Learning Rate: Lower learning rates tend to work better for objects and faces. The paper suggests 2e-6 for objects and 1e-6 or 2e-6 for faces.

Training Steps: The number of training steps can impact results. The paper recommends around 400 steps for objects and 1200–1500 steps for faces.

Open the sample_prompt.txt file and edit the prompt as desired. This prompt will be used to generate images during training, giving you a glimpse of your LoRA’s developing style.

Change the prompt
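
If you prefer editing from code, something like this writes a new prompt into the file; the path is a placeholder, and opening sample_prompt.txt in Colab’s file browser works just as well:

```python
# Hypothetical one-liner for editing the sample prompt from code.
from pathlib import Path

Path("/content/LoRA/config/sample_prompt.txt").write_text(  # path is illustrative
    "a watercolor painting of a corgi wearing a scarf"
)
```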

6. Training Time!

Now comes the exciting part — bringing your LoRA to life!

Click the Run button next to the Train Model cell. It will start training your LoRA model based on your image data and prompt.

Train Model

7. Upload your model to Hugging Face

Once training is complete, you can upload your model to Hugging Face. The notebook provides instructions for uploading, including how to obtain a Hugging Face token.

Upload trained model to hugging face
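
For context, the upload cell does roughly what this huggingface_hub sketch does; the token, repo name, and file path here are placeholders:

```python
# A sketch of the upload step using the huggingface_hub library directly.
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # your Hugging Face write token
api.create_repo("your-username/my-first-lora", exist_ok=True)
api.upload_file(
    path_or_fileobj="/content/drive/MyDrive/LoRA/output/my_first_lora.safetensors",
    path_in_repo="my_first_lora.safetensors",
    repo_id="your-username/my-first-lora",
)
```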

This will allow you to load your model straight from the Hub later, without having to keep a local copy around.

Congratulations! You’ve successfully created your own LoRA model. Now you can use your LoRA with Stable Diffusion to generate stunning images in your new style.
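
For example, with the diffusers library you can pull your LoRA straight from the Hub and generate with it; the repo name and prompt below are illustrative:

```python
# Loading the finished LoRA from the Hub and generating an image with diffusers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("your-username/my-first-lora")  # the repo you just uploaded
image = pipe("a watercolor painting of a corgi wearing a scarf").images[0]
image.save("lora_sample.png")
```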

I hope this was useful!!! Thank you for reading :)

