How To Fine Tune Stable Diffusion on EC2
Stable Diffusion is an open text-to-image model from Stability.ai that has been making waves since it was released only three months ago. There has been a Cambrian explosion of art, features, and applications built on top of it.
Textual inversion teaches Stable Diffusion about specific concepts, like personal objects or artistic styles, by describing them using new “words”. These can be used in prompts, just like any other word.
To do this, you take a handful of images of the new concept and train an embedding on them. In practice, the training phase produces an “embedding.pt” file, which you then pass to the text-to-image model. Now you can refer to the new concept in prompts using a specific word. For a more in-depth overview, we recommend this Reddit post.
Generating images with an embedding doesn’t require a more powerful GPU than vanilla Stable Diffusion does, and you can use popular Stable Diffusion UIs with embeddings too.
However, creating the embedding needs 20 GiB of GPU memory, so it’s unlikely many people will be able to run it on their laptops. This article explains how you can leverage powerful but relatively cheap EC2 spot instances to train textual inversion embeddings.
We’ll use Meadowrun to rent a GPU machine from AWS EC2 for a couple of hours. Meadowrun is an open-source library that makes it easy to run Python code in the cloud. It will take care of launching an EC2 instance, getting our code and libraries onto it, and turning it off when we’re done.
AWS and Meadowrun Prerequisites
To get started, you’ll need an AWS account, a local Python environment with Meadowrun installed, and Meadowrun set up in your AWS account. Here’s an example using pip on Linux, assuming your AWS account is already configured:
$ python3 -m venv meadowrun-venv
$ source meadowrun-venv/bin/activate
$ pip install meadowrun
$ meadowrun-manage-ec2 install --allow-authorize-ip
There’s a detailed guide in Meadowrun’s documentation.
If you’ve never used GPU instances in AWS before, you’ll probably need to increase your quotas. AWS accounts have quotas in each region that limit how many CPUs of a particular instance type you can run at once. There are 4 quotas for GPU instances:
- L-3819A6DF: “All G and VT Spot Instance Requests”
- L-7212CCBC: “All P Spot Instance Requests”
- L-DB2E81BA: “Running On-Demand G and VT instances”
- L-417A185B: “Running On-Demand P instances”
These are all set to 0 for a new EC2 account, so if you try to run the code below, you’ll get this message from Meadowrun:
Unable to launch new g4dn.xlarge spot instances due to the L-3819A6DF quota which is set to 0. This means you cannot have more than 0 CPUs across all of your spot instances from the g, vt instance families. This quota is currently met. Run `aws service-quotas request-service-quota-increase --service-code ec2 --quota-code L-3819A6DF --desired-value X` to set the quota to X, where X is larger than the current quota. (Note that terminated instances sometimes count against this limit: https://stackoverflow.com/a/54538652/908704 Also, quota increases are not granted immediately.)
If you’re giving this a go, we recommend running the command in that message, or clicking one of the links in the list above to request a quota increase. (If you use a link, make sure you’re in the same region as your AWS CLI, as given by aws configure get region.) It seems like AWS has a human in the loop for granting quota increases, and in our experience it can take up to a day or two for an increase to be granted.
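Before requesting an increase, you can check your current quota value with the AWS CLI’s service-quotas commands (this is a sketch; swap in the quota code for whichever instance family you plan to use):

```shell
aws service-quotas get-service-quota \
    --service-code ec2 --quota-code L-3819A6DF \
    --query "Quota.Value"
```

If this prints 0.0, you’ll need the increase before any spot G/VT instances will launch.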
Stable Diffusion Prerequisites
To create the embedding, we need a Stable Diffusion model. Go to the Stable Diffusion page on Hugging Face, accept the terms, and download the checkpoint file containing the model weights. That link points to the v1.4 model. As far as I know, you can use any 1.x version, but make sure to use the same model for training and generating; mixing models gives bad results.
Then, create an S3 bucket and upload the model file to it so that your EC2 instance can access this file. From the directory where the checkpoint file was downloaded, run:
aws s3 mb s3://meadowrun-sd
aws s3 cp sd-v1-4.ckpt s3://meadowrun-sd
Remember that S3 bucket names are globally unique, so you’ll need to pick your own bucket name that’s different from the one we’re using here (meadowrun-sd).
Finally, grant access to this bucket for the Meadowrun-launched EC2 instances:
meadowrun-manage-ec2 grant-permission-to-s3-bucket meadowrun-sd
Textual Inversion Prerequisites
You’ll need 3 to 5 images of the concept you want to teach Stable Diffusion about. I used these images of a paper tiger:
It’s important that the images are upright and 512x512 pixels. This textual inversion tutorial article has a Python snippet to resize and rotate images.
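That preparation step can also be done locally with a short script. Here’s a minimal sketch using Pillow (the prepare_image helper and its name are ours, not from the tutorial): it applies the EXIF rotation flag so the image is upright, then center-crops and resizes to 512x512.

```python
from PIL import Image, ImageOps


def prepare_image(path_in, path_out, size=512):
    """Rotate according to EXIF, center-crop to a square, and resize."""
    with Image.open(path_in) as img:
        # Apply the EXIF orientation flag so phone photos come out upright
        img = ImageOps.exif_transpose(img)
        # Center-crop to the target aspect ratio and resize in one step
        img = ImageOps.fit(img, (size, size))
        img.convert("RGB").save(path_out)
```

Run it over each file in your concept folder before uploading to S3.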
Once you have the images, put them all in a folder called tiger and upload them to the S3 bucket:
aws s3 sync tiger/ s3://meadowrun-sd/textual-inversion/tiger
This should have created files like s3://meadowrun-sd/textual-inversion/tiger/tiger1.png (of course, you can choose different filenames).
Training an embedding
Now we’re ready to rock.
import asyncio

import meadowrun

MEADOWRUN_MACHINE_CACHE = "/var/meadowrun/machine_cache"


def main():
    # TODO: replace with your bucket name
    s3_bucket_name = "meadowrun-sd"
    # TODO: if necessary, replace with the name of the ckpt file in S3
    model_ckpt = "sd-v1-4.ckpt"
    # TODO: replace with a word that describes your images
    initialization_word = "tiger"
    # TODO: replace with a name for your run (not used by training, just FYI)
    run_name = "tiger"

    s3_log_folder_name = f"{run_name}_logs"
    machine_log_folder_name = (
        f"{MEADOWRUN_MACHINE_CACHE}/textual-inversion/{run_name}/logs"
    )

    asyncio.run(
        meadowrun.run_command(
            "bash -c '"
            f'aws s3 sync s3://{s3_bucket_name} {MEADOWRUN_MACHINE_CACHE} --exclude "*" '
            f'--include {model_ckpt} --include "textual-inversion/{run_name}/*" '
            "&& python main.py "
            "--base configs/stable-diffusion/v1-finetune.yaml "
            "-t --no-test "
            f"--actual_resume {MEADOWRUN_MACHINE_CACHE}/{model_ckpt} "
            f"-n {run_name} "
            "--gpus 0, "
            f"--data_root {MEADOWRUN_MACHINE_CACHE}/textual-inversion/{run_name} "
            f"--init_word {initialization_word} "
            f"--logdir {machine_log_folder_name} "
            f"&& aws s3 sync {machine_log_folder_name} s3://{s3_bucket_name}/{s3_log_folder_name}'",
            meadowrun.AllocCloudInstance("EC2"),
            meadowrun.Resources(
                logical_cpu=1,
                memory_gb=8,
                max_eviction_rate=80,
                gpu_memory=20,
                flags="nvidia",
            ),
            meadowrun.Deployment.git_repo(
                "https://github.com/rinongal/textual_inversion",
                branch="main",
                interpreter=meadowrun.CondaEnvironmentYmlFile(
                    "environment.yaml", additional_software="awscli"
                ),
                environment_variables={
                    "TRANSFORMERS_CACHE": f"{MEADOWRUN_MACHINE_CACHE}/transformers",
                },
            ),
        )
    )


if __name__ == "__main__":
    main()
Briefly, the snippet instructs Meadowrun to run the main.py script from the textual inversion repository on a suitable EC2 virtual machine. A number of parameters, marked with TODO, need to be changed:
- s3_bucket_name: the name of the bucket where you uploaded the model and training images earlier.
- model_ckpt: the filename of the Stable Diffusion checkpoint you uploaded.
- initialization_word: a single-word rough description of the object (e.g., ‘toy’, ‘painting’, ‘sculpture’). This is not the word that will later represent the concept when generating images; it is only used as a starting point for the optimization.
- run_name: an arbitrary name for the run, just to make sure you can distinguish between results if you run multiple iterations. It also needs to match the name of the folder in S3 where you put the training images.
The first parameter to run_command tells Meadowrun what we want to run on the remote machine. In this case we’re using bash to chain three commands together:
- First, we use aws s3 sync to download the weights and the training images from S3. Our command runs in a container, but the /var/meadowrun/machine_cache folder we download into can be used to cache data for multiple jobs that run on the same instance. aws s3 cp doesn’t have a --no-overwrite option, so we use aws s3 sync to only download a file if we don’t already have it. This isn’t robust against multiple processes running concurrently on the same machine, but in this case we’re only running one command at a time.
- Second, we run the main.py script, which starts the fine-tuning process using the default v1-finetune.yaml file in the textual inversion repo. There are many parameters to tune here; if you wish to do so, we recommend uploading a modified yaml file to S3 and passing that as the argument instead.
- Last, we upload the “logs” of the main.py script to the same S3 bucket. The logs contain the state of the embedding every 500 steps, as well as sample images.
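To make the chained command easier to see, here is a sketch that simply re-evaluates the f-strings from the snippet with the example values (bucket meadowrun-sd, run name tiger) and prints the full pipeline:

```python
# Hypothetical values standing in for the TODO parameters above
machine_cache = "/var/meadowrun/machine_cache"
bucket, ckpt, run = "meadowrun-sd", "sd-v1-4.ckpt", "tiger"
logdir = f"{machine_cache}/textual-inversion/{run}/logs"

# The three stages: download inputs, train, upload logs
command = (
    f'aws s3 sync s3://{bucket} {machine_cache} --exclude "*" '
    f'--include {ckpt} --include "textual-inversion/{run}/*" '
    f"&& python main.py --base configs/stable-diffusion/v1-finetune.yaml "
    f"-t --no-test --actual_resume {machine_cache}/{ckpt} -n {run} --gpus 0, "
    f"--data_root {machine_cache}/textual-inversion/{run} --init_word tiger "
    f"--logdir {logdir} "
    f"&& aws s3 sync {logdir} s3://{bucket}/{run}_logs"
)
print(command)
```

Because the stages are joined with &&, training only starts if the download succeeds, and the upload only runs after training finishes.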
The next two parameters tell Meadowrun what kind of instance we need to run our code:
- AllocCloudInstance("EC2") tells Meadowrun to provision an EC2 instance.
- Resources tells Meadowrun the requirements for that instance. In this case we require at least 1 logical CPU, 8 GB of main memory, and 20 GB of GPU memory on an Nvidia GPU. We also set max_eviction_rate to 80, which means we’re okay with spot instances that have up to an 80% chance of interruption. If your VM gets interrupted or evicted too often, you can switch to an on-demand instance by setting this parameter to 0.
Finally, Deployment.git_repo specifies our python dependencies:
- The first two parameters tell Meadowrun to get the textual inversion code straight from the main branch of the original repo.
- The third parameter tells Meadowrun to create a conda environment based on the packages specified in the environment.yaml file in the repo.
- We also need to tell Meadowrun to install awscli, which is a non-conda dependency installed via apt. We’re using the AWS CLI to download and upload files to/from S3.
What to expect when you run the snippet
First, Meadowrun tries to find a suitable EC2 machine and start it. If there are multiple options, it goes for the cheapest one first; if it hits any capacity or quota problems, it tries progressively more expensive options until all are exhausted. The output tells you all about this process:
Launched a new instance for the job: ec2-3-15-146-110.us-east-2.compute.amazonaws.com: g4dn.xlarge (4.0 CPU, 16.0 GB, 1.0 GPU), spot ($0.1578/hr, 61.0% eviction rate), will run 1 workers
Next, Meadowrun builds a container image based on the contents of the environment.yaml file. This takes a while, but Meadowrun caches the image in ECR, so it only happens once. It also cleans up the image if you don’t use it for a while.
Building python environment in container a07bf5...
After that, the command runs, and you’ll see the output from the main.py script.
This script takes a couple of hours to run. Afterward, the final embedding, a bunch of log files, and intermediate checkpoints are available in your S3 bucket. You can view them using an S3 UI like Cyberduck, or sync the bucket to your local machine.
Meadowrun will automatically turn off the machine if you don’t use it for 5 minutes, but if you’re done with it, you can turn it off manually:
meadowrun-manage-ec2 clean
Generating images
The main file you’re after is in the S3 bucket, called something like tiger_logs/tiger2022-11-26T09-37-02_tiger/checkpoints/embedding.pt. How to use this file depends on which wrapper for Stable Diffusion you’re using, if any; in any case, you’ll need to pass embedding.pt to the model somehow. For example, here is some guidance for Stable Diffusion WebUI, and here for InvokeAI. The textual inversion repo also has a command line to generate images. If you’d like to generate images on EC2 instances, we’ve got you covered too, with How to Run Stable Diffusion on EC2 and How to Run Stable Diffusion 2 on EC2.
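If you go the repo route, its README shows an invocation along these lines; the flag names come from the textual inversion repo, and the paths below are placeholders you’d replace with your downloaded checkpoint and embedding. The * in the prompt is the repo’s placeholder word for the learned concept:

```shell
python scripts/txt2img.py --ddim_eta 0.0 --n_samples 8 --n_iter 2 \
    --scale 10.0 --ddim_steps 50 \
    --embedding_path /path/to/tiger_logs/checkpoints/embedding.pt \
    --ckpt_path /path/to/sd-v1-4.ckpt \
    --prompt "a photo of *"
```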
The results are not always what I expected, but you can tell the model has acquired some aspects of the training images.
Closing remarks
Have fun playing around with textual inversion. Hopefully you’ve enjoyed how Meadowrun puts powerful GPU machines on AWS at your fingertips.
To stay updated on Meadowrun, star us on GitHub or follow us on Twitter!