The LR Finder Method for Stable Diffusion

Damian Stewart / @damian0815 · Apr 12, 2023

This article is part of a series on fine-tuning Stable Diffusion models. See also Fine Tuning Stable Diffusion With Validation.

In this article I give a quick introduction to applying Leslie Smith's method for finding an appropriate learning rate for a Stable Diffusion fine-tuning ("Dreambooth") dataset.

The idea behind the method is pretty simple. You run training with validation enabled on your dataset for a small number of steps, using a learning rate (LR) that increases smoothly from 0 up to a value that is likely too high. This produces a distinctively shaped curve on the validation graph. You can interpret this curve to find the steps where the model was training best, and from that determine the optimal learning rates to use when training Stable Diffusion on your dataset.
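To make the idea concrete, here is a minimal sketch of the loop in Python. This is not EveryDream2 code: train_one_step and validation_loss are hypothetical stand-ins for whatever your trainer actually does, and the default step counts match the worked example later in this article.

def lr_range_test(train_one_step, validation_loss, max_lr=3e-6,
                  total_steps=570, val_every=57):
    """Ramp the LR linearly from ~0 to max_lr and record the validation loss."""
    history = []  # (step, lr, val_loss) points to plot afterwards
    for step in range(1, total_steps + 1):
        lr = max_lr * step / total_steps   # linear ramp up to max_lr
        train_one_step(lr)                 # hypothetical: one optimizer step at this LR
        if step % val_every == 0:
            history.append((step, lr, validation_loss()))  # hypothetical
    return history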

For another take on this method, please also check out FollowFox’s article here: https://followfoxai.substack.com/p/find-optimal-learning-rates-for-stable .

Running the LR Finder Method

1. Use EveryDream2.

If you’re not already, use EveryDream2. It has proper, noise-stabilised validation graphs. Without these, you can’t use Leslie Smith’s method.

2. Setup validation.

Following the instructions here, enable validation using the --validation_config argument to EveryDream2.

3. Determine your epoch length.

Start a training run with your dataset (you can just use the default settings, because we're going to interrupt it as soon as we know the epoch length). Check that validation is running (you will see a validation progress bar flash up at the start if it's working):

[Screenshot: a terminal window with "validate (val)" circled in red, indicating validation is active]

Wait until you can see how many steps there are in an epoch:

[Screenshot: a terminal window with "6/57" circled in red, indicating that the epoch length is 57 steps]

Then interrupt and stop training. Write down the number of steps in your epoch — this is your epoch length (57 steps in this example), and you’ll need it for the next step.
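If you'd like a rough sanity check of that number, the epoch length is approximately the number of training images divided by the batch size, rounded up. Aspect-ratio bucketing and any image repeats or multipliers will shift the exact figure, so trust what the trainer prints, but an estimate looks like this:

import math

num_images = 57   # images in your training dataset (example)
batch_size = 1    # batch_size from your train.json
print(math.ceil(num_images / batch_size))  # roughly 57 steps per epoch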

4. Set the LR scheduler to Linear and train for 10 epochs with a 10 epoch LR warmup.

The LR value needs to trace a smooth line from 0 up to the highest LR you want to test, over 10 epochs. Let’s take 3e-6 as the max — this is a value that is likely too large for regular training, but depending on your dataset it may need to be increased or decreased. If we use a linear scheduler with a warmup of 10 epochs (i.e. 10 x epoch length steps), the LR will start at 0 and move smoothly up to the max over 10 epochs. 10 x epoch length from the example above is 570 steps, but note that you will need to change this value to match your own dataset.
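To make the schedule concrete, here's a small Python sketch (not EveryDream2's actual scheduler code, which may handle the endpoints slightly differently) of the LR that a linear warmup produces at each step, using the example numbers above:

max_lr = 3e-6                      # the highest LR we want to test
epoch_length = 57                  # steps per epoch, from step 3
warmup_steps = 10 * epoch_length   # 570 steps: the ramp spans all 10 epochs

def warmup_lr(step):
    # Linear warmup: the LR climbs from 0 at step 0 to max_lr at warmup_steps.
    return max_lr * min(step, warmup_steps) / warmup_steps

print(warmup_lr(0), warmup_lr(285), warmup_lr(570))  # 0.0 1.5e-06 3e-06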

To apply these settings, make a copy of optimizer.json and name it optimizer-findlr.json. In optimizer-findlr.json, locate the base configuration section and make the following changes:

  • set lr to the max LR we want to check (3e-6 in this example)
  • set the lr_scheduler to linear
  • set lr_warmup_steps to the number of steps required for 10 epochs (570 in this example)
"base": {
"optimizer": "adamw8bit",
"lr": 3e-6,
"lr_scheduler": "linear",
"lr_decay_steps": null,
"lr_warmup_steps": 570,
"betas": [0.9, 0.999],
"epsilon": 1e-8,
"weight_decay": 0.010
},
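Before launching, you can double-check the arithmetic with a couple of lines of Python. The file name and keys are the ones from the example above; adjust them to match your own setup:

import json

epoch_length = 57  # from step 3
with open("optimizer-findlr.json") as f:
    base = json.load(f)["base"]

assert base["lr_scheduler"] == "linear"
assert base["lr_warmup_steps"] == 10 * epoch_length, "warmup should span 10 epochs"
print("max LR to be tested:", base["lr"])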

Now you can launch EveryDream2 using your modified optimizer-findlr.json, like this:

python train.py --config train.json \
--validation_config validation_default.json \
--optimizer_config optimizer-findlr.json \
--max_epochs 10

Note that the above example assumes that your train.json is already set up correctly, and that you are on a Linux system. On Windows you'll need to replace the \ characters at the end of each line with ^ characters.

5. Find the loss/val and hyperparameter/lr unet graphs.

Check the graphs in Tensorboard or wandb.ai — you want to find the loss/val graph and the hyperparameter/lr unet graph. The hyperparameter/lr unet graph should trace a straight line upwards like this (don’t worry if the numbers on yours aren’t the same):

The loss/val graph should trace a curve that dips down, flattens out, and then rises and continues rising, something like this (again don’t worry if the numbers aren’t the same or if it doesn’t start from 0):

Your graph may not show the full curve. That's ok: what's important is that you can clearly see the graph fall, flatten out, and then start to rise (on the graph above, the area between step 0 and step 250).

  • If you don’t see the graph starting to rise, double the max LR you used in step 4 and run it again (i.e. if you used 3e-6 then try 6e-6 ). If this still doesn’t help, you could try to double it again, but also check your dataset for duplicates or near-duplicates — these will need to be removed for loss/val to be properly calculated.
  • If the graph starts to rise too quickly to see the shape of the curve, halve the max LR you used in step 4 and run it again (i.e. if you used 3e-6 then try 1.5e-6).

6. Compare the loss/val and hyperparameter/lr unet graphs to find the ideal learning rates for your dataset.

When you put the loss and lr graphs next to each other, they give you a reliable guide to the range of learning rates you might want to try for training your dataset. Look on the loss graph for the steepest downward slope: this is where the model was training most efficiently. Then use the lr graph to determine what the learning rates were during that steepest downward slope; this is the ideal range of learning rates for training your dataset.

As you can see in the graph above, when the learning rate is below 2e-7 the loss is only falling very slowly. When the learning rate reaches 2e-7 the loss starts to fall faster, until it reaches 7e-7, at which point the loss starts to level out and oscillate before beginning to rise. This means that the ideal learning rates for this dataset are between 2e-7 and 7e-7.
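If you'd rather read the numbers off programmatically than eyeball the graphs, the sketch below pulls the two scalar series out of the TensorBoard logs and prints the LR range over which the validation loss fell fastest. It assumes the scalar tags match the graph titles shown above ("loss/val" and "hyperparameter/lr unet") and uses a made-up log directory path, so check both against your own run:

from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("logs/my-findlr-run")  # hypothetical path to the run's TensorBoard logs
acc.Reload()

val = {e.step: e.value for e in acc.Scalars("loss/val")}
lr = {e.step: e.value for e in acc.Scalars("hyperparameter/lr unet")}

def lr_at(step):
    # The LR and the validation loss aren't necessarily logged at the same
    # steps, so look up the nearest logged LR value instead.
    return lr[min(lr, key=lambda s: abs(s - step))]

# Rank each pair of consecutive validation points by how steeply the loss fell.
steps = sorted(val)
drops = sorted(
    ((val[b] - val[a]) / (b - a), lr_at(a), lr_at(b))
    for a, b in zip(steps, steps[1:])
)
for slope, lr_lo, lr_hi in drops[:3]:  # the three steepest falls
    print(f"loss slope {slope:.3g}/step while the LR was between {lr_lo:.2g} and {lr_hi:.2g}")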

7. Train using a learning rate in the ideal range.

If you want to see results sooner, you could go ahead and train at 7e-7. If you're willing to wait for more epochs, you could try 5e-7 or even 3e-7; in my tests, training at a lower learning rate for longer always resulted in a better, more flexible model.

In general, rather than trying to find the absolute best learning rate, you should pick a number of epochs that you’re willing to wait for and optimise your learning rate so that you reach the valley of the validation curve towards the end of your training process. See my other article on Fine Tuning Stable Diffusion with Validation for more information on that.

That’s it.

It’s pretty straightforward, really! Train for 10 epochs, with a learning rate that goes like this: ⟋. Check the validation graph for a curve that goes steeply down, flattens, then rises. Good learning rates are when the graph is going down.

If you’d like another take on this method, be sure to check out FollowFox’s article here: https://followfoxai.substack.com/p/find-optimal-learning-rates-for-stable.
