Lora Training Practice in Kohya ss: How are Images Assigned to Buckets? (with Examples and Illustrations)

Sable Confusion
7 min read · May 21, 2024


  • This article describes the bucket generation process without the option “Don’t upscale bucket resolution” checked. Leaving it unchecked is recommended if you want to preserve the quality of your images during training and avoid a related “bug” (see https://github.com/kohya-ss/sd-scripts/issues/731).
  • Furthermore, the training in this article uses 512x512 as the maximum resolution with the following settings:

— If you want to assign your images using the provided chart, please jump to the ‘Bucket Assignment’ part. —

A Common Mistake

To be honest, I originally thought that when preparing and trimming my images, I only needed to make both dimensions divisible by 64 pixels. The machine would then resize the images and match them to buckets with the same aspect ratio, so the original aspect ratio of my images would be fully preserved for training. However, one obvious mistake is that the set of possible aspect ratios changes as the side lengths grow, even though width and height are both divisible by 64. See the table below.

As you can see, with steps of 64 pixels, a larger image (e.g., 1088x1280 = 17:20) that is downscaled will not match the aspect ratio of any smaller size. That means your images will be cropped more than you expected.
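The point above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the helper names are mine, not anything from Kohya_ss. It reduces a size to its lowest-terms ratio and shows the smallest size with that ratio whose sides are both multiples of 64.

```python
from fractions import Fraction

STEP = 64  # bucket resolution step used in this article


def reduced_ratio(width, height):
    """Return width:height in lowest terms as a (w, h) tuple."""
    ratio = Fraction(width, height)
    return ratio.numerator, ratio.denominator


def smallest_grid_size(width, height, step=STEP):
    """Smallest size with the same aspect ratio whose sides are multiples of `step`."""
    w, h = reduced_ratio(width, height)
    return w * step, h * step


for size in [(512, 512), (576, 768), (1088, 1280)]:
    print(size, reduced_ratio(*size), smallest_grid_size(*size))
```

For 1088x1280 the smallest size on the 64-pixel grid with a 17:20 ratio is 1088x1280 itself, so no smaller bucket can share that ratio exactly.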

You might think that you just need to prepare images with the same aspect ratio as the buckets. Then, they will certainly be assigned to the buckets with the same aspect ratio so that all content of your images will be retained. Sadly, it seems not to be the case.

The following statements are merely based on my observations. There may be more rules for sorting images into buckets. Therefore, they may not be true in all cases.

Rules used for bucket generation

On the Kohya_ss GUI platform, bucket generation follows some rules.

  • bucket width + bucket height ≥ 1024 = 512 + 512 (max. resolution in the setting)
  • buckets’ area ≤ 512 x 512
  • buckets are ordered according to their aspect ratios; the perimeter of a middle bucket (B) must be greater than or equal to that of two adjacent buckets (A and C):

bucket A (width + height) ≤ bucket B (width + height) ≥ bucket C (width + height)

  • images are sorted according only to their aspect ratios
  • images are scaled down as little as possible, keeping their aspect ratio, and are then cropped to the bucket
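The first three rules can be sketched as a small enumeration. This is my reading of the observed rules, not the actual Kohya_ss code. The side range of 256 to 1024 is an assumption, only portrait buckets (width ≤ height) are listed because landscape ones mirror them, and "dip" is a hypothetical name for a bucket whose width + height is strictly below both of its neighbours.

```python
STEP = 64
MAX_AREA = 512 * 512            # rule 2: area limit
MIN_SUM = 512 + 512             # rule 1: width + height lower bound
MIN_SIDE, MAX_SIDE = 256, 1024  # assumed side range, not taken from the article


def candidate_buckets():
    """Portrait buckets (width <= height) that satisfy rules 1 and 2."""
    sides = range(MIN_SIDE, MAX_SIDE + 1, STEP)
    found = [
        (w, h)
        for w in sides
        for h in sides
        if w <= h and w + h >= MIN_SUM and w * h <= MAX_AREA
    ]
    # Order by aspect ratio (height / width), as rule 3 requires.
    return sorted(found, key=lambda b: b[1] / b[0])


def dips(buckets):
    """Buckets whose width + height is strictly below both neighbours."""
    sums = [w + h for w, h in buckets]
    return [
        buckets[i]
        for i in range(1, len(buckets) - 1)
        if sums[i] < sums[i - 1] and sums[i] < sums[i + 1]
    ]


buckets = candidate_buckets()
for w, h in buckets:
    print(f"{w}x{h}  ratio={h / w:.3f}  sum={w + h}  area={w * h}")
print("dips:", dips(buckets))
```

Under these assumptions, 384x512 never appears as a candidate (its width + height is 896), and 256x768 is the only candidate flagged as a dip.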

The following scheme shows the process of putting images into buckets.

Process of putting an image into buckets
Comparison of two types of horizontal rectangular buckets. Over-resizing does not happen in our examples under our settings.

In this way, the resizing process achieves two goals: the images keep as much area as possible after resizing, and they are cropped as little as possible afterwards.

Real Example

I attach here an example of my training data set. A total of 76 images were loaded.

The following table shows the number of images of different resolutions, their respective heights, widths, and aspect ratios, as well as the bucket no. to which the images were assigned.

Now, let’s analyze what caused the result above.

If the bucket resolution steps = 64, some possible buckets are listed below.

For your information, among rectangles with the same perimeter, the closer a rectangle is to a square, the larger its area. That means the closer your images are to a square, the higher their quality (more total pixels for training).
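A quick check of this, as a minimal sketch: every size below has width + height = 1024, and the area grows as the shape approaches a square. The function name is only illustrative.

```python
def area_for_sum(width, total=1024):
    """Area of a rectangle whose width + height equals `total`."""
    height = total - width
    return width * height


for width in (256, 320, 384, 448, 512):
    height = 1024 - width
    print(f"{width}x{height}: area = {area_for_sum(width)}")
```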

You might notice that even though we have an image of 576x768 with an aspect ratio of 6:8 (1.333), it will not be assigned to a bucket of 384x512 with the same aspect ratio (because bucket width + bucket height < 1024). Our 576x768 image will finally be put in a bucket of 448x576 with an aspect ratio of 7:9 (1.286) and cropped. The reason behind this is simple: to keep the area of resized images as large as possible.

The highlighted buckets are the available ones that will be used during training due to their larger area. Therefore, I recommend using the corresponding aspect ratios (i.e., 1:4, 4:15, 5:12, 5:11, 6:10, 7:9, and 1:1) when adjusting our initial images. In this way, we can preserve the content of our images.

The bucket 256x768 with an aspect ratio of 1:3 (3.00) is not selected, even though it meets all requirements except this one:

bucket A (width + height) ≤ bucket B (width + height) ≥ bucket C (width + height)

but bucket 256x832 (1088) > bucket 256x768 (1024) < bucket 320x768 (1088)

Case study

Let’s consider a second example. A 768x512 image has an aspect ratio of 9:6 (0.667). Which bucket will it be assigned to, 576x448 or 640x384? To find out, we can compare what happens in each bucket after the 768x512 image is resized and cropped.

Case 1 — bucket 576x448 (area = 258048)

image resized area:

A 768x512 image is resized to 672x448 (area = 301056), rather than to 576x384, which would downscale it more than necessary.

cropped area:

Area cropped from the image = 301056 − 258048 = 43008

trade-off between image resized area and cropped area:

We further calculate (area of the bucket – image loss) = 258048–43008 = 215040

Case 2 — bucket 640x384 (area = 245760)

image resized area:

A 768x512 image is resized to 640x426.67 (area = 273067), rather than to 576x384, which would downscale it more than necessary.

cropped area:

Area cropped from the image = 273067 − 245760 = 27307

In this case, the bucket has a smaller area, but less of the image is lost.

trade-off between image resized area and cropped area:

We further calculate (area of the bucket – image loss) = 245760–27307 = 218453

Overall, the 768x512 image will be assigned to the 640x384 bucket.

What if we extend our study to a case 3, in which a bucket of 576x384 with exactly the same aspect ratio is used?

Case 3 — bucket 576x384 (area = 221184)

image resized area:

A 768x512 image is resized to 576x384 (area = 221184).

cropped area:

No cropped loss.

This case has the largest value of (area of the bucket − image loss) among the three cases, 221184. Nevertheless, this bucket is not available, because its width + height (960) is below 1024.

Therefore, the resizing process tries to maintain the aspect ratio before cropping, while keeping as much of the image area as possible after resizing and cropping as little as possible.
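The arithmetic of the three cases can be reproduced with a short sketch. The function names are mine, and the "score" is simply the quantity used in this case study (area of the bucket minus cropped area), not something read from the Kohya_ss source.

```python
def cover_resize(image, bucket):
    """Scale the image, keeping its aspect ratio, until it just covers the bucket."""
    iw, ih = image
    bw, bh = bucket
    scale = max(bw / iw, bh / ih)
    return iw * scale, ih * scale


def score(image, bucket):
    """Area of the bucket minus the area cropped away after resizing."""
    rw, rh = cover_resize(image, bucket)
    bucket_area = bucket[0] * bucket[1]
    cropped_area = rw * rh - bucket_area
    return bucket_area - cropped_area


image = (768, 512)
for bucket in [(576, 448), (640, 384), (576, 384)]:
    rw, rh = cover_resize(image, bucket)
    print(f"bucket {bucket[0]}x{bucket[1]}: resized to {rw:.2f}x{rh:.2f}, "
          f"score = {score(image, bucket):.0f}")
```

The rounded scores match the three cases: 215040, 218453, and 221184.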

When the box “Don’t upscale bucket resolution” is checked

For your interest, if we check the option “Don’t upscale bucket resolution”, some of the images are loaded into buckets with smaller areas.

Same training data set was used

Rule 1 now seems to change to: bucket width + bucket height ≤ 1024. The overall quality of your images will be lower because they are squeezed into buckets with smaller areas. As you can see, a 448x448 bucket was generated, which is the “bug” I mentioned at the beginning.

Bucket Assignment

If you are adjusting an image and wondering which bucket it will go into, the following chart (please see the second one) may help.

Theoretical Chart meets best conditions

Nevertheless, on the Kohya_ss platform, the switching points are calculated by averaging the aspect ratios of two consecutive buckets ლ(´•д• ̀ლ)
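The averaging can be sketched as follows. The bucket list is the set of portrait buckets recommended in this article, the aspect ratio is taken as height / width to match the numbers here, and the variable names are only illustrative.

```python
# Portrait buckets recommended in this article, as (width, height).
buckets = [
    (512, 512), (448, 576), (384, 640), (320, 704),
    (320, 768), (256, 960), (256, 1024),
]

# Aspect ratio as height / width, the convention used in this article.
ratios = [h / w for w, h in buckets]

# Each switching point is the average of two consecutive bucket ratios.
switching_points = [(a + b) / 2 for a, b in zip(ratios, ratios[1:])]

for (low, high), point in zip(zip(buckets, buckets[1:]), switching_points):
    print(f"{low[0]}x{low[1]} | {high[0]}x{high[1]}: switch at {point:.3f}")
```

The switching point between 448x576 (1.286) and 384x640 (1.667) comes out at 1.476, the value used in the cases that follow.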

Used in Kohya_ss ↓↓↓↓↓↓↓↓↓↓↓

How to use it? Just look for 3 values.

Case 1

aspect ratio of an image: 3:4 = 1.333, aspect ratio of bucket: 7:9 = 1.286, switching point: 1.476

If you have a 576x768 image with an aspect ratio of 3:4 = 1.333, it falls between the bucket 7:9 = 1.286 (blue) and the switching point 1.476 (orange), so it will be assigned to the closest bucket, 7:9 (448x576). The image will be fitted to the short side of the bucket, and the excess will be cropped.

Case 2

aspect ratio of an image: 13:20 = 1.538, aspect ratio of bucket: 3:5 = 1.667, switching point: 1.476

If you have an 832x1280 image with an aspect ratio of 13:20 = 1.538, it falls between the switching point 1.476 (orange) and the bucket 3:5 = 1.667 (blue), so it will be assigned to the closest bucket, 3:5 (384x640). The image will be fitted to the long side of the bucket, and the excess will be cropped.
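Both cases can be reproduced with a small sketch that picks the bucket with the nearest aspect ratio, which is equivalent to using midpoint switching points. It only handles portrait images, the bucket list is the recommended set from this article, and the function names are hypothetical rather than taken from Kohya_ss.

```python
# Portrait buckets recommended in this article, as (width, height).
BUCKETS = [
    (512, 512), (448, 576), (384, 640), (320, 704),
    (320, 768), (256, 960), (256, 1024),
]


def assign(width, height):
    """Pick the bucket whose height / width ratio is closest to the image's."""
    ratio = height / width
    return min(BUCKETS, key=lambda b: abs(b[1] / b[0] - ratio))


def fit(image, bucket):
    """Return the resized size and which bucket side the image is fitted to."""
    iw, ih = image
    bw, bh = bucket
    fitted_side = "width" if bw / iw >= bh / ih else "height"
    scale = max(bw / iw, bh / ih)  # just cover the bucket, then crop the excess
    return iw * scale, ih * scale, fitted_side


for image in [(576, 768), (832, 1280)]:
    bucket = assign(*image)
    rw, rh, side = fit(image, bucket)
    print(f"{image[0]}x{image[1]} -> bucket {bucket[0]}x{bucket[1]}, "
          f"resized to {rw:.1f}x{rh:.1f}, fitted to bucket {side}")
```

For 576x768 the image is fitted to the bucket width (the short side) and the extra height is cropped. For 832x1280 it is fitted to the bucket height (the long side) and the extra width is cropped.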

Take home message

In short, if you are preparing a training data set that contains images of different shapes, care about image quality, and want to avoid losing much to cropping, here are some recommendations.

  1. Uncheck the option — “Don’t upscale bucket resolution”
  2. Try to use these aspect ratios, 1:4, 4:15, 5:12, 5:11, 6:10, 7:9, and 1:1
  3. If you prefer aspect ratios other than those mentioned above, you can use the provided chart to see how your images will be cropped during training.

Because image quality is crucial to LoRA training, a good start puts us on the way to successful cooking. I might write another article comparing the results of LoRA training with the option “Don’t upscale bucket resolution” checked and unchecked. The short conclusion is that the result is better with the option unchecked.


Sable Confusion

My art creations using stable diffusion are displayed on my instagram. https://www.instagram.com/sable_confusion/ Feel free to follow