Beginner’s guide to Stable Diffusion models and the ones you should know

Andrew
6 min read · Nov 29, 2022



Models, sometimes called checkpoint files, are pre-trained Stable Diffusion weights intended for generating images in general or in a particular genre.

This is part 3 of the beginner’s guide series.

Read part 1: Absolute beginner’s guide.

Read part 2: Inpainting.

(This article was first published at https://stable-diffusion-art.com/)

What images a model can generate depends on the data used to train it. A model won’t be able to generate images of a cat if there is never a cat in the training data. Likewise, if you train a model only with cat images, it will only generate cats.

We will introduce what models are, cover some common ones (v1.4, v1.5, F222, Anything V3, Open Journey), and show how to install, use, and merge them.

Fine-tuned models

What is fine tuning?

Fine-tuning is a common technique in machine learning: you take a model trained on a wide dataset and train it a bit more on the narrow dataset you are interested in.

A fine-tuned model will be biased towards generating images similar to your dataset, while maintaining the versatility of the original model.

Why do people make them?

Stable Diffusion is great but is not good at everything. For example, it can and will generate anime-style images with the keyword “anime” in the prompt. But it could be difficult to generate images of a sub-genre of anime. Instead of tinkering with the prompt, you can fine-tune the model with images of that sub-genre.

How are they made?

There are two main fine-tuning methods: (1) additional training and (2) Dreambooth. Both start with a base model like Stable Diffusion v1.4 or v1.5.

Additional training is achieved by training a base model with an additional dataset that you are interested in. For example, you can train Stable Diffusion v1.5 with an additional dataset of vintage cars to bias the aesthetic of cars towards the sub-genre.

Dreambooth, originally developed by Google, is a technique to inject custom subjects into text-to-image models. It works with as few as 3–5 custom images. You can take a few pictures of yourself and use Dreambooth to put yourself into the model. A model trained with Dreambooth requires a special keyword in the prompt to activate the custom subject.
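As an illustration, here is a minimal sketch of running a Dreambooth-trained model with the Hugging Face diffusers library. The model path is a placeholder, and “sks” stands in for whatever special keyword was chosen at training time.

import torch
from diffusers import StableDiffusionPipeline

# Load a Dreambooth fine-tuned model (placeholder path).
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/dreambooth-model", torch_dtype=torch.float16
).to("cuda")

# The special keyword ("sks" here, an assumption) conditions the model
# to render the custom subject.
image = pipe("a photo of sks person hiking in the Alps").images[0]
image.save("dreambooth_sample.png")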

There’s another, less popular fine-tuning technique called textual inversion. The goal is similar to Dreambooth: inject a custom subject into the model with only a few examples. A new keyword is created specifically for the new object. Only the text embedding network is fine-tuned, while the rest of the model is kept unchanged. In layman’s terms, it’s like using existing words to describe a new concept.
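For a concrete sketch, recent versions of diffusers can load a textual inversion embedding and trigger it with its placeholder token. The sd-concepts-library/cat-toy concept and its <cat-toy> token come from the diffusers documentation; treat the exact names as an example.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the learned text embedding; the rest of the model is unchanged.
pipe.load_textual_inversion("sd-concepts-library/cat-toy")

# The placeholder token <cat-toy> now acts like a new word in the prompt.
image = pipe("a <cat-toy> sitting on a bookshelf").images[0]
image.save("textual_inversion_sample.png")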

Models

There are hundreds of fine-tuned Stable Diffusion models, and the number is increasing every day. Below is a list of models that can be used for general purposes.

Stable Diffusion v1.4

Model Page

Download link

Released in August 2022 by Stability AI, the v1.4 model is considered to be the first publicly available Stable Diffusion model.

You can treat v1.4 as a general-purpose model. Most of the time, it is enough to use it as is unless you are really picky about certain styles.
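If you prefer scripting to a GUI, here is a minimal sketch of loading v1.4 with the diffusers library and generating an image. The repo id CompVis/stable-diffusion-v1-4 is the official one on the Hugging Face Hub; the prompt is just an example.

import torch
from diffusers import StableDiffusionPipeline

# Load the v1.4 weights from the Hugging Face Hub.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Generate one image from a text prompt and save it.
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")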

Stable Diffusion v1.5

Model Page

Download link

v1.5 was released in October 2022 by Runway ML, a partner of Stability AI. The model is based on v1.2 with further training.

The model page does not mention what the improvements are. It produces slightly different results compared to v1.4, but it is unclear if they are better.

Like v1.4, you can treat v1.5 as a general-purpose model.

In my experience, v1.5 is a fine choice as the initial model and can be used interchangeably with v1.4.

F222

Model Page

Download link

F222 was originally trained for generating nudes, but people found it useful for generating beautiful female portraits with correct body-part relations. Interestingly, contrary to what you might think, it’s quite good at generating aesthetically pleasing clothing.

F222 is a special-purpose model, but it is quite useful for portraits. It has a high tendency to generate nudes, but that can be suppressed with clothing keywords like “dress”.

Read the recommended keyword list.

Anything V3

Model Page

Download Link

Anything V3 is a special-purpose model trained to produce high-quality anime-style images. You can use danbooru tags (like 1girl, white hair) in the text prompt.

It’s useful for casting celebrities in anime style, which can then be blended seamlessly with illustrative elements.

One drawback (at least to me) is that it produces female figures with disproportionate body shapes. I like to tone that down by merging it with F222.

Open Journey

Model Page

Download link

Open Journey is a model fine-tuned with images generated by Midjourney v4. It has a different aesthetic and is a good general-purpose model.

Model comparison

Here’s a comparison of these models with the same prompt and seed. All but Anything v3 generate realistic images but with different aesthetics.

Compare commonly used models: images generated with the same prompt, seed, and steps.

Other models

There are hundreds of Stable Diffusion models available. Many of them are special-purpose models designed to generate a particular style. Some notable ones are:

- waifu-diffusion: Japanese anime style

- Arcane-Diffusion: TV show Arcane style

- robo-diffusion: Robot images

- mo-di-diffusion: Modern Disney style

You can find more models here.

How to install and use a model

To install a model in the AUTOMATIC1111 GUI, download and place the checkpoint (.ckpt) file in the following folder:

stable-diffusion-webui/models/Stable-diffusion/

Press the reload button next to the checkpoint dropdown box.

You should see the checkpoint file you just put in available for selection. Select the new checkpoint file to use the model.
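If you would rather script the download, here is a sketch using the huggingface_hub library (assuming a recent version). The repo id and checkpoint filename are the ones listed on the v1.5 model page; double-check them before running.

from huggingface_hub import hf_hub_download

# Download the v1.5 checkpoint straight into the webui model folder.
hf_hub_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    filename="v1-5-pruned-emaonly.ckpt",
    local_dir="stable-diffusion-webui/models/Stable-diffusion",
)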

If you are new to the AUTOMATIC1111 GUI, some models are preloaded in the Colab notebook included in the Quick Start Guide.

Merging two models

Settings for merging two models.

To merge two models using AUTOMATIC1111 GUI, go to the Checkpoint Merger tab and select the two models you want to merge in Primary model (A) and Secondary model (B).

Adjust the multiplier (M) to set the relative weight of the two models. With the weighted-sum method, the merged weights are A × (1 − M) + B × M, so setting M to 0.5 merges the two models with equal importance.

After pressing Run, the new merged model will be available for use.
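Under the hood, a weighted-sum merge is just per-tensor interpolation of the two checkpoints. Here is a rough PyTorch sketch of the same idea; the file names are placeholders, and it skips details like the VAE and mismatched keys that the GUI handles for you.

import torch

m = 0.5  # multiplier M: 0.0 keeps model A, 1.0 keeps model B

a = torch.load("modelA.ckpt", map_location="cpu")["state_dict"]
b = torch.load("modelB.ckpt", map_location="cpu")["state_dict"]

# Weighted sum, tensor by tensor: merged = A * (1 - M) + B * M.
# Non-float entries (e.g. step counters) are copied from model A.
merged = {
    k: (1 - m) * a[k] + m * b[k] if torch.is_floating_point(a[k]) else a[k]
    for k in a.keys() & b.keys()
}

torch.save({"state_dict": merged}, "merged.ckpt")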

Example of a merged model

Here are sample images from merging F222 and Anything V3 with equal weight (0.5):

Compare F222, Anything V3, and the merged model (50% each).

The merged model sits between the realistic F222 and the anime Anything V3 styles. It is a very good model for generating illustration art with human figures.

Summary

In this article, I have introduced what Stable Diffusion models are, how they are made, a few common ones, and how to merge them. Using models can make your life easier when you have a specific style in mind.

This is part 3 of the beginner’s guide series.

Read part 1: Absolute beginner’s guide.

Read part 2: Inpainting.


Andrew

I write about AI and internet business. Check out my new stable diffusion site: https://stable-diffusion-art.com/