SSD-1B: A Leap in Efficient T2I Generation

takuoko
4 min read · Oct 25, 2023


blog post: https://blog.segmind.com/introducing-segmind-ssd-1b/

github: https://github.com/segmind/SSD-1B

I previously authored a blog post on ‘On Architectural Compression of Text-to-Image Diffusion Models’. In this follow-up post, I will introduce a novel approach that compresses the architecture of Stable Diffusion XL, improving its efficiency while maintaining high quality.

The proposed approach

  • Segmind has developed SSD-1B, a model that is 50% smaller and 60% faster compared to the SDXL 1.0 model.
  • To achieve this, they removed 40 transformer blocks and 1 ResNet block from SDXL.
  • Similar to their previous work, they employed knowledge distillation to transfer knowledge effectively (see the loss sketch after this list).
  • To enhance its ability to generate a wide range of visual content, they trained SSD-1B with approximately 15 million image-prompt pairs.
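
To make the distillation objective concrete, here is a minimal sketch of a BK-SDM-style loss, in which the student is trained on the usual denoising target while also matching the teacher's output and intermediate features. The tensor arguments are hypothetical placeholders; the actual training loop is implemented in DiffEngine.

import torch.nn.functional as F

def distillation_loss(student_pred, teacher_pred, noise,
                      student_feats, teacher_feats,
                      lambda_out=1.0, lambda_feat=1.0):
    # Ordinary denoising loss: the student predicts the added noise.
    task_loss = F.mse_loss(student_pred, noise)
    # Output-level KD: match the teacher's noise prediction.
    output_kd = F.mse_loss(student_pred, teacher_pred)
    # Feature-level KD: match intermediate U-Net activations at
    # corresponding student/teacher blocks.
    feature_kd = sum(F.mse_loss(s, t)
                     for s, t in zip(student_feats, teacher_feats))
    return task_loss + lambda_out * output_kd + lambda_feat * feature_kd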

The outcome is impressive: SSD-1B’s image outputs frequently match or even surpass the quality of the base SDXL model.
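
To put the "50% smaller" figure in perspective, a quick sanity check is to count the U-Net parameters of both models with diffusers (the model IDs below are the ones published on the Hugging Face Hub; the downloads are large):

from diffusers import UNet2DConditionModel

# Load only the U-Nets to compare parameter counts.
sdxl_unet = UNet2DConditionModel.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0', subfolder='unet')
ssd_unet = UNet2DConditionModel.from_pretrained(
    'segmind/SSD-1B', subfolder='unet')

print(f"SDXL U-Net params:   {sum(p.numel() for p in sdxl_unet.parameters()):,}")
print(f"SSD-1B U-Net params: {sum(p.numel() for p in ssd_unet.parameters()):,}")
# Expect roughly 2.6B vs 1.3B, i.e. about half the size.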

Finetune SSD-1B DreamBooth with DiffEngine

In this section, we present the results of personalized generation achieved by fine-tuning SSD-1B with DreamBooth. DiffEngine simplifies the training process, making it accessible to everyone.

PR about SSD-1B: https://github.com/okotaku/diffengine/pull/83

pip install openmim
pip install git+https://github.com/okotaku/diffengine.git
mim train diffengine ssd_1b_dreambooth_lora_dog.py

One of the standout advantages of SSD-1B is the significant reduction in training time: in our runs it cut training time by roughly 35%, making the entire process more efficient.

Inference SSD-1B DreamBooth with diffusers.pipeline

Once the model is trained, specify the path where the checkpoint is saved and use it for inference with diffusers.

import torch
from diffusers import DiffusionPipeline

# Path to the LoRA weights saved by DiffEngine and the DreamBooth prompt.
checkpoint = 'work_dirs/ssd_1b_dreambooth_lora_dog/step499'
prompt = 'A photo of sks dog in a bucket'

# Load the SSD-1B base pipeline in half precision and attach the LoRA weights.
pipe = DiffusionPipeline.from_pretrained(
    'segmind/SSD-1B', torch_dtype=torch.float16)
pipe.to('cuda')
pipe.load_lora_weights(checkpoint)

image = pipe(
    prompt,
    num_inference_steps=50,
).images[0]
image.save('demo.png')
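
As an optional extra, assuming a recent diffusers release with LoRA-fusing support, the adapter weights can be merged into the base U-Net so inference runs without the LoRA overhead:

# Optional: merge the LoRA deltas into the base weights.
pipe.fuse_lora()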

We’ve provided an illustrative example below. It showcases the promising results we’ve achieved:

These outputs exemplify the exciting potential and quality that SSD-1B is delivering. Our journey into personalized generation and architectural compression is indeed a promising one.

Train a Distilled SSD-1B with DiffEngine

In this section, we show the results of distillation training, where we explored two distinct settings.

Setting 1: Teacher model SDXL, student model SSD-1B initialized from SDXL weights

In our initial approach, we leveraged the strength of SDXL as the teacher model, transferring its extensive knowledge to the student model.

mim train diffengine ssd_1b_distill_from_sdxl_pokemon_blip.py

Setting 2: Teacher model SDXL, student model pretrained SSD-1B

In the second approach, we took the pretrained SSD-1B and fine-tuned it with additional datasets. This process aimed to enhance its capacity to generate content in a variety of domains beyond its original training data.

mim train diffengine ssd_1b_distill_pokemon_blip.py
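
The practical difference between the two settings is how the student U-Net is initialized. Here is a minimal sketch with diffusers (in Setting 1, the ssd_1b_distill_from_sdxl_pokemon_blip.py config instead builds the student from SDXL's remaining blocks):

from diffusers import UNet2DConditionModel

# Setting 2: initialize the student from the released SSD-1B weights,
# then continue distillation against the SDXL teacher on new data.
student_unet = UNet2DConditionModel.from_pretrained(
    'segmind/SSD-1B', subfolder='unet')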

These distinct approaches allowed us to fine-tune SSD-1B, enhancing its capabilities and paving the way for exciting developments in the world of AI-driven content generation. Stay tuned for more insights and results from our distillation journey.

Inference Distilled SSD-1B with diffusers.pipeline

Now, let's run inference with the distilled SSD-1B through the diffusers pipeline:

import torch
from diffusers import DiffusionPipeline, UNet2DConditionModel, AutoencoderKL

# Path to the distilled U-Net saved by DiffEngine and a test prompt.
checkpoint = 'work_dirs/ssd_1b_distill_from_sdxl_pokemon_blip/unet'
prompt = 'yoda pokemon'

# Load the distilled U-Net and a numerically stable half-precision VAE.
unet = UNet2DConditionModel.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16)
vae = AutoencoderKL.from_pretrained(
    'madebyollin/sdxl-vae-fp16-fix',
    torch_dtype=torch.bfloat16,
)
pipe = DiffusionPipeline.from_pretrained(
    'segmind/SSD-1B', unet=unet, vae=vae, torch_dtype=torch.bfloat16)
pipe.to('cuda')

image = pipe(
    prompt,
    num_inference_steps=50,
).images[0]
image.save('demo.png')

We’ve included an illustrative output example below:

Left: Setting 1 (SSD-1B loaded from SDXL). Right: Setting 2 (SSD-1B pretrained).

These results truly speak to the remarkable capabilities of SSD-1B. The content generated is not only of high quality but also contextually relevant. Our journey into distillation and architecture compression is continually pushing boundaries, and the outcomes are indeed impressive. Stay tuned for more exciting developments!

Conclusion

DiffEngine supports SSD-1B training. Take a look ;)

Thank you for reading.
