On Architectural Compression of Text-to-Image Diffusion Models
Paper: https://arxiv.org/abs/2305.15798
Unofficial implementation: https://github.com/segmind/distill-sd
Text-to-image generation is a fascinating and challenging task that aims to create realistic images from natural language descriptions. Recently, Stable Diffusion models have achieved impressive results on this task, but they come with high computational costs. In this blog post, I introduce a paper that proposes a method to compress the architecture of Stable Diffusion models and make them more efficient while maintaining competitive quality.
The proposed approach
- The paper proposes a novel method to compress the architecture of Stable Diffusion models (SDMs) by removing some residual and attention blocks from the U-Net network that performs diffusion in the latent space.
- The paper uses knowledge distillation to transfer the knowledge from the original SDM to the compressed one, using only a small fraction of the training data.
- The paper shows that the compressed models, called BK-SDMs, can achieve over 51% reduction in the number of parameters, and 43% improvement in latency on CPU and GPU compared to SDMs, while maintaining competitive results.
- The paper also demonstrates the applicability of BK-SDMs in personalized generation with DreamBooth fine-tuning.
- The unofficial implementation open-sources model weights and training code for two architectures: SD Small and SD Tiny.
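To make the distillation idea above more concrete, here is a minimal sketch of how a combined training loss could look. It assumes the three-term form I understand the paper to use (a denoising task loss, an output-level distillation loss, and a feature-level distillation loss). The function names, the default weights of 1.0, and the use of plain Python lists instead of tensors are all assumptions made for illustration, not the authors' code.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(
    noise: Sequence[float],
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    student_feats: Sequence[Sequence[float]],
    teacher_feats: Sequence[Sequence[float]],
    lambda_out: float = 1.0,   # assumed weight, not taken from the post
    lambda_feat: float = 1.0,  # assumed weight, not taken from the post
) -> float:
    """Combine task, output-level KD, and feature-level KD losses."""
    # Task loss: the student predicts the noise that was added to the latent.
    task = mse(student_out, noise)
    # Output-level KD: the student mimics the teacher's final prediction.
    out_kd = mse(student_out, teacher_out)
    # Feature-level KD: the student mimics the teacher's intermediate
    # features, compared pairwise at matching points of the two U-Nets.
    feat_kd = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))
    return task + lambda_out * out_kd + lambda_feat * feat_kd


loss = distillation_loss(
    noise=[0.0, 0.0],
    student_out=[1.0, 1.0],
    teacher_out=[1.0, 3.0],
    student_feats=[[0.0, 2.0]],
    teacher_feats=[[0.0, 0.0]],
)
print(loss)
```

In a real training loop these would be tensors, and the feature pairs would come from the outputs of corresponding U-Net blocks in the teacher and the student.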
Train Distill SD with DiffEngine
DiffEngine GitHub: https://github.com/okotaku/diffengine
DiffEngine documentation: https://diffengine.readthedocs.io/en/latest/
In this section, we share how we implemented Distill SD for the SDXL model.
To apply Distill SD to SDXL, we took the following steps:
- We skipped the deletion of the 4th down/up blocks, because SDXL does not have these blocks. This differs from the original paper, which applies this deletion to Stable Diffusion models.
- We removed one Attention layer from each U-Net block, except for the first block, which doesn’t have an Attention layer. We also adjusted the distillation operation based on this modification.
- We deleted one Residual Block from each U-Net block. This is consistent with the unofficial implementation.
- We deleted the middle blocks for Tiny SDXL, which are the blocks between the encoder and decoder of the U-Net. This is also consistent with the unofficial implementation.
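As a rough mental model of the steps above, the sketch below applies the same rules to a toy description of a U-Net. The `Block` class, the `compress` function, and the layer counts in the layout are hypothetical and only approximate an SDXL-style U-Net; the real implementation edits the actual model modules.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Block:
    """Toy description of one U-Net block (hypothetical helper)."""
    name: str
    resnets: int
    attentions: int


def compress(blocks: List[Block], drop_mid: bool = False) -> List[Block]:
    """Remove one residual block and one attention layer per down/up block.

    Blocks without attention keep zero attention layers. The middle block
    is kept unchanged unless drop_mid is True (the "Tiny" variant).
    """
    result = []
    for block in blocks:
        if block.name == "mid":
            if not drop_mid:
                result.append(block)
            continue
        result.append(
            replace(
                block,
                resnets=max(block.resnets - 1, 0),
                attentions=max(block.attentions - 1, 0),
            )
        )
    return result


# Illustrative layout only; the counts are assumptions, so check them
# against the real SDXL U-Net config before relying on them.
layout = [
    Block("down_0", resnets=2, attentions=0),
    Block("down_1", resnets=2, attentions=2),
    Block("down_2", resnets=2, attentions=2),
    Block("mid", resnets=2, attentions=1),
    Block("up_0", resnets=3, attentions=3),
    Block("up_1", resnets=3, attentions=3),
    Block("up_2", resnets=3, attentions=0),
]

small = compress(layout)                 # keeps the middle block
tiny = compress(layout, drop_mid=True)   # also drops the middle block
for block in tiny:
    print(block)
```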
By doing these modifications, we were able to obtain a smaller and faster version of SDXL, which we call Distill SDXL.
You can check our implementation here: https://github.com/okotaku/diffengine/blob/main/diffengine/models/editors/distill_sd/distill_sd_xl.py
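One simple way to confirm that a compressed U-Net really is smaller is to count its parameters and compare the number with the original model. The helper below is a small sketch that works with any object exposing a PyTorch-style `parameters()` method; the function name `count_parameters` is my own and not part of DiffEngine or diffusers.

```python
def count_parameters(module, trainable_only: bool = False) -> int:
    """Count the parameters of a PyTorch-style module.

    The module only needs a parameters() method whose items provide
    numel() and a requires_grad attribute, as torch.nn.Module does.
    """
    return sum(
        p.numel()
        for p in module.parameters()
        if p.requires_grad or not trainable_only
    )


# Example usage with a loaded diffusers pipeline (not executed here):
#   total = count_parameters(pipe.unet)
#   print(f"U-Net parameters: {total / 1e6:.1f}M")
```

Running it on both the original and the distilled U-Net gives a direct before/after comparison.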
Installation
Before installing DiffEngine, please ensure that PyTorch has been successfully installed following the official guide.
https://pytorch.org/get-started/locally/
Install DiffEngine
pip install openmim
pip install git+https://github.com/okotaku/diffengine.git
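After installing, a quick sanity check can confirm that the packages are visible to your Python environment. This is a minimal sketch using only the standard library; the module names `mim` (for openmim) and `diffengine` are my assumptions about what the two pip commands install.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Assumed module names: "mim" for openmim, "diffengine" for DiffEngine.
required = ["torch", "mim", "diffengine"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages were found.")
```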
Train Distill SDXL with DiffEngine
A variety of pre-defined configs can be found in the configs directory of the DiffEngine repository.
Distill SDXL Configs: https://github.com/okotaku/diffengine/tree/main/configs/distill_sd
For example, to train a Tiny SDXL model on the Pokemon BLIP dataset, use the config at https://github.com/okotaku/diffengine/blob/main/configs/distill_sd/tiny_sd_xl_pokemon_blip.py.
To train with a selected config, open a terminal and run the following command:
mim train diffengine tiny_sd_xl_pokemon_blip.py
Inference Distill SDXL with diffusers.pipeline
I have uploaded the trained model weights to the Hugging Face Hub. You can use them for inference.
Trained model weight: https://huggingface.co/takuoko/tiny_sd_xl_pokemon_blip
import torch
from diffusers import AutoencoderKL, DiffusionPipeline, UNet2DConditionModel

checkpoint = 'takuoko/tiny_sd_xl_pokemon_blip'
prompt = 'a very cute looking pokemon with a hat on its head'

# Load the distilled Tiny SDXL U-Net from the Hugging Face Hub.
unet = UNet2DConditionModel.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16
)
# Load the fp16-fix VAE for SDXL.
vae = AutoencoderKL.from_pretrained(
    'madebyollin/sdxl-vae-fp16-fix',
    torch_dtype=torch.bfloat16,
)
# Reuse the remaining SDXL components and swap in the U-Net and VAE.
pipe = DiffusionPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    unet=unet,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipe.to('cuda')

# Generate one image and save it to disk.
image = pipe(
    prompt,
    num_inference_steps=50,
).images[0]
image.save('demo.png')
An illustrative output example is provided below:
Train Distill SD DreamBooth with DiffEngine
In this section, we show the results of personalized generation by fine-tuning with DreamBooth on a distilled SD model.
Distill SD DreamBooth Configs: https://github.com/okotaku/diffengine/tree/main/configs/distill_sd_dreambooth
Tiny SD Checkpoints: https://huggingface.co/segmind/tiny-sd
To train with https://github.com/okotaku/diffengine/blob/main/configs/distill_sd_dreambooth/small_sd_dreambooth_lora_dog.py, open a terminal and run the following command:
mim train diffengine small_sd_dreambooth_lora_dog.py
Tiny SD reduces training time by nearly 30%.
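The post does not spell out the baseline or setup behind this number, so if you want to check a figure like this on your own hardware, you can time the same workload with both models and compare. The helper below is a minimal, hypothetical sketch (the names `mean_seconds` and `percent_saved` are mine) that measures the average wall-clock time of a callable and expresses the saving as a percentage.

```python
import time
from typing import Callable


def mean_seconds(fn: Callable[[], object], repeats: int = 3) -> float:
    """Average wall-clock seconds per call, after one untimed warm-up."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    fn()  # warm-up run, not included in the measurement
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


def percent_saved(baseline_seconds: float, candidate_seconds: float) -> float:
    """Relative time saved by the candidate compared to the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds


# Example with placeholder numbers: 10 s per step for the baseline and
# 7 s per step for the distilled model.
print(f"{percent_saved(10.0, 7.0):.1f}% less time per step")
```

When timing GPU work, call `torch.cuda.synchronize()` inside the timed callable so that asynchronous kernels are included in the measurement.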
Inference Distill SD DreamBooth with diffusers.pipeline
Once you have trained a model, specify the path to the saved model and run inference with the diffusers.pipeline module.
I have uploaded the trained model weights to the Hugging Face Hub. You can use them for inference.
Trained model weight: https://huggingface.co/takuoko/small-sd-dreambooth-lora-dog
import torch
from diffusers import DiffusionPipeline

checkpoint = 'takuoko/small-sd-dreambooth-lora-dog'
prompt = 'A photo of sks dog in a bucket'

# Load the distilled base model, then attach the DreamBooth LoRA weights.
pipe = DiffusionPipeline.from_pretrained(
    'segmind/small-sd', torch_dtype=torch.float16)
pipe.to('cuda')
pipe.load_lora_weights(checkpoint, weight_name='pytorch_lora_weights.bin')

# Generate one image and save it to disk.
image = pipe(
    prompt,
    num_inference_steps=50,
).images[0]
image.save('demo.png')
An illustrative output example is provided below:
Conclusion
DiffEngine supports Distill SD training. Give it a try ;)
Thank you for reading.
Sponsors
I am a member of the Z by HP Data Science Global Ambassadors. Special thanks to Z by HP for sponsoring me with a Z8G4 Workstation with dual A6000 GPUs and a ZBook with an RTX5000 GPU.
Reference
- Open-sourcing Knowledge Distillation Code and Weights of SD-Small and SD-Tiny: https://huggingface.co/blog/sd_distillation
- Segmind: https://www.segmind.com/models
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis: https://arxiv.org/abs/2307.01952