Hello World!
This is my first blog post on Medium. Allow me to introduce myself: I’m takuoko, and I work as a computer vision engineer. I plan to continue posting about AI technologies, including computer vision, in the future. Thank you for joining me.
# Introduction
DiffEngine: A toolbox for training state-of-the-art Diffusion Models with Diffusers and MMEngine.
I have released DiffEngine, an open-source toolbox designed to simplify the training of state-of-the-art diffusion models. Diffusion models are a class of generative models that synthesize realistic images, text, audio, and video from random noise.
(Image source: https://stability.ai/stablediffusion)
With the help of Diffusers and MMEngine, DiffEngine enables you to customize and optimize your diffusion models. Whether you are a beginner or an expert in the field, DiffEngine helps you create and improve diffusion models through a simple and unified interface.
DiffEngine GitHub: https://github.com/okotaku/diffengine
DiffEngine documentation: https://diffengine.readthedocs.io/en/latest/
# About MMEngine
MMEngine is a powerful and versatile library for training deep learning models based on PyTorch. It is the core engine of all OpenMMLab projects, which cover a wide range of research areas.
Empowered by MMEngine, you can easily train state-of-the-art models with minimal code, customize and optimize your models with advanced features, and evaluate and visualize your results with various tools.
If you want to learn more about MMEngine, you can visit GitHub.
MMEngine GitHub: https://github.com/open-mmlab/mmengine
OpenMMLab: https://openmmlab.com/
# About Diffusers
Diffusers is a library that allows you to generate realistic and diverse samples of images, audio, and other types of data using diffusion models. Diffusers provides:
- Diffusion pipelines that can run inference with just a few lines of code (see the sketch below).
- Various pretrained models.
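To make "a few lines of code" concrete, here is a minimal sketch of text-to-image inference with Diffusers; the model name and prompt are only examples:

```python
import torch
from diffusers import DiffusionPipeline

# Load a pretrained text-to-image pipeline (the model name is an example).
pipe = DiffusionPipeline.from_pretrained(
    'runwayml/stable-diffusion-v1-5', torch_dtype=torch.float16)
pipe.to('cuda')

# Generate an image from a text prompt and save it.
image = pipe('A photo of an astronaut riding a horse').images[0]
image.save('astronaut.png')
```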
If you want to learn more about Diffusers, you can visit GitHub.
Diffusers GitHub: https://github.com/huggingface/diffusers
Hugging Face: https://huggingface.co/huggingface
# DiffEngine Features
## Training state-of-the-art diffusion models
DiffEngine supports state-of-the-art diffusion models that have achieved impressive results. You can use Stable Diffusion, Stable Diffusion XL, DreamBooth, LoRA, and more to generate high-quality images. You can also use ControlNet to train conditional diffusion models that can generate images based on text prompts or other images.
## Unified config system and module designs
DiffEngine leverages MMEngine, a powerful framework that provides a unified configuration system and modular designs for your projects. You can easily adjust the hyperparameters, loss functions, datasets, and other settings of your diffusion models using pre-defined configs or by creating your own. You can also reuse and combine different modules to create complex and flexible models.
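For illustration, an MMEngine-style config inherits from base files and overrides only what changes. The base file paths and values below are hypothetical, not actual DiffEngine files:

```python
# Hypothetical config sketch in MMEngine style; the base paths and
# values are illustrative, not actual DiffEngine files.
_base_ = [
    '../_base_/models/stable_diffusion_v15.py',
    '../_base_/datasets/dog_dreambooth.py',
    '../_base_/schedules/1k_iterations.py',
    '../_base_/default_runtime.py',
]

# Override only the settings that differ for this experiment.
optim_wrapper = dict(optimizer=dict(type='AdamW', lr=1e-4))
train_dataloader = dict(batch_size=4)
```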
## Inference with diffusers.pipeline
DiffEngine simplifies deploying trained diffusion models for inference with the help of the diffusers.pipeline module: you simply load your model and generate samples from given conditions, as demonstrated in the Usage section below.
# DiffEngine Usage
## Installation
Before installing DiffEngine, please ensure that PyTorch is installed by following the official guide: https://pytorch.org/get-started/locally/

Then install DiffEngine:

```
pip install openmim
pip install git+https://github.com/okotaku/diffengine.git
```
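To confirm the installation succeeded, a quick import check is enough (assuming the package is importable as diffengine):

```
python -c "import diffengine"
```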
## Training with pre-defined config
A variety of pre-defined configs can be found in the configs directory of the DiffEngine repository. For example, if you wish to train a DreamBooth model with Stable Diffusion, see configs/stable_diffusion_dreambooth/stable_diffusion_v15_dreambooth_lora_dog.py.
To train with a selected config, open a terminal and run the following command:
```
mim train diffengine stable_diffusion_v15_dreambooth_lora_dog.py
```
## Monitor progress and get results
The training process will begin, and you can track its progress. The training outputs will be located in the work_dirs/stable_diffusion_v15_dreambooth_lora_dog directory (named after the config):
```
work_dirs/stable_diffusion_v15_dreambooth_lora_dog
├── 20230802_033741
|   ├── 20230802_033741.log  # log file
|   └── vis_data
|       ├── 20230802_033741.json  # log json file
|       ├── config.py  # config file for each experiment
|       └── vis_image  # visualized image from each step
├── step999
|   └── pytorch_lora_weights.bin  # weights for inference with diffusers.pipeline
├── iter_1000.pth  # checkpoint from each step
├── last_checkpoint  # last checkpoint; it can be used for resuming
└── stable_diffusion_v15_dreambooth_lora_dog.py  # latest config file
```
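If you want to inspect training metrics programmatically, the JSON log in vis_data can be parsed line by line. This sketch assumes MMEngine's usual one-JSON-object-per-line log format and reuses the path from the tree above (the timestamp will differ per run):

```python
import json

# Path from the directory tree above; the timestamp differs per run.
log_path = ('work_dirs/stable_diffusion_v15_dreambooth_lora_dog/'
            '20230802_033741/vis_data/20230802_033741.json')

# Assuming each line is a JSON object of logged scalars, collect the loss.
losses = []
with open(log_path) as f:
    for line in f:
        record = json.loads(line)
        if 'loss' in record:
            losses.append(record['loss'])

if losses:
    print(f'{len(losses)} loss values logged; last loss: {losses[-1]}')
```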
## Inference with diffusers.pipeline
Once you have trained a model, simply specify the path to the saved weights and run inference with the diffusers.pipeline module:
```python
import torch
from diffusers import DiffusionPipeline

checkpoint = 'work_dirs/stable_diffusion_v15_dreambooth_lora_dog/step999'
prompt = 'A photo of sks dog in a bucket'

# Load the base Stable Diffusion v1.5 model.
pipe = DiffusionPipeline.from_pretrained(
    'runwayml/stable-diffusion-v1-5', torch_dtype=torch.float16)
pipe.to('cuda')

# Apply the trained DreamBooth LoRA weights.
pipe.load_lora_weights(checkpoint)

image = pipe(
    prompt,
    num_inference_steps=50,
).images[0]
image.save('demo.png')
```
# Example Notebook
For a more hands-on introduction to DiffEngine, you can run the Example Notebook on Colaboratory. It demonstrates training with the Stable Diffusion v1.5 and v2.1 DreamBooth configurations.
# DiffEngine Vision
Diffusers provides various state-of-the-art diffusion models, while MMEngine offers a unified config system and modular design. By combining these strengths, DiffEngine aims to become a unified training library for Diffusers.
Our roadmap includes expanding support for additional training methods such as:
- Distill SD
- Instruct Pix2Pix
- ControlNet Small
- T2I Adapter
- IP Adapter
Furthermore, we aim to develop our own LoRAs.
We are looking for core developers who are interested in these advancements.
My X (Twitter): https://twitter.com/takuoko1
Thank you for reading.
# Sponsors
I am a member of the Z by HP Data Science Global Ambassadors program. Special thanks to Z by HP for sponsoring me with a Z8 G4 workstation with dual A6000 GPUs and a ZBook with an RTX 5000 GPU.
# References
MMEngine GitHub: https://github.com/open-mmlab/mmengine
OpenMMLab: https://openmmlab.com/
Diffusers GitHub: https://github.com/huggingface/diffusers
Hugging Face: https://huggingface.co/huggingface
Stable Diffusion XL: https://stability.ai/stablediffusion
Stability AI: https://stability.ai/
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation: https://dreambooth.github.io/
Diffusers documentation on DreamBooth training: https://huggingface.co/docs/diffusers/training/dreambooth