[ML Story] DreamBoothing Your Way into Greatness

Sayak Paul
Google Developer Experts
4 min read · Mar 22, 2023

Image from https://unsplash.com/ko/%EC%82%AC%EC%A7%84/Zwvxj3ytTHc

With the advent of text-to-image generation models (like DALL-E 2, Stable Diffusion, Imagen, and Parti), the definition of “what’s possible” has shifted. These models show impressive capabilities across many facets of creativity, but they usually lack subject-specific personalization.

Consider the following image as an example:

With existing systems, it’s challenging to generate the same subject in different contexts while maintaining fidelity and fine-grained details. Here is an example (but for a different subject):

Image from https://dreambooth.github.io/

Even with expensive iterations of fine-tuning, these models fail to produce high-quality generations in targeted and personalized contexts.

Enter DreamBooth

Thankfully, there’s an inexpensive way to solve this problem: DreamBooth! DreamBooth was proposed by Ruiz et al. in “DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation” (CVPR 2023).

DreamBooth introduces a way to steer the generation of these models toward highly specific contexts that closely align with the given subject. Here is one such example:

Image from https://dreambooth.github.io/

To learn more about DreamBooth, check out the original website. I also encourage you to check out the different use cases made possible by this powerful technique.

Open-sourced DreamBooth

DreamBooth doesn’t have an official public implementation. However, given its effectiveness, the community soon started applying the DreamBooth training technique to Stable Diffusion, the large open-source text-to-image diffusion model.

Amongst the open-source implementations of DreamBooth, the one provided by the Hugging Face team is quite popular:

It is easy to customize and supports various optimization techniques such as LoRA, CPU offloading, gradient checkpointing, and more. Soon after Hugging Face released the training script, the community began using it in creative ways. Consider the following personalization of Mr. Potato Head:

Prompt used to generate these images: “A photo of sks mr potato head in a river”
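The prompt above follows the naming pattern commonly used with DreamBooth: a rare identifier token (here `sks`) is placed in front of the class noun, so the fine-tuned model ties that token to the specific subject. A matching prompt without the identifier describes the generic class, which is what prior preservation uses. Here is a minimal sketch of that pattern in plain Python. The names `DreamBoothPrompts`, `build_prompts`, and `in_context` are hypothetical helpers written for illustration. They are not part of the Hugging Face script or of KerasCV.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DreamBoothPrompts:
    """Hypothetical container for the two prompts used during fine-tuning."""

    instance_prompt: str  # describes the handful of photos of the specific subject
    class_prompt: str  # describes the generic class (used for prior preservation)


def build_prompts(class_noun: str, identifier: str = "sks") -> DreamBoothPrompts:
    """Build the instance and class prompts from a class noun and a rare token."""
    class_noun = class_noun.strip()
    identifier = identifier.strip()
    if not class_noun or not identifier:
        raise ValueError("class_noun and identifier must both be non-empty")
    return DreamBoothPrompts(
        instance_prompt=f"a photo of {identifier} {class_noun}",
        class_prompt=f"a photo of {class_noun}",
    )


def in_context(prompts: DreamBoothPrompts, context: str) -> str:
    """Place the personalized subject in a new context at inference time."""
    return f"{prompts.instance_prompt} {context.strip()}"


prompts = build_prompts("mr potato head")
print(prompts.instance_prompt)  # a photo of sks mr potato head
print(prompts.class_prompt)  # a photo of mr potato head
print(in_context(prompts, "in a river"))  # a photo of sks mr potato head in a river
```

The last line reproduces the prompt used for the images above (in lowercase). Swapping the context string is all it takes to ask for the same subject in a different scene.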

This inspired me to think about implementing a similar script in TensorFlow using KerasCV’s implementation of Stable Diffusion:

Implementation in TensorFlow

Besides being one of the few readable open-source implementations of DreamBooth, Hugging Face’s script is also excellent educational material.

Chansung Park and I collaborated on this project. We tried to follow the same design principles while implementing it in TensorFlow. After we felt confident about it, we open-sourced it:

Soon after the implementation was open-sourced, we also decided to blog about our learning journey and published it:

Since diffusers provides tools to further optimize different aspects of the generation process, we also worked on a tool that converts KerasCV Stable Diffusion checkpoints to a format compatible with the StableDiffusionPipeline provided by diffusers. Learn more about it here:

We ran many experiments and included all their details in our GitHub repository. We encourage you to check those out.

Keras DreamBooth Hackathon

My teammates at Hugging Face were happy to see this implementation, and we decided to collaborate with Google to host a community sprint dedicated to it:

We kicked off the event by hosting talks from Nataniel Ruiz (first author of DreamBooth), Francois Chollet from Google, Apolinario Passos, and Nima Boscarino from Hugging Face. You can find the entire broadcast below:

There’s still time, and we encourage you to join the sprint!
