Control your Stable Diffusion Model with ControlNet and OpenVINO

Paula Ramos · Published in OpenVINO-toolkit · Mar 21, 2023 · 6 min read

We’ll explore how to control your Stable Diffusion model using ControlNet and OpenVINO. As we know, Stable Diffusion generates images from text prompts, but the random noise at the heart of the diffusion process can make it challenging to steer the output toward exactly what you have in mind. With ControlNet and OpenVINO, however, you can take control of your models and produce reliable, repeatable results. Join me as we explore the world of Stable Diffusion and discover how these powerful tools can help us achieve breakthroughs in our research and creative work.

Figure 1. Image generated with the OpenVINO Stable Diffusion notebook. Prompt: “cyberpunk human controlling a big machine in a dark day”, seed 42, 20 steps.

The past few months, and perhaps the past year, have seen exponential growth in generative AI models. One of the most impactful developments is the “Stable Diffusion” ecosystem.

Random noise is an inherent characteristic of Stable Diffusion models, which makes it difficult to achieve accurate and reliable results. But fear not, my reader! You can use various techniques to gain control and master your AI models. So, let me show you how to take control of your AI models and unleash their full potential. You’ll be amazed at the breakthroughs you’ll achieve!

How to control AI?

It is possible to personalize the Stable Diffusion model through fine-tuning to find your other self in parallel worlds 😊. With Stable Diffusion, it is also possible to acquire an artistic style and apply it to different input images. There are several techniques to control your diffusion model:

Dreambooth: Dreambooth is a technique for teaching new concepts to Stable Diffusion using a specialized form of fine-tuning. Some people have used it with their own photos to place themselves in extraordinary situations, while others use it to incorporate new styles. Paper: https://arxiv.org/abs/2208.12242 Project page: https://dreambooth.github.io/

Figure 2. Dreambooth explanatory image from the Google Research paper.

LoRA: Low-Rank Adaptation of Large Language Models is a technique introduced by Microsoft researchers to address the problem of fine-tuning large language models. LoRA proposes to freeze the weights of the pre-trained model and inject trainable layers (low-rank decomposition matrices) into each transformer block. Researchers found that by focusing on the transformer attention blocks of large language models, the quality of fine-tuning with LoRA was on par with fine-tuning the entire model while being much faster and requiring less computation. https://arxiv.org/abs/2106.09685
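The core of LoRA can be sketched in a few lines of PyTorch. This is a minimal illustration of the idea, not the original implementation: the pretrained projection is frozen, and the weight update is learned as the product of two small rank-r matrices.

```python
# Minimal LoRA sketch (illustrative, not the original implementation):
# freeze the pretrained linear layer and learn a low-rank update B·A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 16.0):
        super().__init__()
        self.base = base                        # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, r, bias=False)    # trainable
        self.lora_b = nn.Linear(r, base.out_features, bias=False)   # trainable
        nn.init.zeros_(self.lora_b.weight)      # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        # Original output plus the scaled low-rank correction.
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```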

Attend-and-Excite: Text-to-image generative models have demonstrated an excellent capacity for generating diverse and creative images guided by a target text. However, current diffusion models can fail to fully convey the semantics of the text in the generated images. Attend-and-Excite uses the attention maps that relate each word of the prompt to a region of the image, and optimizes those regions during generation to ensure that the critical information from the prompt is captured in the final image. The results are promising: the generated images contain all the requested elements, with compositions that are more coherent and less like amorphous mixtures of different elements.

Figure 3. Attend-and-Excite explanatory image from the arXiv paper.
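If you want to try the technique, an implementation ships with the Hugging Face diffusers package. The snippet below is a hedged sketch assuming that pipeline; the token indices point at the prompt words whose attention should be strengthened, and exact argument names may vary with your diffusers version.

```python
# Hedged sketch using the Attend-and-Excite pipeline from Hugging Face diffusers
# (assumed API; check your diffusers version).
from diffusers import StableDiffusionAttendAndExcitePipeline

pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"
)

prompt = "a cat and a frog"
# Indices of the tokens "cat" and "frog" in the tokenized prompt.
image = pipe(prompt=prompt, token_indices=[2, 5], num_inference_steps=50).images[0]
image.save("cat_and_frog.png")
```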

Other forms of control with input images: Originally, Stable Diffusion allowed for image creation using text and images as input. With the img2img technique, the colors and composition of an input image are used to create something similar, providing a certain level of control. depth2img is a Stable Diffusion technique that uses a depth map of the input image instead of its pixel color information, allowing greater control over the layout of the generated image (a small sketch follows Figure 4 below).

Figure 4. Depth2Img example. Source Hugging Face. Paper
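As a point of reference, here is a minimal depth2img sketch using the Hugging Face diffusers package (an assumption on my side; the file names are placeholders). The depth map is estimated from the input image internally and constrains the layout of the result, while the prompt drives the content.

```python
# Minimal depth2img sketch with Hugging Face diffusers (assumed API).
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth"
)

init_image = load_image("room_photo.png")   # placeholder input image
image = pipe(
    prompt="the same room as a cozy cartoon illustration",
    image=init_image,    # depth is estimated from this image internally
    strength=0.7,        # how far the result may drift from the original layout
).images[0]
image.save("cartoon_room.png")
```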

ControlNet:

Img2Img and Depth2Img were just one step. ControlNet has many more possibilities that allow us to control stable diffusion using object borders, lines, scribbles, pose skeletons, segmentation maps, depth maps, and more.

How it works, in short:

ControlNet is a project based on the idea of hypernetworks. It takes a neural network like Stable Diffusion and clones it. The original model is frozen, and the copy is trained with an additional input, such as a depth image, which modifies and conditions the result of the original neural network. We end up with a new neural network coupled to the original one, allowing us to control the main network. ControlNet can take an existing Stable Diffusion model, make slight changes to the architecture, and add whatever extra conditions you want.
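Conceptually, the coupling can be sketched like this. This is a simplified, illustrative PyTorch sketch of the idea (frozen original, trainable copy, zero-initialized coupling), not the actual ControlNet code, and it assumes the control signal has the same shape as the block input.

```python
# Conceptual ControlNet-style coupling (illustrative only, not the real code).
import copy
import torch.nn as nn

class ControlledBlock(nn.Module):
    def __init__(self, original_block: nn.Module, channels: int):
        super().__init__()
        self.trainable_copy = copy.deepcopy(original_block)   # trainable clone
        self.original = original_block                         # frozen original
        for p in self.original.parameters():
            p.requires_grad = False
        self.zero_conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.zero_conv.weight)   # zero init: no effect at start
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, x, control):
        # The copy sees the normal input plus the control signal (e.g. a pose
        # or depth map); its contribution is added to the frozen block's output.
        return self.original(x) + self.zero_conv(self.trainable_copy(x + control))
```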

The ControlNet workflow using OpenPose is shown below. Key points are extracted from the input image using OpenPose and saved as a control map containing the positions of the key points. This is then introduced into Stable Diffusion as additional conditioning along with the text prompt. Images are generated based on these two conditions.
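Outside the notebook, the same workflow can be sketched with the Hugging Face diffusers and controlnet_aux packages. Treat this as a hedged reference sketch: the OpenVINO notebook performs the equivalent steps on converted models, and the input file name here is a placeholder.

```python
# Hedged sketch of the OpenPose-conditioned workflow with diffusers / controlnet_aux.
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# 1. Extract key points from the input photo and render them as a control map.
pose_detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
control_map = pose_detector(load_image("input_photo.png"))   # placeholder photo

# 2. Load Stable Diffusion together with the OpenPose ControlNet.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)

# 3. Generate: the text prompt and the pose map condition the image together.
image = pipe(
    "cyberpunk human controlling a big machine in a dark day",
    image=control_map,
    num_inference_steps=20,
    generator=torch.Generator().manual_seed(42),
).images[0]
image.save("result.png")
```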

In the image below, you will see me posing for the camera and the resulting pose estimation. With that information and three different prompts, I ran the native model and got the top row; using the optimized OpenVINO model, I got the bottom row.

Figure 5. ControlNet working with the pose estimation model, using three different prompts. Native model (top row) and optimized OpenVINO model (bottom row).

ControlNet with OpenVINO:

ControlNet has become the missing fundamental piece for Stable Diffusion, allowing precise control over the final image to be generated. Now you can run these models more efficiently using OpenVINO. Fork our repository and try the new notebook, where we create control through pose estimation.

You can reuse this OpenVINO notebook to apply different ways of controlling the diffusion model. To change the conditioning model, modify the controlnet variable using the checkpoints listed at https://huggingface.co/lllyasviel (a sketch of this swap follows Figure 6).

Figure 6. Control options for ControlNet.
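For example, swapping from pose control to edge control could look roughly like this. The variable name is illustrative, and the notebook may wire it up differently; the checkpoint names are the public lllyasviel releases.

```python
# Swap the conditioning type by loading a different lllyasviel checkpoint.
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
# Other published options include sd-controlnet-depth, sd-controlnet-scribble,
# sd-controlnet-seg, sd-controlnet-hed, sd-controlnet-mlsd, and sd-controlnet-normal.
```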

Thanks to the notebook’s author, Ekaterina Aidova, for her outstanding contributions to the OpenVINO Notebooks repository.

Imagine now digital creators generating new content from original images, possibly reducing the budget required for new movies or 3D animations. Turning reality into a cartoon. This is happening in the hands of @Corridor Crew. Check out their video here: https://www.youtube.com/watch?v=_9LX9HSQkWo&t=10s

Conclusion

This blog could be very helpful for those who want to learn, play, and innovate with AI. The benefit of AI is within everyone’s reach and can help us to improve our lives. Taking an active role in the development of AI will help ensure that it benefits society.

My invitation to you is to try the OpenVINO notebooks. Stay tuned because more related content will be coming to my channels. If you have any issues, please add your questions in the discussion section of our repository https://github.com/openvinotoolkit/openvino_notebooks/discussions

Enjoy the blog and enjoy the notebooks! 😊

#iamintel #openvino #generativeai #xai #explainableai #controlnet #openvinonotebooks

About me:

Hi, all! My name is Paula Ramos. I have been an AI enthusiast and have worked with Computer Vision since the early 2000s. Developing novel integrated engineering technologies is my passion. I love to deploy solutions that real people can use to solve their equally real problems. If you’d like to share your ideas on how we could improve our community content, drop me a line! 😉 I will be happy to hear your feedback.

Here is my LinkedIn profile: https://www.linkedin.com/in/paula-ramos-41097319/

Notices & Disclaimers: Intel technologies may require enabled hardware, software, or service activation. No product or component can be absolutely secure. Your costs and results may vary. Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
