How to Build Your First GenAI App with Stable Diffusion and Control Nets

Ayush Yadav
7 min read · Jan 6, 2024


Want to build your first GenAI app & don’t know where to start?

Then this blog is just for you.

In this blog, we are going to build ‘Logo Avatars’, an app that turns boring logos into creative ones. We will build the proof of concept on Google Colab, leveraging the powerful combination of Stable Diffusion and ControlNet, and we will write everything in Python.

No, we are not using any API; we will build it from scratch with the diffusers library. 😎

TLDR: Notebook Link

Introduction to Stable Diffusion, ControlNet, and the Diffusers Library

Welcome to the exciting world of AI-driven image generation! Our journey begins with understanding the core concepts behind Stable Diffusion and ControlNet, using the powerful diffusers library.

The Basics of Stable Diffusion:

It’s a text-to-image deep learning model.

Stable Diffusion is a groundbreaking technology that translates textual prompts into striking visual imagery. The model goes beyond simple text-to-image conversion, offering capabilities like image-to-image transformation and adjustable parameters, such as guidance scale, width, and height, to fine-tune the results.

Image generation with Stable Diffusion
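To make this concrete, here is a minimal text-to-image sketch using the diffusers library. It is just an illustration: the checkpoint, prompt, and parameter values are assumptions, not the final pipeline we build below.

# Minimal text-to-image sketch (illustrative only, not our final app).
# Assumes a GPU runtime and the runwayml/stable-diffusion-v1-5 checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a fox in a forest",  # example prompt
    width=512,
    height=512,
    guidance_scale=7.5,       # how strongly the prompt steers the result
    num_inference_steps=30,   # more steps give better quality, but run slower
).images[0]

image.save("fox.png")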

Hugging Face and Pipelines with diffusers.

At the heart of our project is Hugging Face, a platform that hosts various pre-trained models. By specifying a model’s path, we can effortlessly download and integrate it into our application. These models work within the concept of pipelines — specialized workflows for different tasks such as basic text-to-image, image-to-image, or more complex ones involving ControlNet.

Diffusers is a Python library by Hugging Face through which we can run these models.

# Example: Stable Diffusion 1.5 with the Canny ControlNet; we will dive deeper soon.
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Load the pretrained ControlNet and Stable Diffusion v1-5.
# We provide Hugging Face model paths here so they are downloaded automatically.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

ControlNet: A Guide to Enhanced Image Generation

ControlNet introduces a layer of guidance to the image generation process. It requires a control image, which then directs the AI in creating the final output.

For instance, the Canny ControlNet focuses on edges by employing the Canny edge detection algorithm.

Canny Edge Detection Algo

Similarly, OpenPose generates a pose image.

OpenPose

Normal Image -> Hinter Algorithm -> Control image

Some Parameters of Stable Diffusion Pipelines You Should Know About

  1. “width” and “height”: These control the resolution of the generated image.
  2. “guidance_scale”: Dictates how closely the generation follows the given prompt. The default value in diffusers is 7.5, with a recommended range of 4 to 9.
  3. “image” (the control image): Guides the generation based on a reference image.
  4. “controlnet_conditioning_scale”: Controls how strongly the control image influences the output, with a recommended range of 0.0 to 1.5.
  5. “num_inference_steps”: The higher the value, the better the quality (at the cost of speed), with a recommended range of 20 to 50.
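To see how these fit together, here is a hedged sketch of a ControlNet pipeline call using the parameters above. The pipe and control_image variables are assumed to exist already; we build both step by step later in this post.

# Sketch of a pipeline call using the parameters described above.
# `pipe` and `control_image` are assumed to be set up as shown later.
result = pipe(
    prompt="a colorful logo in a jungle setting",    # example prompt
    negative_prompt="lowres, blurry",                # example negative prompt
    image=control_image,                             # control image guiding the layout
    width=512,
    height=512,
    guidance_scale=7.5,                              # 4 to 9 works well
    controlnet_conditioning_scale=1.0,               # 0.0 to 1.5 works well
    num_inference_steps=30,                          # 20 to 50 works well
)
generated_image = result.images[0]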

Building the Proof of Concept (POC) in Google Colab

Why Google Colab?

Google Colab offers a free T4 GPU, making it an ideal environment for developing and testing AI applications.
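Before loading any models, it is worth a quick optional check that the GPU runtime is actually active:

import torch

# Sanity check that the Colab GPU runtime is enabled
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. "Tesla T4"
else:
    print("No GPU found; set the Colab runtime type to T4 GPU")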

Image Generation with Control Nets:

To accomplish this, we require two key components:

  1. Control image: A control image is generated from a normal image. For example, using the Canny method, we obtain an edge image like the one shown below.

  2. ControlNet model: We will use the ‘sd-controlnet-canny’ model. Together with the generated control image, it controls the final image generation.

sd-controlnet-canny page on HuggingFace

ControlNet Hinter Library:

The ControlNet Hinter library provides us with functions to generate control images which we can use in our pipeline.

Control net hinter library function examples

We use a hashmap (a Python dictionary) to link each ControlNet model path with the hinter function needed to generate its control image.
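As a quick illustration of the hinter step, here is a small sketch that turns an ordinary image into a Canny control image (the image URL is a placeholder you would replace with your own):

import controlnet_hinter
from diffusers.utils import load_image

# Load any input image (placeholder URL) and convert it into a Canny edge control image
input_image = load_image("https://example.com/your_image.png")
canny_control_image = controlnet_hinter.hint_canny(input_image)
canny_control_image  # in a notebook cell, this displays the edge map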

Implementing the Code in Colab

Installing Necessary Libraries:

  • We install diffusers from GitHub since we want the latest version.
!pip install --quiet --upgrade accelerate
!pip install git+https://github.com/huggingface/diffusers.git@main
!pip install controlnet_hinter==0.0.5

Importing and Setting Up ControlNet:

We create a hashmap for ControlNets, mapping each model name to its corresponding model path on Hugging Face. This is used both when loading the ControlNet and when picking the hinter function that generates the control image.

import controlnet_hinter

# Mapping from ControlNet type to its model path and hinter function
CONTROLNET_MAPPING = {
    "canny_edge": {
        "model_id": "lllyasviel/sd-controlnet-canny",
        "hinter": controlnet_hinter.hint_canny
    },
    "pose": {
        "model_id": "lllyasviel/sd-controlnet-openpose",
        "hinter": controlnet_hinter.hint_openpose
    },
    # Other mappings...
}

Architecture of our pipeline:

In our case, we will generate a control image based on the Canny edge detection algorithm.

First, we generate the control image using the controlnet_hinter.hint_canny function.

We then pass this control image, along with other parameters such as the prompt and guidance_scale, to our Stable Diffusion ControlNet pipeline for inference, which produces the final output.

Loading the Model and Pipeline in code:

  1. Set the device to CUDA: To leverage GPU acceleration, we begin by specifying our device as CUDA. This ensures all computations are handled by the GPU.
  2. Import key components: Next, we import two essential components: StableDiffusionControlNetPipeline and ControlNetModel.
  3. Load the ControlNet model via the hashmap: We then load the appropriate ControlNet model, using the hashmap we defined earlier.
  4. Initialize the Stable Diffusion pipeline: Lastly, we set up our StableDiffusionControlNetPipeline. Here, ‘Juggernaut_final’ is our chosen base model, into which we integrate the previously loaded ControlNet model.
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch

device = "cuda"
controlnet_type = "canny_edge"

# Stable Diffusion base model
base_model_path = "digiplay/Juggernaut_final"

# Load the ControlNet model selected via our mapping
controlnet = ControlNetModel.from_pretrained(
    CONTROLNET_MAPPING[controlnet_type]["model_id"], torch_dtype=torch.float16
).to(device)

# Load the base model with the ControlNet attached
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    base_model_path,
    controlnet=controlnet,
    torch_dtype=torch.float16
).to(device)

Running the Inference

  1. Define Parameters: First, set essential parameters such as the prompt, negative prompt, and guidance scale. These parameters guide the model’s output direction.
  2. Load Logo Image: Next, load the logo image directly from a specified URL.
  3. Convert to a Control Image: Use the hashmap to look up the Canny hinter function and transform the loaded image into a control image, ready for further processing.

Let’s try converting the Nike logo, as an example.

import torch
from diffusers.utils import load_image

# Set your prompt, negative prompt, steps, and guidance scale
prompt = "Colorful, jungle surrounding, trees, natural, detailed, hd, 4k"
negative_prompt = "lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality"

no_of_steps = 20

# How much the prompt affects the final output.
# A higher guidance scale gives more weight to the prompt.
guidance_scale = 7.0

# How closely the final output follows the control image
controlnet_conditioning_scale = 1.0

# Load a PIL image into the logo_image variable
logo_image = load_image("logo_image_url.jpg")

# Convert it into a control image based on the chosen model
control_image = CONTROLNET_MAPPING[controlnet_type]["hinter"](logo_image)

# Run the inference
my_images = pipe(
    prompt=prompt,
    width=512,
    height=512,
    negative_prompt=negative_prompt,
    image=control_image,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    num_inference_steps=no_of_steps,
    guidance_scale=guidance_scale,
)

# Get the first image from the generation output
first_image = my_images.images[0]

# Display the generated control image (in a notebook cell)
control_image

Displaying the Final Result:

first_image
Result for the prompt: “Colorful, jungle surrounding, trees, natural, detailed, hd, 4k”
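If you want to keep the result, the pipeline returns standard PIL images, so saving them from Colab is a one-liner:

# Save the generated avatar (and, optionally, the control image for comparison)
first_image.save("logo_avatar.png")
control_image.save("logo_control.png")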

Conclusion

Congratulations! Together, we’ve successfully built our first GenAI app powered by Stable Diffusion with control nets.

👉 Try it on Colab: Explore the Notebook
👉 Discover the FastAPI Backend App: Github Repo

If you want a guide on how to convert this Colab into a full-fledged app with a backend, let me know.

I encourage you to take these concepts, experiment with them, and build something truly remarkable. If you found this guide helpful, consider starring the repository and sharing it with your friends. Let’s spread the word and inspire more creators in the world of GenAI!

Happy coding, and here’s to building incredible GenAI-powered applications!

🐦 Follow me on Twitter: AyushUnleashed

Acknowledgments

Yogendra Manawat (SDE Intern @ AiCaller.io): Big thanks to Yogendra for helping perfect the blog’s flow. Together, we read through the first version multiple times, fine-tuning the order and content to ensure it was just right for shipping.

Yatharth: Appreciation to Yatharth for his feedback on the ControlNet explanation and iteration ideas.

Shashwat Verma (CEO @ solevideos.ai): Grateful to Shashwat for his personal time and insightful suggestions on improvements, offering a clear vision on what to do next.


Ayush Yadav

Generative AI Wizard | Building the future one pipeline at a time.