Create Chatbot Text to Image using Stable Diffusion and Streamlit

dmitri yanno mahayana
5 min read · Dec 18, 2023


Stable Diffusion hosted with Streamlit

You can create images from text input in a few different ways, but have you tried Stable Diffusion?

Text to Image

Text-to-image technology represents a groundbreaking advancement in artificial intelligence: complex algorithms transform written descriptions into vivid visual representations. This process leverages advanced machine learning techniques, such as diffusion models and generative adversarial networks (GANs), to interpret and visualize text inputs with remarkable accuracy and creativity. The implications are vast, extending from enhancing creative endeavors in art and design to providing visual aids in education and research. Text-to-image models can generate anything from realistic scenes to fantastical imagery, offering a limitless canvas for imagination and a powerful tool for visual communication in our increasingly digital world.

You can try many free text-to-image generators on the internet, such as DeepAI, Wepik, Pixlr, and Canva. But I strongly suggest you try Stable Diffusion, because it is an open-source AI platform that can create descriptive images from shorter prompts and even generate words within images.

Stable Diffusion (SD)

As I mentioned above, this technology is free and we can contribute by fine-tuning the model. The main objective of the Stable Diffusion model is to deliver a significant advancement in image generation, offering enhanced image composition and face generation that result in stunning visuals and realistic aesthetics. Stability AI offers multiple model types, for example audio, image, video, and language. Here we are discussing the image model, and we will explore its features in this article.

The basic SD model just creates images from text, but if we dig further into the website, it has other capabilities beyond text to image:
1. Inpainting — edit inside the image
2. Outpainting — extend the image beyond its original borders
3. Image to Image — prompt a new image using a source image

After checking the documentation, these three capabilities are only available with the SDXL Turbo model. But the SDXL Turbo endpoint is not available on Hugging Face, so we will skip SDXL Turbo and use the base model instead (of course, I will share how to install and use SDXL Turbo locally in another article ^_^).

SD Endpoint

Since we are not using SDXL Turbo locally, we will use the free basic SD model through the Hugging Face inference endpoint. All you need to do is register an account on Hugging Face and then create the endpoint.

Navigate to your Hugging Face settings page and create an access token:

HF access token

Once that is done, go to the Stable Diffusion XL base 1.0 model page, click the Deploy button, and copy the Inference Endpoint.
We can use this example to send a prompt and get the image result:

import requests

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"
headers = {"Authorization": "Bearer hf_xxxxxxxxxxxxxxxxxxxxx"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

Do not forget to change the API token in the headers variable; otherwise, you will get an error when running the query.
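As a quick sanity check, you can call the endpoint outside Streamlit and save the result to a file. This is a sketch: the `token` parameter and the `raise_for_status()` error check are additions of mine, not part of the original snippet.

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"

def query(payload, token):
    # Same call as above, but with the token passed in and HTTP errors surfaced
    headers = {"Authorization": f"Bearer {token}"}
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()  # raises on e.g. 401 (bad token) or 503 (model loading)
    return response.content

if __name__ == "__main__":
    # "hf_..." below is a placeholder; substitute your own token
    image_bytes = query({"inputs": "a running cat"}, "hf_xxxxxxxxxxxxxxxxxxxxx")
    with open("result.png", "wb") as f:
        f.write(image_bytes)
```

If the token is wrong or the model is still loading, `raise_for_status()` fails loudly instead of silently returning an error body as "image" bytes.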

Streamlit

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science. In just a few minutes you can build and deploy powerful data apps.

Streamlit Installation

Navigate to your Python environment and run this command:

pip install streamlit

Once the installation completes, you can move on to developing the app below.

Development

Now that we have both the endpoint and the package in our hands, let's start by importing all the necessary packages in our code:

import streamlit as st
import requests
import io
from PIL import Image

We need the PIL package because the SD endpoint returns bytes, which we must convert into an image type.
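To see that bytes-to-image conversion in isolation, here is a small sketch; the tiny PNG fabricated in memory stands in for the bytes the endpoint would actually return:

```python
import io
from PIL import Image

# Fabricate a tiny 4x4 PNG in memory to stand in for the endpoint's response bytes
buf = io.BytesIO()
Image.new("RGB", (4, 4), color="red").save(buf, format="PNG")
image_bytes = buf.getvalue()

# This is the same conversion the app performs on the real response
image = Image.open(io.BytesIO(image_bytes))
print(image.size)  # (4, 4)
```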

def query_stabilitydiff(payload, headers):
    API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-xl-base-1.0"
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

We make the query function more flexible by adding payload and headers parameters.

Chatbot Title

With the help of Streamlit, we can easily build the UI using Python syntax. Here, we create the app title and a sidebar link to the repo:

with st.sidebar:
    "[View the source code](https://github.com/dmitrimahayana/Py-LangChain-ChatGPT-VirtualAssistance/blob/main/03_Streamlit_Stable_Diff.py)"

st.title("💬 Chatbot - Text to Image")
st.caption("🚀 A Streamlit chatbot powered by Stable Diffusion")

Assistant Message and Session

Let’s define the first chatbot message and save it into the Streamlit session state:

if "messages" not in st.session_state:
    st.session_state["messages"] = [
        {"role": "assistant", "content": "What kind of image should I draw? (example: running cat)"}]

Show Previous Prompts and Results

To display the conversation, we iterate over the messages in the Streamlit session state and check whether each one has an image key:

for message in st.session_state.messages:
    st.chat_message(message["role"]).write(message["content"])
    if "image" in message:
        st.chat_message("assistant").image(message["image"], caption=message["prompt"], use_column_width=True)

Prompt Logic

Now, let’s add the logic that retrieves the query from the prompt and shows the result in the Streamlit app:

if prompt := st.chat_input():

    if not st.secrets.hugging_face_token.api_key:
        st.info("Please add your Hugging Face Token to continue.")
        st.stop()

    # Input prompt
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)

    # Query Stable Diffusion
    headers = {"Authorization": f"Bearer {st.secrets.hugging_face_token.api_key}"}
    image_bytes = query_stabilitydiff({
        "inputs": prompt,
    }, headers)

    # Return Image
    image = Image.open(io.BytesIO(image_bytes))
    msg = f'here is your image related to "{prompt}"'

    # Show Result
    st.session_state.messages.append({"role": "assistant", "content": msg, "prompt": prompt, "image": image})
    st.chat_message("assistant").write(msg)
    st.chat_message("assistant").image(image, caption=prompt, use_column_width=True)

First, we check whether there is a new input prompt. If so, we append it to the session state and show it in the Streamlit UI. Second, we send the prompt to the HF endpoint, retrieve the result, and convert it from bytes into an image. Third, we append the result to the session state and display it in the UI.
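Note that the `st.secrets.hugging_face_token.api_key` lookup assumes your token is stored in Streamlit's secrets file; a minimal `.streamlit/secrets.toml` matching that lookup would be (the `hf_...` value is a placeholder):

```toml
# .streamlit/secrets.toml -- keep this file out of version control
[hugging_face_token]
api_key = "hf_xxxxxxxxxxxxxxxxxxxxx"
```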

Run Application

Since we are using Streamlit to display the application, we need to run the script through the Streamlit command. You can follow this template in cmd/terminal:

streamlit run your_script.py

If the run succeeds, the terminal will show this:

(Py-LangChain-ChatGPT-VirtualAssistance) PS D:\00 Project\00 My Project\IdeaProjects\Py-LangChain-ChatGPT-VirtualAssistance> streamlit run .\03_Streamlit_Stable_Diff.py

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://192.168.1.5:8501

Normally, a new browser tab will pop up and navigate to “http://localhost:[streamlit_port]”:

New Pop Up Browser of Streamlit App

Let’s test our text to image app by inserting new prompt “a cinematic cat with a tuxedo and black hat”.

Stable Diffusion and Streamlit App

Pretty cool, right? Now we have so many ideas to explore using Stable Diffusion and Streamlit. I will share image-to-image editing in another article.

If you like this article, you can follow my profile and you will be notified whenever I post an update.
Thank you.

GitHub Repository

https://github.com/dmitrimahayana/Py-LangChain-ChatGPT-VirtualAssistance/blob/main/03_Streamlit_Stable_Diff.py
