Your Brand, Your Voice: Tailor-made Chatbot using your own data with OpenAI and LangChain

Swarup Tripathy
7 min read · Feb 10, 2024
Photo by Mojahid Mottakin on Unsplash

In the evolving landscape of artificial intelligence and natural language processing, the ability to build a custom chatbot is an exciting venture. Elevate your Brand with a custom chatbot using your own data.

Building your own chatbot using OpenAI and LangChain offers a dynamic and customizable solution to meet your specific requirements. By understanding the capabilities of OpenAI’s language models and leveraging LangChain’s flexibility, you can create a chatbot that not only understands but also adapts to the nuances of your users’ interactions, providing a personalized and engaging experience.

As the field of AI continues to advance, this combination of technologies opens up new possibilities for intelligent and context-aware conversational agents.

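In this article, we will go step by step through customizing OpenAI’s GPT-3.5 model (the model behind ChatGPT) with our own data using LangChain 🦜🔗, and use Streamlit to create a user interface for our conversational chatbot. Note that the model itself is not retrained: LangChain retrieves the relevant passages from our data and passes them to the model as context.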

The Idea

Suppose we want to build a custom chatbot for a bookstore named "XYZ Bookstore". This bot should be able to answer all kinds of customer queries about the books available in the store. For this, we need to supply ChatGPT with the bookstore's data, as ChatGPT doesn't know which books are sold by XYZ Bookstore.

The Code

Now let’s get practical! We’ll develop our chatbot on our own data with very little Python code.

Setup Python

To download Python, head to the official Python website and download the latest version. The package versions pinned below need Python 3.8.1 or newer.

Let's build the project from scratch

Create a folder named “xyz_books” and, inside it, create a new Python file app.py and a requirements.txt file.

Add the following packages, which we will need for our project, to the requirements.txt file.

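openai==1.11.0
langchain==0.1.5
langchain-community==0.0.17
streamlit==1.31.0
streamlit_chat==0.1.1
# Unpinned additions (not in the original list): the default Chroma vector store
# and OpenAIEmbeddings used later in this article import these packages.
chromadb
tiktoken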

Setup a virtual environment

Setting up a virtual Python environment is a good practice to manage dependencies and isolate project-specific libraries.

python -m venv xyzbooks-env

To activate the environment, run the following command in a Unix or macOS terminal. Windows users can follow this link.

source xyzbooks-env/bin/activate
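To confirm that the activation worked, a small check like the one below can help. It is only a sketch: the helper name in_virtualenv is made up for this article, and it relies on the fact that environments created with python -m venv make sys.prefix differ from sys.base_prefix.

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """Return True when the interpreter runs inside a venv-created environment.

    in_virtualenv is a hypothetical helper name used only for illustration.
    """
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return prefix != base_prefix


print("virtual environment active:", in_virtualenv())
```

Run it with the same python command you will use for the app; if it prints False, the environment is probably not activated in that terminal.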

Then, we’ll install the necessary libraries by running:

pip install -r requirements.txt

An API token is required to access OpenAI’s services.

Setup the API key

The main advantage of making your API key accessible to all projects is that the Python library will automatically detect it and use it without any extra code. We can set it as an environment variable by running the following command in the terminal. To keep it across terminal sessions, add the same line to your shell profile (for example ~/.bashrc or ~/.zshrc).

export OPENAI_API_KEY='your-api-key-here'
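Before calling the API, it can be useful to fail early with a clear message when the key is missing. The sketch below assumes the key lives in the OPENAI_API_KEY variable as set above; get_api_key is a hypothetical helper, not part of the openai package.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI API key from the environment or raise a clear error.

    get_api_key is a hypothetical helper name used only for illustration.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running the app.")
    return key


# Example with an explicit mapping instead of the real environment.
print(len(get_api_key({"OPENAI_API_KEY": "your-api-key-here"})) > 0)
```

Calling get_api_key() with no arguments at the top of app.py would check the real environment before any request is sent.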

Let's write our first set of code

In app.py, add the following code:

import os
import sys
from openai import OpenAI


client = OpenAI()
prompt = sys.argv[1]

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": prompt}
    ]
)

generated_text = response.choices[0].message.content
print(generated_text)
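The script above reads sys.argv[1] directly, so running it without a question raises an IndexError. One possible alternative, sketched here with the standard argparse module, prints a usage message instead; parse_prompt is a hypothetical helper name.

```python
import argparse
from typing import List, Optional


def parse_prompt(argv: Optional[List[str]] = None) -> str:
    """Parse the question from the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Ask the XYZ Books chatbot a question.")
    parser.add_argument("prompt", help="the question to send to the model")
    # With argv=None, argparse falls back to the real command-line arguments.
    args = parser.parse_args(argv)
    return args.prompt


# Example with an explicit argument list instead of the real command line.
print(parse_prompt(["what kind of books are available in your store?"]))
```

In app.py, prompt = parse_prompt() could then replace prompt = sys.argv[1].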

Let's ask some questions to our chatbot

python app.py "what kind of books are available in your store?"

We can see that ChatGPT is not able to answer this kind of query, as it has no information about our store. Let us take our own data and feed it to ChatGPT using LangChain.

Diagram from LangChain on how to process custom data using vectorstore.
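To make the vector store idea more concrete, here is a toy sketch of retrieval by similarity. It is not what LangChain runs internally: it uses plain word counts instead of real embeddings, and embed, cosine, and retrieve are hypothetical helper names. The flow is the same in spirit, though: turn the chunks and the question into vectors, pick the closest chunk, and hand it to the model as context.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real embeddings are dense vectors)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# A few chunks in the style of the bookstore data used in this article.
chunks = [
    "What's your return policy for books? Our return policy allows for returns "
    "within 30 days of purchase with a valid receipt.",
    "Do you offer e-books or audiobooks? Yes, we offer a selection of e-books and audiobooks.",
    "Are there any gift cards available for purchase? Yes, we offer gift cards "
    "in various denominations.",
]

question = "What is the return policy?"
context = retrieve(question, chunks, k=1)[0]
# The retrieved chunk is placed in the prompt so the model can answer from it.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The k parameter plays the same role as the k value passed to the retriever later in this article: it controls how many chunks are handed to the model.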

Create a data file

We first have to create a data file containing all the information about XYZ Bookstore. We will name it data.txt.

You can copy the content below into your data file.

I am your personal assistant from XYZ Books. I can answer questions based on our store collections.

Answer questions based on the passage given below.

What genres of books do you offer?
We offer a wide range of genres, including fiction, non-fiction, mystery, science fiction, fantasy, thriller, and more.

Can you recommend a good book for children aged 8-10?
Certainly! For that age group, we recommend "The Chronicles of Narnia" series by C.S. Lewis or "Harry Potter" series by J.K. Rowling.

How can I place an order online?
You can place an order online through our website. Simply browse the catalog, add items to your cart, and follow the checkout process.

Are there any upcoming author signings or events at the bookstore?
Yes, we frequently host author signings and events. Check our events page on the website for the latest updates.

What's your return policy for books?
Our return policy allows for returns within 30 days of purchase with a valid receipt. Books must be in their original condition.

Do you offer e-books or audiobooks?
Yes, we offer a selection of e-books and audiobooks. You can find them in the digital section on our website.

Can you tell me more about the book club?
Certainly! Our book club meets monthly to discuss a chosen book. You can join by signing up on our website, and details for each month's book will be shared in advance.

What's your best-selling book currently?
Our current best-seller is "The Silent Patient" by Alex Michaelides. It has been receiving rave reviews.

Are there any discounts or promotions running currently?
Yes, we have a promotion offering 20% off on all hardcover fiction books until the end of this month. Don't miss out!

Can you recommend a classic novel for someone new to reading?
Absolutely! For someone new to reading, classics like "To Kill a Mockingbird" by Harper Lee or "Pride and Prejudice" by Jane Austen are timeless choices.

How do I sign up for the newsletter to receive updates?
You can sign up for our newsletter on the homepage of our website. Simply enter your email address, and you'll receive regular updates on new releases, promotions, and events.

Are there any gift cards available for purchase?
Yes, we offer gift cards in various denominations. They make for a perfect gift for book lovers and are available for purchase both in-store and online.


User: Hi
XYZBot: Hi, How can I help you ?

User: What is your name?
XYZBot: I am your personal assistant from XYZ Books. I can answer questions based on our store collections.

User: What genres of books do you offer
XYZBot: We offer a wide range of genres, including fiction, non-fiction, mystery, science fiction, fantasy, romance, and more.

Load custom data

We will now load the custom text data, convert it to vectors using LangChain, and use it with GPT.

import os
import sys
from openai import OpenAI
from langchain_community.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
import warnings
warnings.filterwarnings("ignore")

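# The OpenAI client is not called directly below; LangChain reads OPENAI_API_KEY itself.
client = OpenAI()
prompt = sys.argv[1]

# Load the bookstore data and build a vector index over it.
loader = TextLoader("data.txt")
loader.load()
index = VectorstoreIndexCreator().from_loaders([loader])

# Retrieve the single most similar chunk (k=1) and let the model answer from it.
print(index.query(prompt, retriever_kwargs={"search_kwargs": {"k": 1}}))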

Response

We can see that ChatGPT is now able to understand our query and give the correct response.

Final Code:

Let us now put everything together and integrate the code with Streamlit so that we can view our app in a web browser. We are also going to store our prompts in the session state so that our model can remember our earlier queries and respond properly.

Streamlit is an open-source Python library that allows you to create web applications for machine learning and data science projects quickly. We will use Streamlit-Chat, which lets you insert a chat message container into the app so you can display messages from the user or the app. Chat containers can contain other Streamlit elements, including charts, tables, text, and more. To learn more about Streamlit, you can click on this 👉 link.

import os
import sys
import streamlit as st
from streamlit_chat import message
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import ConversationalRetrievalChain
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.chat_models import ChatOpenAI
from langchain_community.vectorstores import Chroma

import warnings
warnings.filterwarnings("ignore")

st.title("📖 XYZ Books - Personal Assistant")
st.divider()

data_file = "data.txt"
data_persist = False
prompt = None

#containers for the chat
request_container = st.container()
response_container = st.container()

# Reuse the index persisted on disk by Chroma if enabled, otherwise build it from the data file
if data_persist and os.path.exists("persist"):
    vectorstore = Chroma(persist_directory="persist", embedding_function=OpenAIEmbeddings())
    index = VectorStoreIndexWrapper(vectorstore=vectorstore)
else:
    loader = TextLoader(data_file)
    loader.load()
    if data_persist:
        index = VectorstoreIndexCreator(vectorstore_kwargs={"persist_directory":"persist"}).from_loaders([loader])
    else:
        index = VectorstoreIndexCreator().from_loaders([loader])

chain = ConversationalRetrievalChain.from_llm(llm=ChatOpenAI(model="gpt-3.5-turbo"), retriever=index.vectorstore.as_retriever(search_kwargs={"k": 1}))

# Initialise the session state on the first run
if 'history' not in st.session_state:
    st.session_state['history'] = []

if 'generated' not in st.session_state:
    st.session_state['generated'] = ["Hello ! I am your Personal assistant built by XYZ Books"]

if 'past' not in st.session_state:
    st.session_state['past'] = ["Hey ! 👋"]

def conversational_chat(prompt):
    # Pass the question together with the earlier (question, answer) pairs
    result = chain({"question": prompt, "chat_history": st.session_state['history']})
    st.session_state['history'].append((prompt, result["answer"]))
    return result["answer"]


with request_container:
    with st.form(key='xyz_form', clear_on_submit=True):

        user_input = st.text_input("Prompt:", placeholder="Message XYZBot...", key='input')
        submit_button = st.form_submit_button(label='Send')

    if submit_button and user_input:
        output = conversational_chat(user_input)

        st.session_state['past'].append(user_input)
        st.session_state['generated'].append(output)

if st.session_state['generated']:
    with response_container:
        for i in range(len(st.session_state['generated'])):
            message(st.session_state["past"][i], is_user=True, key=str(i) + '_user', avatar_style="adventurer", seed=13)
            message(st.session_state["generated"][i], key=str(i), avatar_style="bottts", seed=2)
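The way conversational_chat accumulates history can be tried out without Streamlit or an API key. The sketch below is an assumption-laden stand-in: fake_chain is a hypothetical stub that mimics only the input and output keys used in the app code (question, chat_history, answer), and a plain list takes the place of st.session_state['history'].

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]


def conversational_chat(chain: Callable[[Dict], Dict], history: History, prompt: str) -> str:
    """Ask the chain a question and record the (question, answer) pair."""
    result = chain({"question": prompt, "chat_history": history})
    history.append((prompt, result["answer"]))
    return result["answer"]


def fake_chain(inputs: Dict) -> Dict:
    """Hypothetical stand-in for the retrieval chain; no API call is made."""
    turns = len(inputs["chat_history"])
    return {"answer": f"(turn {turns + 1}) you asked: {inputs['question']}"}


history: History = []
first = conversational_chat(fake_chain, history, "Do you sell e-books?")
second = conversational_chat(fake_chain, history, "And audiobooks?")
print(first)
print(second)
print(len(history))
```

Because each call passes the earlier pairs along, a follow-up such as "And audiobooks?" reaches the real chain together with the question that came before it.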

Run the app:

streamlit run app.py

What Next?

Now that you have learned how to customize an OpenAI model with your own data, you can use these learnings as stepping stones for larger projects where you build on private datasets and deploy your own AI chatbots for your brand.

I hope this article helps you create nice apps. Do not hesitate to reach out to me on LinkedIn if you have any queries.

Thanks for reading this article.

You can find the full project on my 👉 GitHub
