How to create a chatbot with your own documents using LangChain and ChatGPT

DP6 Team
Published in DP6 US
Feb 9, 2024

A practical example of implementing Retrieval-Augmented Generation (RAG)

Introduction

In the dynamic world of artificial intelligence, chatbots have played a key role in improving the interaction between companies and customers. Today, we will explore an innovative approach to creating chatbots, combining the power of ChatGPT with LangChain technology.

Imagine having a chatbot that not only answers users’ questions, but is also able to search for specific information in your own documents, providing more contextual and accurate answers. This is exactly what the Retrieval-Augmented Generation (RAG) approach offers. In this article, we’ll guide you through the process of building a personalized chatbot that uses your own documents as a source of knowledge.

About RAG

Retrieval-Augmented Generation (RAG) is an advanced approach in the field of natural language processing (NLP) that combines information retrieval mechanisms with text generation models. Instead of relying exclusively on generative models to create answers, RAG integrates an information retrieval component. This component helps extract relevant contextual information from a database and provides this data to the text generation model, making it possible to produce more precise and contextualized answers.

Here’s how the approach works:

  1. The user asks the model a question;
  2. The question is passed to the Retrieval Model;
  3. The Retrieval Model retrieves the relevant documents from the database to answer the user’s question;
  4. The Retrieval Model sends the LLM a prompt containing the user’s question and the relevant information present in the retrieved documents;
  5. The pre-trained LLM generates an answer based on the information provided and returns it to the user.
(Retrieval Augmented Generation — Source: Deci)
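
Before we get to the actual implementation, the flow above can be summarized in a few lines of Python-like pseudocode. The retriever and llm objects below are placeholders used only to illustrate the idea, not the classes we will use later:

# Minimal sketch of the RAG flow described above (placeholder objects, illustration only)
def rag_answer(question: str, retriever, llm) -> str:
    # Steps 2-3: the retrieval model fetches the documents relevant to the question
    relevant_docs = retriever.retrieve(question)

    # Step 4: the question and the retrieved context are combined into a single prompt
    prompt = f"Answer based on the context below.\n\nContext:\n{relevant_docs}\n\nQuestion: {question}"

    # Step 5: the pre-trained LLM generates the final answer
    return llm.generate(prompt)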

Implementation

Without further ado, let’s start by preparing our virtual environment with the necessary prerequisites for our application to work.

Note: This guide assumes that the reader already has prior knowledge of Python programming.

Requirements

Let’s create a folder called chatbot and start a virtual environment:

python3 -m venv venv

Activate the virtual environment and install the application dependencies:

source venv/bin/activate

pip install --upgrade pip

pip install python-dotenv langchain langchain-openai openai milvus pymilvus unstructured tiktoken lark

Environment variables

In the root folder of the project, create an .env file and add the following variables:

OPENAI_API_KEY=""
MILVUS_HOST="localhost"
MILVUS_PORT="19530"

Replace the value of OPENAI_API_KEY with your OpenAI API key.

If you want to use a Milvus server in another location, feel free to replace the MILVUS_HOST and MILVUS_PORT variables.

Template

Now let’s implement our RAG. Start by creating a file called model.py in the root folder.

Open the file using your favorite editor and import the necessary dependencies:

import os
from dotenv import load_dotenv
load_dotenv()

from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.memory import ConversationTokenBufferMemory
from langchain_core.prompts import MessagesPlaceholder
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_openai.chat_models import ChatOpenAI
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.vectorstores import Milvus
from milvus import default_server as milvus_server

Now let’s create our RAG class, where we’ll implement all the logic of our data retrieval model:

class RAG:
    def __init__(self,
                 docs_dir: str,
                 n_retrievals: int = 4,
                 chat_max_tokens: int = 3097,
                 model_name: str = "gpt-3.5-turbo",
                 creativeness: float = 0.7):
        self.__model = self.__set_llm_model(model_name, creativeness)
        self.__docs_list = self.__get_docs_list(docs_dir)
        self.__retriever = self.__set_retriever(k=n_retrievals)
        self.__chat_history = self.__set_chat_history(max_token_limit=chat_max_tokens)
As you can see, our class receives 5 parameters, only 1 of which is mandatory (a string containing the name of the document folder). Furthermore, when it is instantiated, our RAG object assigns values to 4 private variables:

  1. __model: the object of our OpenAI GPT model;
  2. __docs_list: the list of uploaded documents;
  3. __retriever: the retriever we use to retrieve the data;
  4. __chat_history: the buffer we use to store the conversation history of our chat in memory.

Private methods

Now let’s implement our 4 private methods that are used precisely to assign values to our private variables mentioned above. Let’s go:

  1. LLM Model:

We’re going to instantiate the gpt-3.5-turbo model using the ChatOpenAI class provided by LangChain. Here it’s important to note that, although we’re using the OpenAI model, LangChain also has integration with several other GenAI models. For more information on other models, visit the official documentation.

def __set_llm_model(self, model_name: str = "gpt-3.5-turbo", temperature: float = 0.7):
    return ChatOpenAI(model_name=model_name, temperature=temperature)

2. Docs List:

To read our documents, we’ll use LangChain’s DirectoryLoader. Note that we have enabled recursive mode (to read subfolders) and multithreading (to load files in parallel threads). In our example we allow at most 4 concurrent threads, but you can change this via the max_concurrency parameter.

We use the load_and_split() function to load our files. It automatically splits large documents into smaller chunks, ensuring they are read and stored correctly for our model.

def __get_docs_list(self, docs_dir: str) -> list:
    print("Loading documents...")
    loader = DirectoryLoader(docs_dir,
                             recursive=True,
                             show_progress=True,
                             use_multithreading=True,
                             max_concurrency=4)
    docs_list = loader.load_and_split()

    return docs_list
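
For intuition, each element of docs_list is a LangChain Document object holding a chunk of text (page_content) and its metadata, including the "source" field discussed below. The snippet here is only an illustration with made-up values:

# Illustrative only: inspecting one of the loaded chunks
doc = docs_list[0]
print(doc.page_content[:200])  # first characters of the chunk text
print(doc.metadata)            # e.g. {'source': 'docs/contract.pdf'}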

3. Retriever:

Before moving on to the retriever, we need to create our vector store. For this we are using the Milvus vector database. Note that in collection_name we give our data collection a descriptive name. If you want, feel free to change it.

Now that we have the vector store, we can start creating our retriever. First we need to create the metadata_field_info variable, which stores information about our documents’ metadata. By default, when reading documents, the loader adds the “source” metadata, where it stores the path and name of the file to which it refers. Here, we just give more details about this metadata so that the model can interpret what it is. If you add more metadata to your documents, don’t forget to describe it in this list. We’ve also created the document_content_description variable to tell the model what our documents are about.

Now that we have our vector store and our metadata variables, we can create the retriever itself. For this example, we’ll use the Self-Querying Retriever, passing as parameters our AI model, the vector store, our metadata information and a final “k” parameter, which sets the maximum number of documents the retriever returns on each run.

def __set_retriever(self, k: int = 4):
    # Milvus vector store
    embeddings = OpenAIEmbeddings()
    milvus_server.start()
    vector_store = Milvus.from_documents(
        self.__docs_list,
        embedding=embeddings,
        connection_args={"host": os.getenv("MILVUS_HOST"), "port": os.getenv("MILVUS_PORT")},
        collection_name="personal_documents",
    )

    # Self-Querying Retriever
    metadata_field_info = [
        AttributeInfo(
            name="source",
            description="The directory path where the document is located",
            type="string",
        ),
    ]

    document_content_description = "Personal documents"

    _retriever = SelfQueryRetriever.from_llm(
        self.__model,
        vector_store,
        document_content_description,
        metadata_field_info,
        search_kwargs={"k": k}
    )

    return _retriever

4. Chat History:

Finally, we use a Conversation Token Buffer to store the conversation history in memory. Since we’re using OpenAI’s gpt-3.5-turbo model, which has a context window of roughly 4,096 tokens per request, we use this buffer because it keeps messages only up to a predefined token limit (in our case, 3097). When the buffer reaches that limit, it automatically starts discarding the oldest messages, prioritizing the most recent ones in memory.

def __set_chat_history(self, max_token_limit: int = 3097):
    return ConversationTokenBufferMemory(llm=self.__model, max_token_limit=max_token_limit, return_messages=True)

Public methods

Now that we have our private methods and all the initialization logic for our private variables in our constructor, let’s create our public ask() method. This will be the method responsible for receiving a question from the user and returning an appropriate answer. Let’s go:

def ask(self, question: str) -> str:
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an assistant responsible for answering questions about documents. Answer the user's question with a reasonable level of detail, based on the following context document(s):\n\n{context}"),
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
    ])

    output_parser = StrOutputParser()
    chain = prompt | self.__model | output_parser
    answer = chain.invoke({
        "input": question,
        "chat_history": self.__chat_history.load_memory_variables({})['history'],
        "context": self.__retriever.get_relevant_documents(question)
    })

    # Update the conversation history
    self.__chat_history.save_context({"input": question}, {"output": answer})

    return answer

First, we create a ChatPromptTemplate from a list of messages, which contains the history of messages saved in the buffer via the MessagesPlaceholder object. Once we have our template, we create our chain and invoke it by passing the variable with the user’s question, the messages in the history and the relevant documents for answering the question (obtained via the get_relevant_documents function). Finally, we update the question history and return the answer provided by the GenAI model.

Execution

Done! We have our RAG properly implemented! Now we need to create the file that will be responsible for running our application. In the root folder of the project, create a file called main.py and add the following:

from model import RAG

rag = RAG(
    docs_dir='docs',       # Name of the directory containing the documents
    n_retrievals=1,        # Number of documents returned by each search (int) : default=4
    chat_max_tokens=3097,  # Maximum number of tokens kept in the chat memory (int) : default=3097
    creativeness=1.2,      # How creative the answer will be (float 0-2) : default=0.7
)

print("\nType 'exit' to quit the program.")
while True:
    question = str(input("Question: "))
    if question == "exit":
        break
    answer = rag.ask(question)
    print('Answer:', answer)

Here we import our RAG object created in the model.py file and instantiate it with the desired parameters (feel free to change them as you wish). After that, we use a while loop to keep our chat running, waiting for a question from the user.

As you can see, we pass “docs” as the value of the docs_dir parameter, so we need to create a folder called docs at the root of the project and add all our documents to it. Our RAG will then read the documents in this folder, find the ones relevant to the user’s question and use GPT-3.5 to craft an appropriate answer from the data provided.
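
For reference, the project structure at this point should look roughly like this (the file names inside docs/ are just examples):

chatbot/
├── .env
├── model.py
├── main.py
├── docs/
│   ├── contract.pdf
│   └── notes.txt
└── venv/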

And voilà! Now that we have our chatbot, just run the following command in the terminal:

python main.py

and have fun chatting with it about your documents!

Next steps

To further improve the application, we suggest building a user interface to make the chat more intuitive and dynamic. To facilitate development, we recommend the following libraries:

  • LangServe: LangChain’s own library for serving chains. It focuses on creating REST APIs, but also ships with a simple built-in user interface for configuring and running the application, with streaming output and visibility into intermediate steps. A simpler and quicker option for those who aren’t interested in customizing the interface.
  • Streamlit: An efficient Python library for creating interactive and attractive user interfaces, simplifying the development of web applications without the need for advanced design or front-end programming skills (see the sketch after this list).
  • Dash: A Python library for building interactive analytical applications using reusable web components.
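
As an illustration, here is a minimal Streamlit sketch that wraps the RAG class from model.py in a simple chat page. The file name app.py and the page layout are assumptions for this example, not part of the project above:

# app.py - a minimal Streamlit chat sketch (illustrative, not part of the original project)
import streamlit as st
from model import RAG

st.title("Document Chatbot")

# Instantiate the RAG model and the message list only once per session
if "rag" not in st.session_state:
    st.session_state.rag = RAG(docs_dir="docs")
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])

# Read a new question and answer it with the RAG model
if question := st.chat_input("Ask something about your documents"):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.write(question)

    answer = st.session_state.rag.ask(question)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.write(answer)

Run it with streamlit run app.py (after installing Streamlit with pip install streamlit) and the chat opens in your browser.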

Repository

All the code developed for this article can be found in the following GitHub repository:

Profile of the Author: Rafael Felipe dos Santos Machado | Passionate about innovations in data and Artificial Intelligence. With a bachelor’s degree in Computer Science from the Federal University of Alfenas, he always seeks to evolve his knowledge of the state of the art in the field. He works as a Data Scientist at DP6.
