RAG Chatbot using ChromaDB, LlamaIndex and Streamlit

Sindhu Madicherla
4 min read · May 7, 2024


This article builds a simple chatbot application called ‘ResearchBot’ using research articles from arXiv, an open-access archive of millions of scholarly articles. We will implement a RAG architecture using LlamaIndex, OpenAI for embeddings, ChromaDB as the vector store, and Streamlit for a simple UI. The GitHub link for the project can be found here.

Before moving forward, make sure that you are familiar with RAG and have an OpenAI API key. At a high level, a RAG architecture lets us feed data from external sources to pre-trained LLMs such as OpenAI’s models or Llama, and get responses that are more relevant to our use case.

We will start by installing the required modules from requirements.txt. All the required variables, along with the OpenAI key, have to be specified in configs.py (a hypothetical sketch follows the list below). Our code flow can be divided into two major operations.

  1. Data Ingestion
  2. Data Retrieval
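
For reference, configs.py might look like the sketch below. The variable names here are illustrative, not taken from the repository, which may use different ones.

```python
# configs.py -- illustrative layout only; the repository's actual
# variable names may differ.
OPENAI_API_KEY = "sk-..."                # your OpenAI API key
EMBED_MODEL = "text-embedding-3-small"   # embedding model for ingestion
CHAT_MODEL = "gpt-3.5-turbo"             # chat completion model
CHROMA_PATH = "./chroma_db"              # where the Chroma store is persisted
COLLECTION_NAME = "research_papers"      # Chroma collection name
```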

Data Ingestion

Download and parse data from arXiv

The arxiv Python package lets us download the articles in PDF format. We can either search by paper ID or get the papers related to a particular topic.
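
A minimal sketch using the arxiv package; the paper ID, query, and download directory below are placeholders.

```python
import arxiv

client = arxiv.Client()

# Search by paper ID ...
search_by_id = arxiv.Search(id_list=["2106.09685"])  # placeholder ID

# ... or by topic
search_by_topic = arxiv.Search(
    query="Generative AI",
    max_results=5,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

for result in client.results(search_by_topic):
    result.download_pdf(dirpath="./papers")  # saves the PDF locally
```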

Once the papers are downloaded, we convert them to markdown text and finally to Documents, which can be parsed by LlamaIndex’s MarkdownNodeParser.

I am splitting the documents into chunks of size 5000, as the documents can be lengthy, and OpenAI’s text-embedding-3-small has a limit of 8191 tokens.
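
A sketch of the parsing and chunking steps, assuming pymupdf4llm for the PDF-to-markdown conversion (the repository may use a different converter):

```python
import pathlib

import pymupdf4llm  # assumption: one of several PDF-to-markdown options
from llama_index.core import Document
from llama_index.core.node_parser import MarkdownNodeParser, SentenceSplitter

# Convert each downloaded PDF to markdown and wrap it in a Document
documents = []
for pdf_path in pathlib.Path("./papers").glob("*.pdf"):
    md_text = pymupdf4llm.to_markdown(str(pdf_path))
    documents.append(Document(text=md_text, metadata={"file_name": pdf_path.name}))

# MarkdownNodeParser splits on markdown structure (headings, sections);
# SentenceSplitter then caps chunks at 5000 tokens, safely under
# text-embedding-3-small's 8191-token limit.
parser = MarkdownNodeParser()
splitter = SentenceSplitter(chunk_size=5000)
nodes = splitter.get_nodes_from_documents(parser.get_nodes_from_documents(documents))
```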

We can load data of any other type by choosing the right parser from LlamaIndex’s node parsers.

Embed the articles and store them in the vector store:

We first need to create a vector store, or get an existing one, using ChromaDB. I am using the OpenAI embedding function.
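
A minimal sketch, reusing the path and collection name from the hypothetical configs above:

```python
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

from configs import CHROMA_PATH, COLLECTION_NAME, OPENAI_API_KEY

chroma_client = chromadb.PersistentClient(path=CHROMA_PATH)

# Chroma's built-in OpenAI embedding function
openai_ef = OpenAIEmbeddingFunction(
    api_key=OPENAI_API_KEY,
    model_name="text-embedding-3-small",
)

# Creates the collection if it doesn't exist, fetches it otherwise
collection = chroma_client.get_or_create_collection(
    name=COLLECTION_NAME,
    embedding_function=openai_ef,
)
```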

So far we have only collected the ingredients needed to save the data. We now run the IngestionPipeline available from LlamaIndex to persist the data into the Chroma vector store that we have created.

We can also cache the pipeline state locally, to be able to access it faster later.
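
A sketch of the ingestion pipeline and the local cache, reusing the collection and documents from the previous steps:

```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import MarkdownNodeParser, SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Wrap the Chroma collection so LlamaIndex can write to it
vector_store = ChromaVectorStore(chroma_collection=collection)

# The parser and splitter from the previous step become pipeline
# transformations; the embedding model runs last.
pipeline = IngestionPipeline(
    transformations=[
        MarkdownNodeParser(),
        SentenceSplitter(chunk_size=5000),
        OpenAIEmbedding(model="text-embedding-3-small"),
    ],
    vector_store=vector_store,
)
pipeline.run(documents=documents)

# Cache the pipeline's state locally so unchanged documents are
# skipped on later runs.
pipeline.persist("./pipeline_storage")
# Later: pipeline.load("./pipeline_storage")
```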

Data Retrieval

Index retrieval

The index is the key component in our retrieval and the main entry point to the vector store. We will first retrieve the index from the vector store created above. When the user asks a question in the chatbot, we can get the top-k nodes most similar to the query using the index.as_retriever method. We can then add that data as context and build the prompt for OpenAI.
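
A sketch of the retrieval step; the query string and top-k value are placeholders:

```python
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.chroma import ChromaVectorStore

# Rebuild the index from the existing Chroma collection
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(
    vector_store,
    embed_model=OpenAIEmbedding(model="text-embedding-3-small"),
)

# Fetch the top-k nodes most similar to the user's question
question = "What is retrieval-augmented generation?"  # placeholder query
retriever = index.as_retriever(similarity_top_k=3)
nodes = retriever.retrieve(question)

# Assemble the retrieved text into a context-grounded prompt
context = "\n\n".join(node.get_content() for node in nodes)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
```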

Once the prompt is built, we send it to OpenAI’s Chat Completions API to get the response. The files chat.py and thread.py have been added under utils to maintain the chat thread.
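
A minimal sketch of the completion call using the openai v1 client; the model name and system message are placeholders:

```python
from openai import OpenAI

from configs import CHAT_MODEL, OPENAI_API_KEY

oai_client = OpenAI(api_key=OPENAI_API_KEY)

response = oai_client.chat.completions.create(
    model=CHAT_MODEL,
    messages=[
        {"role": "system", "content": "You are ResearchBot, a helpful assistant."},
        {"role": "user", "content": prompt},  # prompt built above
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```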

We can also add the ‘function calling’ option while sending the request to OpenAI. This lets the model call other functions by intelligently generating a JSON object that contains the arguments to the function. We need a proper docstring describing what the inputs to the function should be. In our case, we have added the function ‘context_retriever’, which pulls more relevant data if the LLM needs it. We can let the LLM decide whether to call the function by setting ‘tool_choice’ to ‘auto’.
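
A sketch of the function-calling setup; the context_retriever schema below is illustrative and may not match the repository’s exact definition:

```python
# Tool schema: tells the model what context_retriever does and what
# arguments it takes.
tools = [
    {
        "type": "function",
        "function": {
            "name": "context_retriever",
            "description": "Retrieve more passages relevant to the query from the vector store.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query to retrieve context for.",
                    }
                },
                "required": ["query"],
            },
        },
    }
]

response = oai_client.chat.completions.create(
    model=CHAT_MODEL,
    messages=[{"role": "user", "content": prompt}],  # prompt built above
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

# If the model chose the tool, the arguments arrive as a JSON string
if response.choices[0].message.tool_calls:
    print(response.choices[0].message.tool_calls[0].function.arguments)
```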

More information on function calling can be found in the OpenAI documentation.

Streamlit Application

This is the final part, where we build app.py by combining the data ingestion and retrieval logic described above. We can create a simple chatbot interface using Streamlit’s st.chat_input and st.chat_message methods. It should be able to do the following (a minimal sketch of app.py follows the list):

  • Load the vector index when the application starts. We can add a cache_resource decorator to the index-loading method so that the function runs only once, at application start.
  • Retrieve the index, get the nodes relevant to the user query, and print the response in the conversation. The conversation has to be saved in Streamlit’s session_state to display it as a chat.
  • Run the ingestion pipeline to load new data when a file is uploaded. We will add an upload button at the top of the page. The file content can be either a comma-separated list of paper IDs or a specific topic like “topic-Generative AI”. Example inputs are given in papers.txt and papers1.txt.
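
A minimal sketch of app.py’s chat loop; the placeholder comments mark where the ingestion and retrieval logic above plugs in:

```python
import streamlit as st

@st.cache_resource  # ensures the index loads only once per app start
def load_index():
    # Placeholder: rebuild the VectorStoreIndex from the Chroma store,
    # as shown in the retrieval section.
    return None

index = load_index()

if "messages" not in st.session_state:
    st.session_state.messages = []  # chat history survives Streamlit reruns

# Replay the conversation so far
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if question := st.chat_input("Ask about the papers"):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.write(question)

    # Placeholder: retrieve context via index.as_retriever, build the
    # prompt, and call the Chat Completions API as shown earlier.
    answer = "..."
    with st.chat_message("assistant"):
        st.write(answer)
    st.session_state.messages.append({"role": "assistant", "content": answer})
```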

Finally, run the command below to start the application and chat with it.

```
streamlit run app.py
```

The application looks like the screenshot below.

We can play around with the application by modifying values like the prompt, temperature, embedding model, etc. We can further add a CI/CD pipeline and turn it into a Docker application hosted on AWS by following the corresponding sections in this Medium post.

