Using LlamaIndex and Large Language Models: Building a Personal ChatBot for Private Knowledge Base Queries

Akhil Sharma
4 min read · Aug 7, 2023

Large Language Models (LLMs) have revolutionized the field of natural language processing, enabling powerful AI chatbots that can provide accurate and context-aware responses. In this article, we’ll explore the step-by-step process of building an AI chatbot that leverages a private knowledge base to deliver precise answers to user queries.

My chatbot was built using LlamaIndex’s low-level API. Its architecture can be explained through the steps below.

Install LlamaIndex locally: pip install llama-index

Let’s delve into the process:

Step 1: Structure your internal documents
Begin by breaking down your entire knowledge base into smaller, manageable chunks. Each chunk should represent a distinct piece of information that can be queried. This data can come from various sources, such as Confluence documentation or supplemented PDF reports.

Loading the raw data (stored as an .xml file) for the knowledge base:

# Provide the raw content file for the chatbot knowledge base
from llama_index import VectorStoreIndex, download_loader

# RemoteReader fetches documents from a remote URL
RemoteReader = download_loader("RemoteReader")
loader = RemoteReader()

# load_data returns a list of Document objects
documents = loader.load_data(url="https://raw.githubusercontent.com/Akhil-Sharma30/Akhil-Sharma30.github.io/")
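
If you want explicit control over the chunking, the loaded documents can be split into smaller nodes. This is a minimal sketch assuming the legacy llama_index node-parser API (module paths and defaults vary between versions), with illustrative chunk sizes:

# Sketch: split the loaded documents into smaller, queryable chunks ("nodes")
from llama_index.node_parser import SimpleNodeParser

parser = SimpleNodeParser.from_defaults(chunk_size=512, chunk_overlap=20)
nodes = parser.get_nodes_from_documents(documents)  # each node is one chunk of the knowledge base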

Step 2: Embed the text corpus
Utilize an embedding model to transform each chunk of text into a vector representation. This embedding process captures the essence of the information and encodes it into a numerical format suitable for queries.

`langchain.embeddings.OpenAIEmbeddings` makes a request to the OpenAI embeddings API to embed each chunk (and, later, the user query) using the `text-embedding-ada-002` model.
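
A minimal sketch of plugging that embedding model into LlamaIndex, assuming the legacy ServiceContext/LangchainEmbedding wrappers (newer releases expose this differently) and an OPENAI_API_KEY set in the environment:

# Sketch: use OpenAI's text-embedding-ada-002 as the embedding model for the index
from langchain.embeddings import OpenAIEmbeddings
from llama_index import ServiceContext
from llama_index.embeddings import LangchainEmbedding

embed_model = LangchainEmbedding(OpenAIEmbeddings(model="text-embedding-ada-002"))
service_context = ServiceContext.from_defaults(embed_model=embed_model)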

Step 3: Storing vector embeddings
Save all the vector embeddings obtained from the embedding model in a Vector Database. This database will serve as the repository for your encoded knowledge base.
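
For example, with LlamaIndex's default in-memory vector store (a dedicated vector database such as Pinecone or Chroma could be plugged in at this point instead), a sketch might look like:

# Sketch: embed the nodes, store the vectors in the index, and persist everything to disk
from llama_index import VectorStoreIndex

index = VectorStoreIndex(nodes, service_context=service_context)
index.storage_context.persist(persist_dir="./storage")  # directory name is illustrative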

Step 4: Save text representations
Ensure you save the original text that corresponds to each vector embedding. This text will be necessary to retrieve the relevant information during the querying process.
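
With the default storage used above, the original text chunks are saved in the docstore alongside their embeddings, so the whole index can be reloaded later (a sketch, reusing the directory from the previous snippet):

# Sketch: reload the persisted index; it contains both the vectors and the original text
from llama_index import StorageContext, load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context, service_context=service_context)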

Now, let’s construct the answer to a question or query:

Step 5: Embed the question using Vector-Embeddings
Use the same vector-embedding model employed earlier to transform the question you want to ask into a vector representation.
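
LlamaIndex does this automatically when you run a query, but for illustration, the langchain wrapper mentioned in Step 2 can embed the question directly (the question text here is just an example):

# Sketch: embed the user's question with the same embedding model used for the documents
from langchain.embeddings import OpenAIEmbeddings

question = "What projects has Akhil worked on?"
query_embedding = OpenAIEmbeddings().embed_query(question)  # a vector of floats (1536 dimensions for ada-002)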

Step 6: Run a query
Query the Vector Database using the vector embedding generated from the question. Determine the number of context vectors you want to retrieve, which will represent the relevant chunks of information to aid in answering the query.
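
With LlamaIndex's high-level API, the number of context chunks to fetch is controlled by `similarity_top_k` (a sketch; retrieving two chunks is just an example):

# Sketch: answer the question using the top-2 most similar chunks as context
query_engine = index.as_query_engine(similarity_top_k=2)
response = query_engine.query(question)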

Step 7: Retrieve Similar vectors
`llama_index.retrievers.RetrieverQueryEngine` does a similarity search against the entries of your index knowledge base for the two most similar pieces of context by cosine similarity.

Retrieve the previously selected number of context vectors, which will contain the most relevant information.

Step 8: Map vectors to text chunks
Associate the retrieved vectors with their corresponding text chunks. This mapping will link the numerical representations to the actual content they represent.
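
The lower-level retriever API makes steps 7 and 8 explicit: it returns the most similar nodes together with their similarity scores and the original text they came from (a sketch, reusing the index built earlier):

# Sketch: retrieve the top-2 most similar nodes and map them back to their text chunks
retriever = index.as_retriever(similarity_top_k=2)
retrieved_nodes = retriever.retrieve(question)

for node_with_score in retrieved_nodes:
    print(node_with_score.score)            # similarity score of this chunk
    print(node_with_score.node.get_text())  # the original text behind the vector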

Step 9: Generate the answer
`llama_index.indices.query.ResponseSynthesizer` generates a response by formatting the query and the retrieved context into a single prompt and sending a request to the OpenAI chat completions API with the `gpt-3.5-turbo` model.

The answer is generated using only the provided context. It is important to perform prompt engineering to ensure the generated answers stay within the expected boundaries; for instance, if the retrieved context does not contain relevant information, the LLM should avoid fabricating an answer.
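
A sketch of wiring the retriever and the response synthesizer together by hand, assuming the legacy llama_index modules (`get_response_synthesizer`, `RetrieverQueryEngine` and the available `response_mode` values may differ between versions):

# Sketch: format the query and the retrieved context into a prompt for gpt-3.5-turbo
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import get_response_synthesizer

llm = OpenAI(model="gpt-3.5-turbo")
synthesizer = get_response_synthesizer(
    service_context=ServiceContext.from_defaults(llm=llm, embed_model=embed_model),
    response_mode="compact",  # pack as much retrieved context as fits into each prompt
)
query_engine = RetrieverQueryEngine(retriever=retriever, response_synthesizer=synthesizer)
print(query_engine.query(question))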

To create a functional chatbot, present a web user interface (UI) with a text input box for users to type their questions. After running through steps 1 to 9, display the generated answer in the UI. This approach is commonly used for chatbots that rely on a private knowledge base.

Vector databases and Large Language Models are revolutionizing the way we handle and retrieve complex data structures. These powerful tools allow for efficient storage, retrieval, and manipulation of vector embeddings, enabling advanced search capabilities and context-based information retrieval.

Creating the UI using Gradio

Gradio — the fastest way to demo your machine learning model with a friendly web interface so that anyone can use it, anywhere!

Function to display the response:

# Retrieve the generated response and stream it to the chat window character by character
def chat(chat_history, user_input):
    bot_response = query_engine.query(user_input)
    response = ""
    for letter in bot_response.response:
        response += letter
        yield chat_history + [(user_input, response)]

Demo:

# Demo for the chatbot interface
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Knowledge Bot"):
        chatbot = gr.Chatbot()
        message = gr.Textbox("know Akhil?")
        # Stream the chat() generator's output back into the chatbot component
        message.submit(chat, [chatbot, message], chatbot)

demo.queue().launch()

About me

Thank you so much for reading my article! Hi, I’m Akhil Sharma, an IT&MI student at the Cluster Innovation Centre, University of Delhi. If you have any questions, please don’t hesitate to contact me!

Email me at akhilsharma.off@gmail.com and feel free to connect with me on LinkedIn!

Project Code: https://github.com/Akhil-Sharma30/LLM_Chatbot

Personal Website: https://akhil-sharma30.github.io/

Follow me on GitHub: https://github.com/Akhil-Sharma30

Follow me on Twitter: DevelopAkhil

ChatBot Deployed on Hugging-Face Spaces: https://developerakhil-personal-chatbot.hf.space
