Building personal assistant with LlamaIndex and GPT-3.5

Iva @ Tesla Institute · Published in Artificialis
4 min read · Mar 16, 2023

Have you ever wished for a personal assistant that could help you manage your knowledge or answer questions based on your documents? With the recent release of the GPT-3.5 series API by OpenAI, it is now possible to build a Q&A chatbot grounded in your own data.

In this article, we will explore how to build a document Q&A chatbot efficiently with LlamaIndex and the GPT API.

One exciting application is Question Answering (QA), which enables the bot to retrieve information from documents and respond quickly to natural language queries. Such a system can serve a variety of purposes, such as streamlining customer support, synthesizing user research, and managing personal knowledge. Take your productivity to the next level with the GPT-3.5 API!

THE IDEA

One potential use case is ChatGPT as an assistant that synthesizes customer feedback or finds old product documents related to a feature being worked on. Initially, I considered fine-tuning the GPT model on specific data, but fine-tuning requires a large dataset and is expensive. It also teaches the model a new skill rather than giving it complete knowledge of the documents. For (multi-)document QA, another approach is prompt engineering: providing the context directly in the prompt. However, the GPT model has a limited attention span, and passing a long context to the API can be costly, given that there are thousands of customer feedback emails and hundreds of product documents.

While researching ways to work around the prompt's input token limit, I came up with the idea of using an algorithm to search through the documents and extract only the relevant excerpts. By passing these relevant contexts along with my questions to the GPT model, I hoped to achieve better results. During my search, I discovered the gpt-index library, which was later renamed LlamaIndex. This library provided a straightforward solution to my problem and let me implement the idea easily.
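To make the retrieve-then-prompt idea concrete, here is a toy sketch that scores each document by keyword overlap with the question and builds a prompt around the best match. The helper names (`score`, `build_prompt`) and the sample documents are my own illustration; LlamaIndex uses embeddings rather than keyword counts, as we'll see below.

```python
import re

# Common words to ignore when matching a question against documents.
STOPWORDS = {"what", "is", "the", "a", "of", "per", "our"}

def tokens(text: str) -> set:
    """Lowercased word set, minus punctuation and common stopwords."""
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS

def score(question: str, doc: str) -> int:
    """How many content words of the question appear in the document."""
    return len(tokens(question) & tokens(doc))

def build_prompt(question: str, docs: list) -> str:
    """Embed only the most relevant document as context in the prompt."""
    best = max(docs, key=lambda d: score(question, d))
    return f"Context:\n{best}\n\nQuestion: {question}\nAnswer:"

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 60 requests per minute per key.",
]
print(build_prompt("What is the refund policy?", docs))
```

Only the selected excerpt is sent to the model, which keeps the prompt short regardless of how many documents you have.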

Building document Q&A chatbot

We’ll use LlamaIndex and GPT (text-davinci-003) to create a Q&A chatbot that operates on existing documents.

Prerequisites:

  • An OpenAI API Key, which can be obtained from https://platform.openai.com/account/api-keys.
  • A database of your documents. LlamaIndex supports various data sources such as Notion or Google Docs. For this tutorial, a simple text file will be used for demonstration.
  • A local Python environment or an online Google Colab notebook.

Steps:

  • Create an index of your document data using LlamaIndex.
  • Formulate a natural language query to search the index.
  • LlamaIndex retrieves the pertinent parts of the document and provides them to GPT as context.
  • GPT answers the question using the relevant context.

LlamaIndex creates a vectorized index from your document data, making it highly efficient to query. It then uses this index to identify the most relevant sections of the document based on the similarity between the query and data. The retrieved information is then incorporated into the prompt sent to GPT, providing it with the necessary context to answer your question.
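The similarity step can be sketched in a few lines of plain Python, with tiny made-up vectors standing in for real embeddings (which OpenAI's embedding API would produce). The chunk names and numbers here are purely illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Pretend embeddings for three document chunks.
chunks = {
    "pricing": [0.9, 0.1, 0.0],
    "roadmap": [0.1, 0.8, 0.2],
    "support": [0.0, 0.2, 0.9],
}
query_vec = [0.85, 0.15, 0.05]  # pretend embedding of "How much does it cost?"

# Rank chunks by similarity to the query and keep the best one.
best = max(chunks, key=lambda name: cosine(query_vec, chunks[name]))
print(best)  # → pricing
```

The retrieved chunk is then placed in the prompt, so GPT answers from your data rather than from its training set alone.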

Let’s do it!

These commands install the llama-index and openai packages:

!pip install llama-index
!pip install openai

We’ll import the libraries and set up the OpenAI API key:

import os
from llama_index import GPTSimpleVectorIndex, Document, SimpleDirectoryReader

# Replace 'API-KEY' with your actual key; avoid committing it to source control.
os.environ['OPENAI_API_KEY'] = 'API-KEY'

Now we will need to construct an index of our document. I’ve chosen a file where I store ideas for my future website.

# 'your_directory' should contain the file(s) you want to index
documents = SimpleDirectoryReader('your_directory').load_data()

After loading the documents, we can then construct the index:

index = GPTSimpleVectorIndex(documents)
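Building the index calls OpenAI's embedding API, so it costs money and time to recompute on every run. With the llama-index version used here, the index can be persisted and reloaded; a sketch, assuming a local `index.json` path (later releases of the library changed these method names):

```python
# Persist the vectorized index so embeddings aren't recomputed on every run.
index.save_to_disk('index.json')

# Later, reload it instead of rebuilding from the raw documents.
index = GPTSimpleVectorIndex.load_from_disk('index.json')
```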

Querying the index:

response = index.query("What are the best ideas for the site?")
print(response)

I got this response:

The best ideas for the site would be to create content that is informative, engaging, and relevant to the Serbian AI community. This could include articles and blog posts covering the latest news and developments in AI, in-depth interviews with experts and thought leaders in the Serbian AI community, case studies showcasing real-world applications of AI in different sectors, educational resources for anyone who wants to learn more about AI and its applications, infographics and visualizations that explain complex AI concepts and trends, reports and whitepapers analyzing the state of AI in Serbia and its potential for growth and development, podcasts or video series featuring conversations with AI professionals, researchers, and enthusiasts, op-eds and thought pieces discussing the ethical and social implications of AI, and how they should be addressed in Serbia, event coverage and summaries of conferences, workshops, and meetups related to AI in Serbia, and profiles of AI startups, companies, and organizations in Serbia, highlighting their achievements and contributions to the field.

Pretty accurate!

Consider trying out some more advanced examples; I've linked the LlamaIndex documentation in the resources section.

CONCLUSION:

This article has demonstrated the effectiveness of combining GPT with LlamaIndex for the creation of a document question-answering chatbot. While GPT alone is an impressive tool, its capabilities can be greatly enhanced by integration with other tools, data, and processes. As NLP technology continues to advance, we can expect to see even more innovative applications of this powerful tool in the near future.

RESOURCES:

  • LlamaIndex documentation: https://gpt-index.readthedocs.io/
