Hello LLM: Chatting with Your Own Data with RAG

Dagang Wei
4 min read · Jan 4, 2024


This article is part of the series Hello LLM.

Image generated by the author with DALL-E

Introduction

For years, data has been trapped in text files, spreadsheets, and databases. We’ve analyzed it through traditional means: text editors, scripts, charts, and graphs. But what if we could chat with our data directly and ask it questions?

In the previous blog post Hello LLM, I introduced how to build a local chatbot with LangChain and Llama 2. This time, I will show you how to use a technique called Retrieval-Augmented Generation (RAG) to connect the LLM to your own data.

How RAG Works

RAG is a technique that helps large language models (LLMs) provide more accurate and reliable answers by grounding them in external knowledge bases. This is done by retrieving relevant information from the knowledge base and then using that information as part of the input to the LLM, so it can generate a more relevant response.

Using RAG involves the following steps (a minimal code sketch follows the list):

  • Transformation: The first step is to transform your data into embeddings and store them in a vector database.
  • Retrieval: The second step is to retrieve relevant information from the vector database. This is done with a similarity search over the embeddings to find chunks that are semantically related to the user’s query.
  • Augmentation: Once the relevant information has been retrieved, it is used to augment the original query and then passed to the LLM.
  • Generation: The LLM then uses this information to generate a response to the user’s query.
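
To make these steps concrete, here is a minimal, framework-free sketch of the same flow in Python. It reuses the all-MiniLM-L6-v2 embedding model from the full example below; the chunks are toy data and generate() is a placeholder for whatever LLM call you prefer, so treat this as an illustration rather than a drop-in implementation.

import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Transformation: embed the document chunks (a real app would store these in a vector database).
embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
chunks = [
    "Pose estimation predicts the position of body joints in an image.",
    "Trajectory prediction forecasts the path of a ball or a player.",
]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

# 2. Retrieval: find the chunk most similar to the user's query.
query = "What is pose estimation?"
query_vector = embedder.encode([query], normalize_embeddings=True)[0]
scores = chunk_vectors @ query_vector  # cosine similarity, since the vectors are normalized
context = chunks[int(np.argmax(scores))]

# 3. Augmentation: combine the retrieved context with the original question.
prompt = f"Use the provided context to answer the question.\nContext: {context}\nQuestion: {query}\nAnswer:"

# 4. Generation: pass the augmented prompt to the LLM.
# generate() is a placeholder for your model call (e.g., llama.cpp or an API).
answer = generate(prompt)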

With LangChain, you don’t have to implement these steps explicitly. You only need to take care of the first step, preparing the vector database; the other steps are handled by the framework and hidden from your code.

One thing to note is that the embedding model you use for your data (e.g., HuggingFaceEmbeddings, OpenAIEmbeddings) doesn’t have to match the LLM you use for generation (e.g., Llama 2), because the two never interact: the vector database is only used to retrieve related context for the query, and the input to the LLM is still natural language.
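
For example, reusing the helper functions from the full example in the next section, the embedding model is only involved in the similarity search, while the Llama 2 model receives nothing but plain text. This is a sketch, assuming the 'data' directory and model file from the example below:

db = create_vector_store(data_dir='data')  # indexed with HuggingFaceEmbeddings
llm = load_llm()                           # Llama 2 via llama.cpp

question = "What is pose estimation?"
docs = db.similarity_search(question, k=2)  # only the embedding model is used here
context = "\n".join(doc.page_content for doc in docs)

# The LLM never sees the embeddings, just the retrieved text and the question.
answer = llm.invoke(f"Context: {context}\nQuestion: {question}\nAnswer:")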

Code

The following is an example LLM chatbot that lets you chat with your own PDF files.

from langchain_community.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.document_loaders import DirectoryLoader, PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter


def create_vector_store(data_dir):
    '''Create a vector store from PDF files.'''
    # Define which documents to load.
    loader = DirectoryLoader(path=data_dir, glob="*.pdf", loader_cls=PyPDFLoader)

    # Load the documents and split them into overlapping chunks.
    documents = loader.load()
    splitter = RecursiveCharacterTextSplitter(chunk_size=500,
                                              chunk_overlap=50)
    texts = splitter.split_documents(documents)

    # Embed the chunks and index them in a FAISS vector store.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2",
                                       model_kwargs={'device': 'cpu'})
    db = FAISS.from_documents(texts, embeddings)
    return db


def load_llm():
    # Adjust GPU usage based on your hardware.
    llm = LlamaCpp(
        model_path="models/llama-2-7b-chat.Q4_0.gguf",  # Path to the model file
        n_gpu_layers=40,  # Number of layers to offload to the GPU (adjust for your hardware)
        n_batch=512,  # Batch size for model processing
        verbose=False,  # Set to True for detailed logging when debugging
    )
    return llm


def create_prompt_template():
    # Prepare the template we will use when prompting the AI.
    template = """Use the provided context to answer the user's question.
If you don't know the answer, respond with "I do not know".

Context: {context}
Question: {question}
Answer:
"""

    prompt = PromptTemplate(
        template=template,
        input_variables=['context', 'question'])
    return prompt

def create_chain():
    db = create_vector_store(data_dir='data')
    llm = load_llm()
    prompt = create_prompt_template()
    retriever = db.as_retriever(search_kwargs={'k': 2})
    # 'stuff' inserts the retrieved chunks directly into the prompt's {context} slot.
    chain = RetrievalQA.from_chain_type(llm=llm,
                                        chain_type='stuff',
                                        retriever=retriever,
                                        return_source_documents=False,
                                        chain_type_kwargs={'prompt': prompt})
    return chain

def query_doc(chain, question):
    return chain({'query': question})['result']


def main():
    chain = create_chain()

    print("Chatbot for PDF files initialized, ready to query...")
    while True:
        question = input("> ")
        answer = query_doc(chain, question)
        print(': ', answer, '\n')


if __name__ == '__main__':
    main()

Output:

$ python local_doc_qa.py 
Chatbot for PDF files initialized, ready to query...

> What are the open issues in computer vision in sports mentioned in the paper?
: According to the provided context, the open issues in computer vision in sports mentioned in the paper are potential research directions for future research in various sports, including:
• Detection of players' positions at any given point of time.
• Pose estimation (e.g., identifying a player's pose or position).
• Trajectory prediction (e.g., predicting the path of a ball or a player).

> Elaborate on pose estimation
: Pose estimation is a technique used to predict the position and orientation of body joints in an image or video stream. It is a fundamental component of various computer vision tasks, such as object recognition and human-computer interaction. In sports settings, pose estimation can be particularly useful for analyzing athletes' movements, tracking their performance, and identifying specific actions or maneuvers. However, there are several challenges associated with applying pose estimation in sports contexts, including limited processing time, reliance on appearance models, and sensitivity to calibration errors and noisy detections.
To address these challenges, researchers have recently employed OpenPose [70], a state-of-the-art pose estimation algorithm, for action recognition in videos. OpenPose is capable of estimating 3D body joint locations from 2D image data, and has
