RAG - PDF Q&A Using Llama 2 in 9 Steps

Sanjjushri Varshini R
2 min read · Mar 31, 2024

1. Importing Required Modules: Essential modules from langchain, including the document loader, text splitter, vector store, embeddings, and LLM wrappers, are imported to set up the environment for PDF Q&A using RAG.
# Import required modules
from langchain import hub
from langchain.chains import RetrievalQA
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.callbacks.manager import CallbackManager
from langchain.llms import Ollama
from langchain.embeddings.ollama import OllamaEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory
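
Running these imports assumes the supporting packages and local models are already in place. A minimal setup sketch (the commands below are assumptions shown as comments and should be run in a terminal; exact package names and versions may differ with your langchain release):

# Assumed one-time setup, run in a terminal rather than in Python:
#   pip install langchain chromadb pypdf
#   ollama pull llama2
#   ollama pull nomic-embed-text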

2. Defining Filepath and Model Settings: This snippet defines FILEPATH for the PDF file to be processed, sets the local model to “llama2”, and selects “nomic-embed-text” as the embedding model.

FILEPATH = "sample.pdf"
LOCAL_MODEL = "llama2"
EMBEDDING = "nomic-embed-text"

3. Loading PDF Data: The PDF document specified in FILEPATH is loaded using PyPDFLoader, and its content is retrieved for further processing.

loader = PyPDFLoader(FILEPATH)
data = loader.load()
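
Each page of the PDF becomes a Document object with page_content and metadata. A quick, optional sanity check (illustrative only, not part of the original walkthrough):

# Optional: confirm the PDF was loaded and peek at the first page
print(f"Loaded {len(data)} pages")
print(data[0].page_content[:200])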

4. Splitting Document Text: The loaded document’s text is split into overlapping, manageable chunks using RecursiveCharacterTextSplitter so that each chunk can be embedded and retrieved efficiently.

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500, chunk_overlap=100)
all_splits = text_splitter.split_documents(data)

5. Creating Vector Store: A Chroma vector store is generated from the split documents, using Ollama embeddings for semantic representation. The local Llama 2 model is then initialized through Ollama with a streaming callback for console output, and a retriever is created from the vector store.

persist_directory = 'data'

vectorstore = Chroma.from_documents(
    documents=all_splits,
    embedding=OllamaEmbeddings(model=EMBEDDING),
    persist_directory=persist_directory
)
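
Because the store is persisted to the data directory, later runs can reload the existing index instead of re-embedding the PDF. A minimal sketch, assuming the same langchain Chroma wrapper and embedding model as above (depending on your langchain/Chroma version, an explicit vectorstore.persist() call may be needed after creation):

# Reload a previously persisted vector store to skip re-embedding on later runs
vectorstore = Chroma(
    persist_directory=persist_directory,
    embedding_function=OllamaEmbeddings(model=EMBEDDING),
)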

llm = Ollama(
    base_url="http://localhost:11434",
    model=LOCAL_MODEL,
    verbose=True,
    callback_manager=CallbackManager(
        [StreamingStdOutCallbackHandler()])
)

retriever = vectorstore.as_retriever()
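
By default the retriever returns the most similar chunks for each query. If answers seem to miss context, the number of retrieved chunks can be adjusted; the value below is illustrative:

# Optional: control how many chunks are retrieved for each question
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})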

6. Defining Prompt Template and Memory Handling: A prompt template is defined to structure the interaction between the user and the chatbot. Additionally, a memory mechanism is established to maintain conversation history.

template = """ You are a knowledgeable chatbot, here to help with questions of the user. Your tone should be professional and informative.

Context: {context}
History: {history}

User: {question}
Chatbot:
"""
prompt = PromptTemplate(
    input_variables=["history", "context", "question"],
    template=template,
)

memory = ConversationBufferMemory(
    memory_key="history",
    return_messages=True,
    input_key="question"
)
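
To see how the template’s placeholders get filled, the prompt can be formatted directly. The values below are illustrative stand-ins for what the chain supplies at runtime:

# Illustrative only: the RetrievalQA chain fills these variables automatically
print(prompt.format(
    history="",
    context="<retrieved PDF chunks would appear here>",
    question="What is this document about?",
))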

7. Configuring RetrievalQA Chain: The RetrievalQA chain is instantiated, integrating the components necessary for Q&A, including the language model, retriever, and memory.

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=retriever,
    verbose=True,
    chain_type_kwargs={
        "verbose": True,
        "prompt": prompt,
        "memory": memory,
    }
)
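
If you also want to see which chunks an answer was based on, RetrievalQA can return its source documents alongside the result. A small variation on the chain above (the name qa_chain_with_sources is just for illustration):

# Variation: also return the retrieved chunks that supported the answer
qa_chain_with_sources = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "memory": memory},
)
result = qa_chain_with_sources.invoke({"query": "What is this PDF about?"})
# result["result"] holds the answer; result["source_documents"] holds the chunks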

8. Setting Up the Query: A sample query is formulated, asking which clustering methods the PDF describes, with instructions to answer only from the document and keep the response short.

query = "What clustering methods that have been implemented?"
query += ". Only from this pdf. Keep it short"

9. Invoking the Q&A Chain: Finally, the Q&A chain is invoked with the formulated query, triggering retrieval over the vector store and generation of a concise answer from the PDF content.

qa_chain.invoke({"query": query})
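
For repeated questions against the same PDF, the invocation can be wrapped in a small convenience helper (the ask function below is a sketch added here, not part of the original code):

def ask(question: str) -> str:
    """Run a question through the RetrievalQA chain and return the answer text."""
    result = qa_chain.invoke({"query": question})
    # RetrievalQA returns a dict; the generated answer sits under the "result" key
    return result["result"]

print(ask("Which clustering methods does the document describe?"))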

The snippets above walk through a complete RAG pipeline for PDF question answering: loading and chunking a document, embedding it into a Chroma vector store, and combining retrieval with a locally hosted Llama 2 model to generate answers grounded in the PDF’s content. With this setup, targeted information can be pulled out of a PDF through natural-language questions instead of manual reading.

GitHub: https://github.com/Sanjjushri/rag-pdf-qa-llama2
