How to improve RAG?

RAG hyperparameters to know

Mehul Gupta
Data Science in your pocket


In the last couple of years, many Generative AI applications have emerged, but few have matched the hype around RAG (Retrieval-Augmented Generation), a framework that lets you supply external context to LLMs so they can help with your work even on data they were never trained on.

My debut book, LangChain in your Pocket, is out!

Note: If you are a newbie to RAG, follow this

I’ve seen several queries from folks whose RAG systems aren’t performing up to the mark. If that’s the case for you, this post deep-dives into how RAG performance can be improved by tuning a few parameters. So let’s understand the key parameters we should be tuning:

Chunk_size: This parameter specifies the maximum number of characters allowed in each chunk while creating vector embeddings. This affects memory usage and context preservation; smaller chunks may lose context, while larger chunks can dilute specificity.

Chunk_overlap: A value greater than 0 creates an overlap between the chunks, allowing for better context when processing the text. It enhances contextual continuity and improves relevance by allowing related information to be accessible across chunks.
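To make the overlap idea concrete, here is a minimal character-based splitter (a simplified stand-in, not LangChain's actual `CharacterTextSplitter`) showing how adjacent chunks share text:

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks, where each chunk
    repeats the last chunk_overlap characters of the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "RAG retrieves relevant chunks and feeds them to the LLM as context."
chunks = split_with_overlap(text, chunk_size=30, chunk_overlap=10)

# The last 10 characters of chunk 0 are the first 10 of chunk 1,
# so a sentence cut mid-way is still fully visible in one chunk.
```

Because the tail of one chunk reappears at the head of the next, information that straddles a chunk boundary is never completely lost to retrieval.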

k: Defines the number of top documents to retrieve, which is set to 7 in the below example. It balances precision and recall; a smaller k yields highly relevant documents, while a larger k provides diversity but may include less relevant results.

search_type: Specifies the retrieval method, utilizing a similarity score threshold to filter relevant documents. It influences the types of documents retrieved and the efficiency of filtering irrelevant results, enhancing overall retrieval effectiveness.

score_threshold: Sets the minimum similarity score that a document must meet to be included in the results. It controls the quality of retrieved documents; a higher threshold ensures relevance but may exclude useful information, while a lower threshold may introduce noise.
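Here is a toy sketch of how k and score_threshold interact during retrieval, using hand-made 2-d vectors and plain cosine similarity (real embeddings have hundreds of dimensions, and the vector DB does this for you):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy 2-d "embeddings" for a query and three documents.
query = [1.0, 0.0]
docs = {"doc_a": [0.9, 0.1], "doc_b": [0.6, 0.4], "doc_c": [0.1, 0.9]}

k, score_threshold = 2, 0.5

# Rank all documents by similarity to the query, highest first.
scored = sorted(
    ((cosine_similarity(query, vec), name) for name, vec in docs.items()),
    reverse=True,
)

# Keep at most k documents, and only those above the threshold.
retrieved = [name for score, name in scored[:k] if score >= score_threshold]
```

Raising score_threshold trims low-similarity documents even when they fall inside the top k, while raising k widens the candidate pool that the threshold then filters.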

chain_type: Indicates the method for processing and combining the retrieved documents. It affects how information is integrated for response generation; different types can enhance the coherence and relevance of the final output. There are three main types of chains:

  • Map Reduce Chain: Processes each document separately, summarizing them individually before merging the summaries and sending them to the language model (LM). Ideal for managing large datasets.
  • Refine Chain: Enhances the summary iteratively by refining it with each document, offering a balance between detail and efficiency.
  • Stuff Chain: Directly passes all retrieved text to the LM for rephrasing, based on the given prompt and context.
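The map-reduce flow above can be sketched with a stand-in "LLM" (a hypothetical `fake_summarize` that just keeps the first sentence) to show the two steps — summarize each document independently, then merge the partial summaries:

```python
def fake_summarize(text):
    """Stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0] + "."

docs = [
    "Chunk one talks about chunk_size. It also mentions overlap.",
    "Chunk two covers retrievers. It also mentions k.",
]

partial_summaries = [fake_summarize(d) for d in docs]  # map step
combined = " ".join(partial_summaries)                 # reduce step
```

In a real map-reduce chain, both steps are LLM calls; the payoff is that no single call has to fit all the retrieved text in its context window, which is why this chain type suits large document sets.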

The code

Now, let’s see how to use these parameters while building a RAG system with LangChain. I will be using the Google Gemini API for this tutorial (free to create). Check how to create it below

from langchain_google_genai import GoogleGenerativeAI, GoogleGenerativeAIEmbeddings

GOOGLE_API_KEY = ""  # paste your Gemini API key here

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=GOOGLE_API_KEY)
llm = GoogleGenerativeAI(model="gemini-pro", google_api_key=GOOGLE_API_KEY)

Next, let’s set up our vector DB

from langchain.text_splitter import CharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma


# Document loader
loader = TextLoader("your_text_file.txt")
data = loader.load()

# Document transformer
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(data)

# Vector DB
docsearch = Chroma.from_documents(texts, embeddings)

As you must have observed, we used the chunk_size & chunk_overlap parameters in the text-splitting (transformation) part of RAG.

Next, let’s set the retriever with the remaining parameters

# Hyperparameters to know
retriever = docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 7, "score_threshold": 0.3},
)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

qa.run("YOUR QUERY")

It’s this easy!

But do remember that you need to tune these parameters to get the best outputs rather than just going with the default values. There are also other hyperparameters you can tune apart from the ones mentioned here; do check them out in the documentation.
