Becoming the Ultimate RAG Picker for Your LLM

Pinaki Brahma
3 min read · May 20, 2024


Image: Generated with DALL-E 3 from a prompt by the author

There are plenty of articles about RAG and 'hello world' use cases that wire a RAG pipeline to an LLM and generate results. In the real world, however, there are many moving parts that need to be tuned. This post is about why we need to spend a large `chunk` of our time building the best RAG system for our use case.

No domain is left untouched by these data-munching monsters called LLMs. They are here to stay, and they are here to change the world. Businesses can 'personalize' LLMs to suit their needs and make them even more effective. The two most popular options are 'Fine-Tuning (FT)' and building a 'Retrieval-Augmented Generation (RAG)' based application. Fine-tuning requires significant amounts of data and investment, sometimes running into millions of dollars. That leaves us with RAG.

Imagine your LLM is like Jon Snow, the fearless and skilled commander of the Night’s Watch. Jon is a great leader and warrior. However, even Jon Snow doesn’t know everything about the Seven Kingdoms and the threats they face.

Now imagine Samwell Tarly as the RAG in our epic saga. Samwell is a walking encyclopedia with a knack for digging through scrolls and books to find crucial information. Whenever Jon needs to know something very specific, Samwell scurries off to the library, retrieves the relevant knowledge, and brings it back to Jon.

With Samwell’s support, Jon Snow doesn’t have to rely solely on his own knowledge and instincts. Thanks to this dynamic duo, the Night’s Watch becomes much more effective at protecting the realm.

So, in essence, your LLM (Jon) is powerful and capable, but with the added help of RAG (Samwell), it becomes an even more formidable force, combining brute strength with precise knowledge to conquer any challenge.

The crux of the matter is this: an LLM is only as smart as the retrieved outputs from the RAG system let it be. If the retriever surfaces relevant data from its knowledge base, the LLM can use it to generate great results. On the flip side, if the retriever spits out irrelevant data, the LLM will find it very difficult to be useful.
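To make that concrete, here is a minimal sketch of what 'using the retrieved data' typically looks like: the retrieved chunks are simply placed into the prompt as context before the question. The function name and prompt wording below are illustrative, not any specific library's API.

```python
# Minimal sketch: augmenting the prompt with retrieved chunks.
def build_prompt(query: str, retrieved_chunks: list[str]) -> str:
    # The retrieved chunks are the LLM's "Samwell": context it does not
    # carry in its own weights.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

chunks = [
    "The Night's Watch guards the Wall.",
    "Jon Snow is its Lord Commander.",
]
print(build_prompt("Who leads the Night's Watch?", chunks))
```

If the chunks passed in here are irrelevant, no amount of prompt engineering around them will save the answer, which is exactly why the retrieval side deserves so much of our tuning time.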

There are so many moving parts in the RAG architecture. Awareness of these options is the first step in the right direction.

Image: Moving parts in the RAG Architecture

The diagram above has multiple blocks, each denoting a different layer in the system. For example, the Pre-Processing layer handles text parsing and text cleaning. Similarly, we have the Chunking layer and the Embedding layer. Under each layer, I have listed some of the levers you can pull to improve the quality of the retrieval output: the retrieved chunks.
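To ground these layers, here is one possible sketch of the ingestion side, assuming the `sentence-transformers` package for embeddings. Every step below (the cleaning rule, the chunk size and overlap, the embedding model) is a lever you could swap out for your use case.

```python
# One possible shape of the ingestion path: Pre-Processing -> Chunking -> Embedding.
import re
from sentence_transformers import SentenceTransformer

def clean(text: str) -> str:
    # Pre-Processing layer: parsing/cleaning; here just whitespace normalization.
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Chunking layer: fixed-size character windows with overlap is the simplest
    # lever; sentence-based or semantic chunking are common alternatives.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

model = SentenceTransformer("all-MiniLM-L6-v2")  # Embedding layer: swappable model
documents = ["Your raw documents go here."]
chunks = [c for doc in documents for c in chunk(clean(doc))]
embeddings = model.encode(chunks)  # vectors to load into a vector store
```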

There are also some colored blocks in the diagram. These denote the components that are active at inference time, when the user asks a query.
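Here is a rough sketch of that inference-time path, continuing from the ingestion sketch above: embed the query, score it against the stored chunk embeddings, and hand the top hits to the LLM. Cosine similarity over NumPy arrays is just one retrieval choice; a vector database would typically handle this at scale.

```python
# Inference-time path: embed the query and retrieve the most similar chunks.
import numpy as np

def retrieve(query: str, k: int = 3) -> list[str]:
    q = model.encode([query])[0]
    # Cosine similarity between the query vector and every stored chunk vector.
    scores = embeddings @ q / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q)
    )
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

# The retrieved chunks then go into the prompt, as in the earlier sketch.
```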

As of now, I am actively experimenting with these. I will post a follow-up to this article with more in-depth explanations of some of the levers above.

Let me know in the comments if you have used these strategies in your own use cases. Also, feel free to share any other strategies that have worked for you.
