But what is RAG??

Abhishek Selokar
5 min read · Apr 10, 2024


Imagine you are in the exam room, writing your paper peacefully, happy that you have answered most of the questions. As you reach the end of the question paper, you realize the remaining questions don't match anything you have learned. They are out of the syllabus, and a few are from a topic the professor never taught. You panic and try hard to write something, anything, related to that topic just to get marks for an attempt. Isn't that stressful? Indeed, it is. The very next day, you have an open-book exam, where you are given access to books and research papers to look up answers in. Now, whatever answer you provide, you have solid proof of its source, and your answers will be more accurate because you have references and context handy. The same is not true of the closed-book exam we talked about earlier. There, you have no reference books at hand and must come up with answers on your own, which may not be that accurate, because you end up adding a pinch of your own understanding of the topic, and that may not be 100% true. Agree?

Enough of the storytelling; now let’s jump onto the concepts.

You see, LLMs mimic the human mind in many ways. As I write this blog, I can do so because I have context ready in my mind, and I write each word one by one based on my knowledge, so that it makes sense given the word(s) I wrote before it.

LLMs work similarly. They are trained on a massive text corpus, which helps them understand language, grammar, punctuation, word combinations, and so on. In other words, after training, LLMs are aware of the language and can successfully handle text generation, summarization, translation, and many other tasks. But even though we have this great model that spits out words so fluently, it still faces some issues.

Problems

  1. Hallucination: They tend to hallucinate very confidently, which can lead to misinformation.
  2. Limited by Training Data: They know nothing outside of their training data.
  3. Black Box Outputs: One cannot confidently find out what has led to the generation of particular content.

So what is the solution????

Solution

Let's work backward from the problems to come up with a solution.

As we discussed in the beginning, if we have an open-book test, we are more likely to write accurate answers. Similarly, what if we provide the LLM with some book from which it can take references, understand context, and then generate answers accordingly? This would reduce the problem of hallucination, as the model is less likely to make up content when it has already been given context from the book. If we keep updating the book, we also address the second problem of limited, outdated knowledge. And the book helps with the last problem too, because now we know why the LLM generated a particular piece of content: we have a passage from the book to cite as the source of the generation.

I used the word "book" here, which is a deliberately naive simplification. In practice, this could be any external knowledge base: a collection of PDFs, text files, CSVs, JSON documents, and so on. Pairing the LLM with an external knowledge base gives it the superpower to overcome all the aforementioned problems.

Retrieval Augmented Generation (RAG) in action

Let’s break it down.

Retrieval means fetching the relevant information from the knowledge base;

Augmented means that this retrieved information is used to aid the model; and

Generation means generating the text. It’s as simple as it sounds.

Great, we have come this far. But I would like to tell you one more fascinating fact: the LLM has never seen a single word of the text we feed it, at least not as text. No model across modalities like computer vision or speech recognition has ever "seen" its inputs as raw images, video, or audio either. All inputs are first processed and converted into a mathematical representation: text is converted into embeddings, images into matrices of pixel values, and audio signals into spectrograms, because the model only understands numbers and nothing else.
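To make "text becoming numbers" concrete, here is a minimal sketch. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model, which are just one convenient choice among many:

```python
# Minimal sketch: turning a sentence into an embedding (a vector of numbers).
# Assumes the sentence-transformers library and the all-MiniLM-L6-v2 model,
# chosen here purely for illustration.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# The model never "sees" the words themselves; the text is tokenized and
# mapped to a fixed-length vector of floats.
embedding = model.encode("RAG connects an LLM to an external knowledge base.")

print(embedding.shape)  # (384,) for this particular model
print(embedding[:5])    # the first few of the numbers the model actually works with
```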

Back to the topic now. How do we connect the LLM to the external knowledge base? It is done in the following steps:

  1. Chunked Knowledge Representation: Break down the information into small units called chunks. These chunks can be sentences, paragraphs, or even specific concepts within the text.
  2. Embeddings ~ capture meaning in numbers: These chunks are then converted into corresponding embeddings, numeric vectors that capture the most useful (semantic) information about each chunk.
  3. Storing the embeddings: These embeddings are then stored in a database, typically a vector database.
  4. Retrieval: The user query is also embedded into a vector, and the most semantically similar chunks are retrieved from the database.
  5. Generation: The user query and the retrieved chunks are combined and passed as input to the LLM, which generates the response (see the sketch after this list).
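Here is a minimal, self-contained sketch of the five steps. It uses sentence-transformers for the embeddings and plain NumPy as a stand-in for a vector database; the sample chunks, the query, and the call_llm() helper are hypothetical placeholders, not a fixed recipe:

```python
# Minimal RAG sketch covering the five steps above. The chunks, the query, and
# the call_llm() helper are hypothetical placeholders for illustration only.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Chunked knowledge representation: here each chunk is simply one sentence.
chunks = [
    "RAG stands for Retrieval Augmented Generation.",
    "Embeddings are numeric vectors that capture the meaning of a piece of text.",
    "Retrieved chunks are added to the prompt before the LLM generates an answer.",
]

# 2 & 3. Embeddings + storage: encode the chunks and keep the vectors around.
# (A real system would store these in a vector database.)
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

# 4. Retrieval: embed the user query and pick the most similar chunks.
query = "What does RAG mean?"
query_vector = model.encode(query, normalize_embeddings=True)
scores = chunk_vectors @ query_vector       # cosine similarity (vectors are normalized)
top_ids = np.argsort(scores)[::-1][:2]      # indices of the 2 best-matching chunks
context = "\n".join(chunks[i] for i in top_ids)

# 5. Generation: combine the query and the retrieved context into one prompt.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
# answer = call_llm(prompt)  # call_llm() stands in for whichever LLM client you use
print(prompt)
```

In a real system, the NumPy lookup would be replaced by a proper vector database, and the prompt template would be tuned to the task at hand, but the flow stays the same.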

Why adding more context helps the LLM generate a more useful response is fairly self-explanatory: the model now has to do less work on its own, because it has background material on the concept at hand, which gives it a more complete understanding of the input and hence aids the generation.

If you want to know how to implement RAG, please check out this blog:

That’s it from my side. I hope you now have a bird's-eye view of the widely popular concept of RAG.
