RAG Framework explained and FAQs

Understanding how to connect external context to LLMs using RAG, plus a few FAQs

Mehul Gupta
Data Science in your pocket


I’ve received several queries about RAG, on Reddit and YouTube alike, so it’s time we address the elephant in the room and answer a few FAQs around it. Let’s get started.

My debut book

LangChain in your Pocket is out now!!

What is RAG?

RAG (Retrieval-Augmented Generation) is a framework that enables an LLM to understand external context (passed as a PDF, text file, video, etc.) and use this private knowledge to do tasks specific to you.

How does it work?

Referring to the diagram above, the flow is:

  • The external document is fed to a DocumentLoader, which extracts text from any document type, be it a PDF or a video.
  • Some pre-processing steps (similar to classic NLP) are then applied to the extracted text.
  • Embeddings are generated for the pre-processed text using any embedding model (say, BERT).
  • The texts are stored alongside their respective embeddings in a Vector DB. Vector DBs are specialized databases that store embeddings and enable functionalities like text similarity search.
  • We then initiate a RetrievalChain (which uses an LLM internally; see the code sketch after this list) that,

A. Takes in the user’s query.

B. Rephrases this query and hits the Vector DB to fetch results.

C. Rephrases the Vector DB’s output to present it to the user.
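
To make these steps concrete, here is a minimal sketch using classic LangChain APIs. The file name `report.pdf`, the query, and the chunk sizes are all hypothetical; it assumes an OpenAI API key is set and the `pypdf` and `chromadb` packages are installed. Swap in any loader, embedding model, or LLM you prefer.

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# 1. DocumentLoader extracts text from the external document
docs = PyPDFLoader("report.pdf").load()  # hypothetical file

# 2. Pre-processing: split the raw text into chunks
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# 3 & 4. Generate embeddings and store them in a Vector DB
db = Chroma.from_documents(chunks, OpenAIEmbeddings())

# 5. RetrievalChain: takes the query, hits the Vector DB,
#    and lets the LLM rephrase the retrieved chunks into an answer
qa = RetrievalQA.from_chain_type(llm=OpenAI(), retriever=db.as_retriever())
print(qa.run("What is the main conclusion of the report?"))
```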

Put simply, RAG is nothing but

LLMs interacting with Vector DBs

It's no black box. Now, let's cover a few

Frequently Asked Questions

  • RAG using RDBMS
  • Multi-Document RAG
  • Persisting Vector DBs
  • Applications on top of RAG
  • Data Leakage
  • Fine-Tuning vs RAG

1. Can we build a RAG system using Postgres/RDBMS?

Not out of the box: a vanilla RDBMS like Postgres doesn’t support semantic text similarity, only regex or exact matches (extensions like pgvector add vector similarity, but at that point you’re effectively running a Vector DB). In RAG, the Vector DB plays a bigger role than the LLM because of its similarity search feature; the LLM is more or less just restructuring the input/output for presentation purposes. If you don’t believe it, check out the demo below, where, given a prompt/query, the Vector DB without an LLM gives un-polished results that still work.
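
A quick way to see this for yourself: query the Vector DB directly, with no LLM in the loop. This is a hedged sketch with made-up toy texts; the output is raw chunks rather than a polished answer, but they are the right chunks.

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Toy corpus (hypothetical content) embedded into an in-memory Chroma store
db = Chroma.from_texts(
    ["Refunds are processed within 7 working days.",
     "Damaged items can be returned within 30 days."],
    OpenAIEmbeddings(),
)

# Pure similarity search, no LLM: un-polished output, but it works
for doc in db.similarity_search("how long do refunds take?", k=1):
    print(doc.page_content)
```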

2. Can we do Q&A over multiple documents simultaneously rather than just one?

Absolutely yes. When you have multiple documents, you can use LangChain Agents, with a Vector DB per document exposed as a tool for the agent to interact with. Check out the demo below on how I created two RAG tools for my agent.

Not just that: if you wish to generate a final response based on multiple documents from a single prompt, the above solution should suffice too. In that case, the prompt and the tool descriptions play a bigger role, as the agent itself needs to realise when to use tool X once it receives results from tool Y (a sort of tool chaining).
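
Here is a rough sketch of that setup with classic LangChain Agents. Two toy Vector DBs stand in for two real documents (the contents, tool names, and descriptions are all hypothetical), and each is wrapped in a RetrievalQA chain exposed as a tool.

```python
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import Chroma

emb = OpenAIEmbeddings()
llm = OpenAI()

# Toy stand-ins for two real documents
hr_db = Chroma.from_texts(
    ["Employees get 24 paid leaves per year."], emb, collection_name="hr")
fin_db = Chroma.from_texts(
    ["Q3 revenue grew 12% year over year."], emb, collection_name="finance")

# One RAG tool per document; the description tells the agent when to use it
tools = [
    Tool(name="HR_Policy_QA",
         func=RetrievalQA.from_chain_type(llm=llm, retriever=hr_db.as_retriever()).run,
         description="Answers questions about the HR policy document."),
    Tool(name="Finance_Report_QA",
         func=RetrievalQA.from_chain_type(llm=llm, retriever=fin_db.as_retriever()).run,
         description="Answers questions about the quarterly finance report."),
]

agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION)
print(agent.run("How many paid leaves do employees get, and how did Q3 revenue do?"))
```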

3. Do we need to create embeddings for documents every time we want to use RAG?

Not at all. Similar to an RDBMS, a Vector Database can be persisted and reused later. Hence, you need to create your embeddings just once and can then point a Retriever at the existing Vector DB anytime. How to persist a Vector DB? Check out the tutorial below.
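
With Chroma, for instance, persistence is a couple of lines. A hedged sketch (the directory path and texts are hypothetical): embed once, write to disk, and reload on any later run without re-embedding.

```python
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

emb = OpenAIEmbeddings()

# First run: embed once and write the Vector DB to disk
db = Chroma.from_texts(
    ["Employees get 24 paid leaves per year."],
    emb,
    persist_directory="./my_vectordb",  # hypothetical path
)
db.persist()

# Any later run: reload the persisted DB, no re-embedding needed
db = Chroma(persist_directory="./my_vectordb", embedding_function=emb)
retriever = db.as_retriever()
```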

4. Can we build an application on top of RAG? Say NER, text tagging, or summarization?

I haven’t tried it, but I assume yes. Building custom chains has become very easy with the arrival of LangChain’s LCEL. Do remember that the input for your next segment (say, NER) would just be the output of your RetrievalChain, not the entire text. If your application expects the entire data, you need to pass the whole text in the prompt, and RAG can be avoided.

Check out what is LangChain LCEL
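
As a hedged LCEL sketch of the idea (toy texts, illustrative prompt): the retriever’s output, not the whole document, is piped as context into a second task, here summarization.

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser
from langchain.vectorstores import Chroma

# Toy Vector DB (hypothetical content)
db = Chroma.from_texts(
    ["Employees get 24 paid leaves per year.",
     "Unused leaves lapse at the end of the calendar year."],
    OpenAIEmbeddings(),
)

prompt = ChatPromptTemplate.from_template(
    "Summarize the following retrieved context in two bullet points:\n\n{context}"
)

# LCEL: retrieved chunks (not the entire text) feed the downstream task
chain = {"context": db.as_retriever()} | prompt | ChatOpenAI() | StrOutputParser()
print(chain.invoke("leave policy"))
```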

5. Can my private data leak when using RAG?

If you’re using a local LLM, no. If it’s an API (like the OpenAI API), a definite yes: even the response from the Vector DB is eventually sent to the LLM for rephrasing, so with an external API your private data leaves your environment.

6. How is Fine-Tuning different from RAG?

In many ways. You can check out how they differ, and which to use when, below.

Parts of this post are taken from my latest book.

For any code, refer to the book!!
