Day 1: Introduction to Retrieval Augmented Generation

Himanshu Singh
5 min read · Jan 30, 2024

--

This is part of the series — 10 days of Retrieval Augmented Generation

Before we start our first day, let's have a look at what lies ahead in this 10-day series:

  1. Day 1: Introduction to Retrieval Augmented Generation (*)
  2. Day 2: Understanding core components of RAG pipeline
  3. Day 3: Building our First RAG
  4. Day 4: Packaging our RAG using Streamlit and Chainlit
  5. Day 5: Creating a RAG assistant with Memory
  6. Day 6: Building complete RAG pipeline in Azure
  7. Day 7: Building complete RAG pipeline in AWS
  8. Day 8: Evaluation and benchmarking RAG systems
  9. Day 9: End to End Project 1 on RAG (Real World) with React JS frontend
  10. Day 10: End to End Project 2 on RAG (Real World) with React JS frontend

Now, let's continue with our Day 1 — Introduction to Retrieval Augmented Generation.

Introduction to RAG

Before jumping into RAG, let's take a look at two scenarios.

Scenario 1

Imagine there's an open-book exam. You read the question and search through the book for the right answer, adopting different strategies: skimming through the pages, jumping straight to a specific chapter and looking there, and so on.

What if there were a digital assistant in your possession that, when you ask it the question, reads the book and fetches the best answer for you?

Scenario 2

Now imagine you're in a library with thousands of books. Again, suppose you have a question. But this time you don't know which book has the answer, where that book is shelved, or even where inside the book the answer would be. If you start searching yourself, it may take you hours, or even days, to find the right answer.

But if our digital assistant is present and you ask it the same question, all the headache of searching is handed over to it. You just wait, sipping your coffee, while it fetches the right answer for you, and that too in minutes.

The digital assistant we talk about in both scenarios is, in the field of Generative AI, powered by the concept of Retrieval Augmented Generation (RAG). Let us understand the steps RAG takes to give us the right answer.

RAG step by step

  1. The first step is to create a repository of documents. If you have a single book (scenario 1), the repository contains just that book; in scenario 2, it holds the entire library. This repository is not built simply by placing all the documents in one place: various techniques such as chunking and embedding are required, which we will discuss later. For now, remember that the knowledge repository is the first step of RAG.
  2. The user now asks a question. This is called the prompt. RAG receives this question and uses it to fetch a response from the repository.
  3. Once the question is received, RAG searches through the documents in the repository for the content most relevant to it. Candidates are ranked using mathematical measures such as cosine similarity or maximum marginal relevance, which we will also discuss later.
  4. The output is summarized and then given back to the user. This summarization is done by a Large Language Model (LLM); the most famous one currently is GPT-4, but there are many others out there.
  5. Additional step: sometimes many documents contain similar answers to the question, and RAG alone cannot tell which one is right. In this situation, RAG retrieves the top n answers (say, 5), and the question together with those 5 answers (aka the context) is sent to the LLM (e.g. GPT-4). The LLM reads the question and the context and finally gives the correct answer.
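The steps above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the bag-of-words "embedding", the three-sentence repository, and the function names are all invented for this example, and the final LLM call is replaced by simply assembling the prompt that would be sent to it.

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": a bag-of-words count. A real system would
    # call an embedding model here instead.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Step 3: similarity between two sparse vectors
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: build the knowledge repository (document chunks plus their embeddings)
repository = [
    "RAG combines retrieval with text generation.",
    "Cosine similarity measures the angle between two vectors.",
    "Streamlit turns Python scripts into web apps.",
]
index = [(chunk, embed(chunk)) for chunk in repository]

def retrieve(question, top_n=2):
    # Steps 2-3: embed the user's question and rank chunks by similarity
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]

def build_prompt(question, top_n=2):
    # Steps 4-5: the question plus the retrieved context would be sent to an LLM
    context = "\n".join(retrieve(question, top_n))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What does cosine similarity measure?"))
```

In a real pipeline the last step would pass this prompt to an LLM, which reads the context and writes the final answer.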

Important LLM Models

It must be understood that the core of RAG is the LLM. LLMs are responsible for generating contextual responses to the questions asked. Below is a list of ten models widely used in the industry today.

  • GPT-4 by OpenAI
  • PaLM 2 by Google AI
  • Claude by Anthropic
  • LLaMA by Meta AI
  • Mistral by Mistral AI
  • Jurassic-1 by AI21 Labs
  • Flan-T5 by Google (open source)
  • Megatron-Turing NLG by NVIDIA and Microsoft
  • Command by Cohere
  • BLOOM by BigScience

Important Indexes

To create the knowledge repository for RAG, indexes are used. We will cover them in more detail later, but let's list some of them:

  • FAISS — Vector DB
  • Qdrant — Vector DB
  • Pinecone — Vector DB
  • Chroma — Vector DB
  • Neo4J — Knowledge Graph
  • Azure Search Index
  • Amazon Kendra

As an overview: we chunk the documents and create embeddings of the chunks. These embeddings are stored in one of the services listed above. Then we use different similarity measures to retrieve the answers.
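As a concrete illustration of the chunking step, here is one simple strategy: fixed-size chunks with a small overlap, so content cut at a boundary still appears whole in at least one chunk. The function name and sizes are invented for this sketch; real libraries offer many smarter splitters (by sentence, by paragraph, by token count).

```python
def chunk_text(text, chunk_size=50, overlap=10):
    # Split text into fixed-size chunks; consecutive chunks share `overlap`
    # characters so content at a boundary is not lost to the splitter.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks

# Each chunk would then be passed to an embedding model and stored
# in one of the vector databases listed above.
```

The overlap is a trade-off: larger overlap means fewer sentences split across chunks, but more duplicated text to embed and store.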

This finishes our first day's discussion of RAG. Tomorrow we will look at the core components of RAG: chunking and embeddings, what we mean by prompts, what vector indexes are, different LLM frameworks, and more.

--


Himanshu Singh

ML Consultant, Researcher, Founder, Author, Trainer, Speaker, Story-teller. Connect with me on LinkedIn: https://www.linkedin.com/in/himanshu-singh-2264a350/