Day 1: Introduction to Retrieval Augmented Generation

Himanshu Singh
5 min read · Jan 30, 2024

--

This is part of the series — 10 days of Retrieval Augmented Generation

Before we start our first day, let's have a look at what lies ahead in this 10-day series:

  1. Day 1: Introduction to Retrieval Augmented Generation (*)
  2. Day 2: Understanding core components of RAG pipeline
  3. Day 3: Building our First RAG
  4. Day 4: Packaging our RAG using Streamlit and Chainlit
  5. Day 5: Creating a RAG assistant with Memory
  6. Day 6: Building complete RAG pipeline in Azure
  7. Day 7: Building complete RAG pipeline in AWS
  8. Day 8: Evaluation and benchmarking RAG systems
  9. Day 9: End to End Project 1 on RAG (Real World) with React JS frontend
  10. Day 10: End to End Project 2 on RAG (Real World) with React JS frontend

Now, let's continue with our Day 1 — Introduction to Retrieval Augmented Generation.

Introduction to RAG

Before jumping into RAG, let's take a look at two scenarios.

Scenario 1

Imagine there's an open-book exam. You read the question and search through the book for the right answer, adopting different strategies: skimming through the pages, jumping straight to a specific chapter and looking there, and so on.

What if there were a digital assistant in your possession that, when you ask it the question, reads the book and fetches the best answer for you?

Scenario 2

Now imagine you're in a library with thousands of books. Again, suppose you have a question. But this time you don't know which book has the answer, where that book is shelved, or even where inside the book the answer would be. If you start searching yourself, it may take you hours, or even days, to find the right answer.

But if our digital assistant is present and you ask it the same question, all the headache of searching is handed over to it. You just wait, sipping your coffee, while it fetches the right answer for you, and that too in minutes.

The digital assistant we talk about in both scenarios is, in the field of Generative AI, powered by the concept of Retrieval Augmented Generation (RAG). Let us understand the steps RAG takes to give us the right answer.

RAG step by step

  1. The first step is to create a repository of documents. If you have a single book (scenario 1), the repository contains just that book; in scenario 2, it holds the entire library. This repository is not built simply by placing all the documents in one place: various techniques such as chunking and embedding are required, which we will discuss later. For now, remember that the knowledge repository is the first step of RAG.
  2. The user now asks a question. This is called the prompt. RAG receives this question and uses it to fetch a response from the repository.
  3. Once the question is received, RAG searches through the documents in the repository for the content most relevant to it. Candidates are ranked using mathematical measures such as cosine similarity or maximum marginal relevance, which we will also discuss later.
  4. The output is summarized and then given back to the user. This summarization is done by a Large Language Model (LLM); the most famous one currently is GPT-4, but there are many others out there.
  5. Additional step: sometimes many documents contain similar answers to the question, and RAG alone cannot tell which one is right. In this situation, RAG retrieves the top n answers (say, 5), and the question together with those 5 answers (aka the context) is sent to the LLM (e.g. GPT-4). The LLM reads the question and the context and finally gives the correct answer.
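The steps above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the bag-of-words "embedding", the three-sentence repository, and the function names are all invented for this example, and the final LLM call is replaced by simply assembling the prompt that would be sent to it.

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": a bag-of-words count. A real system would
    # call an embedding model here instead.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    # Step 3: similarity between two sparse vectors
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Step 1: build the knowledge repository (document chunks plus their embeddings)
repository = [
    "RAG combines retrieval with text generation.",
    "Cosine similarity measures the angle between two vectors.",
    "Streamlit turns Python scripts into web apps.",
]
index = [(chunk, embed(chunk)) for chunk in repository]

def retrieve(question, top_n=2):
    # Steps 2-3: embed the user's question and rank chunks by similarity
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine_similarity(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_n]]

def build_prompt(question, top_n=2):
    # Steps 4-5: the question plus the retrieved context would be sent to an LLM
    context = "\n".join(retrieve(question, top_n))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What does cosine similarity measure?"))
```

In a real pipeline the last step would pass this prompt to an LLM, which reads the context and writes the final answer.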

Important LLM Models

It must be understood that the core of RAG is the LLM. LLMs are responsible for generating contextual responses to the questions asked. Below is a list of ten models widely used in the industry today.

  • GPT-4 by OpenAI
  • PaLM 2 by Google AI
  • Claude by Anthropic
  • LLaMA by Meta AI
  • Mistral by Mistral AI
  • Jurassic-1 by AI21 Labs
  • Flan-T5 by Google (open source)
  • Megatron-Turing NLG by NVIDIA and Microsoft
  • Command by Cohere
  • BLOOM by BigScience

Important Indexes

To create the knowledge repository for RAG, indexes are used. We will cover them in more detail later, but let's list some of them:

  • FAISS — Vector DB
  • Qdrant — Vector DB
  • Pinecone — Vector DB
  • Chroma — Vector DB
  • Neo4J — Knowledge Graph
  • Azure Search Index
  • Amazon Kendra

As an overview: we chunk the documents and create embeddings of the chunks. These embeddings are stored in one of the services listed above. Then we use different similarity measures to retrieve the answers.
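As a concrete illustration of the chunking step, here is one simple strategy: fixed-size chunks with a small overlap, so content cut at a boundary still appears whole in at least one chunk. The function name and sizes are invented for this sketch; real libraries offer many smarter splitters (by sentence, by paragraph, by token count).

```python
def chunk_text(text, chunk_size=50, overlap=10):
    # Split text into fixed-size chunks; consecutive chunks share `overlap`
    # characters so content at a boundary is not lost to the splitter.
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks

# Each chunk would then be passed to an embedding model and stored
# in one of the vector databases listed above.
```

The overlap is a trade-off: larger overlap means fewer sentences split across chunks, but more duplicated text to embed and store.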

This finishes our first day's discussion of RAG. Tomorrow we will look at the core components of RAG: chunking and embeddings, what we mean by prompts, what vector indexes are, different LLM frameworks, and more.

--


Himanshu Singh

ML Consultant, Researcher, Founder, Author, Trainer, Speaker, Story-teller. Connect with me on LinkedIn: https://www.linkedin.com/in/himanshu-singh-2264a350/