Retrieval-Augmented Generation (RAG) Simplified!

Rishi
6 min read · Jan 10, 2024


In this article, we’ll look at how Retrieval-Augmented Generation (RAG) works and how it can be used in enterprise settings, specifically with Large Language Models (LLMs). We’ll also cover the difference between simple and complex RAG and help you figure out which one you need.

Introduction

RAG is one of the most widely adopted use cases of LLMs, so it’s worth understanding what it is and how it can work for you. RAG improves an LLM’s output by supplying it with relevant information at query time. It’s like looking up a few facts before answering a question.

With RAG, the model can ground its answer in retrieved facts, making it more accurate and specific. In 2023, LLMs became even more popular in enterprise applications, especially for RAG and information retrieval.

How does RAG work?

So, RAG is a process that combines a Large Language Model, a collection of enterprise documents, and a retrieval pipeline (typically embeddings plus a vector database) to help you find information and answers to your questions more easily.

Basically, RAG searches a database (commonly a vector database) for content semantically similar to your query, pulls out the relevant passages, passes them to the language model, and then generates an answer tailored to your question. This makes RAG a good fit for businesses that want to get more out of the data they already have, make smarter decisions, and find the information they need faster.

Source: https://gradientflow.com/techniques-challenges-and-future-of-augmented-language-models/
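The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not a production recipe: the bag-of-words “embedding” stands in for a real embedding model, the document contents are invented, and the final LLM call is stubbed out as a prompt string.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a term-frequency vector. A real system would use
    # a learned embedding model instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A handful of invented "enterprise documents", indexed up front.
documents = [
    "Our support line is open 9am to 5pm on weekdays.",
    "Refunds are processed within 14 business days.",
    "The enterprise plan includes priority onboarding.",
]
index = [(doc, embed(doc)) for doc in documents]  # stand-in for a vector database

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank stored documents by similarity to the query and keep the top k.
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer(query: str) -> str:
    # Assemble the prompt; a real system would send this to an LLM.
    context = "\n".join(retrieve(query))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

print(answer("When is your support line open?"))
```

The key design point is that retrieval and generation stay decoupled: you can swap the toy index for a real vector database without touching the prompting step.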

In the legal and healthcare sectors, it helps in referencing precise information from vast databases of case law, research papers, and clinical guidelines, facilitating informed decision-making.

In customer service, RAG powers sophisticated chatbots and virtual assistants, providing accurate, contextually relevant responses to user queries and quickly surfacing product information or company policies.

RAG is also pivotal in content creation and recommendation systems, where it helps in generating personalized content and recommendations by understanding user preferences and historical data.

A good example of a large production RAG implementation is Twitter/X’s See Similar Posts feature, where the RAG system chunks and stores tweets in a vector database. When a user clicks See Similar Posts, a query retrieves similar tweets and passes them to an LLM to determine which posts are most similar to the original.

RAG is also useful for organizing messy internal documents, which matters because most companies have poor document storage systems. And RAG is far more flexible than traditional techniques like keyword search, which cannot account for similarity of meaning, sentiment, or misspellings.
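To make the keyword-search limitation concrete, here’s a tiny illustrative comparison. The documents and query are invented, and `difflib`’s character-level similarity is only a rough stand-in for embedding similarity, but it shows why exact matching breaks on a misspelling while similarity ranking does not.

```python
import difflib

docs = ["warranty and returns policy", "shipping rates by region"]
query = "waranty policy"  # note the misspelled "waranty"

# Exact keyword search: every query word must appear verbatim.
keyword_hits = [d for d in docs if all(w in d.split() for w in query.split())]
print(keyword_hits)  # empty: the misspelling means nothing matches

# Similarity-based search still ranks the right document first.
best = max(docs, key=lambda d: difflib.SequenceMatcher(None, query, d).ratio())
print(best)
```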

RAG vs Finetuning

There are two main ways to make Large Language Models work with your data: RAG and fine-tuning. You can use both, but the right choice depends on your needs and available resources.

Source: https://www.rungalileo.io/blog/optimizing-llm-performance-rag-vs-finetune-vs-both

Fine-tuning means further training a pre-trained LLM on your company’s data. It’s good at capturing tone and generating content that aligns with specific guidelines or linguistic styles. The downside is that it’s expensive and takes a long time to set up.

On the other hand, RAG pulls data from externally stored company documents and feeds it to an LLM to guide response generation. It’s better for situations where you need to fetch up-to-date data quickly, like in legal, customer service, or financial services.

Types of RAG

When you’re thinking about creating a RAG system for your organization, the first thing to consider is the types of questions that come up in your workflow and data. There are two main types of RAG systems: simple and complex.

Simple RAG systems are great for answering straightforward questions that only require direct answers. Imagine a customer service bot responding to a question like “What are your business hours?” It can easily retrieve a single piece of information in just one step!

But for more complex queries, you’ll need a complex RAG system. These use a multi-hop retrieval process to extract and combine information from multiple sources. This is particularly useful for answering complicated questions that require linking different pieces of information from various documents. With a multi-hop process, RAG systems can provide you with comprehensive answers by synthesizing information from interconnected data points.

For example, if you’re using a medical research assistant tool and ask, “What are the latest treatments for diabetes and their side effects?”, the system first needs to find the latest treatments in one data source or document, then search another document in the database for details about their side effects.
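The two-hop pattern in the diabetes example can be sketched as below. The document contents are invented, and the dictionary lookups stand in for vector-database queries; in a real system, an LLM would typically form each follow-up query from the first hop’s results.

```python
# Hop 1 finds treatments for a condition; hop 2 looks up side effects
# for each treatment found. Both "databases" here are toy stand-ins.
treatment_docs = {
    "diabetes": ["metformin", "semaglutide"],
}
side_effect_docs = {
    "metformin": "nausea, reduced appetite",
    "semaglutide": "nausea, fatigue",
}

def multi_hop(condition: str) -> dict[str, str]:
    treatments = treatment_docs.get(condition.lower(), [])              # hop 1
    return {t: side_effect_docs.get(t, "unknown") for t in treatments}  # hop 2

print(multi_hop("Diabetes"))
```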

Source: https://www.researchgate.net/figure/Answering-a-multi-hop-question-over-the-relation-graph-The-relations-are-constrained_fig1_350892243

The picture above shows a multi-hop reasoning system that can answer complex questions by breaking them down into smaller, more manageable questions. For instance, if you want to know who Bill Gates’ wife is and the organizations she founded, the system can answer those questions one by one.

In another example, imagine using a legal research tool and asking, “What are the effects of new employment laws on remote work policies?” The RAG system retrieves the most up-to-date employment law updates and then hops to find the latest remote work guidelines. This detailed retrieval process allows the system to provide you with a legally relevant answer that considers both the latest employment law updates and remote working policies.

Reasoning and multi-hop retrieval have long been important in the question-answering space, and as RAG grows in popularity, we can expect even more sophisticated systems that help us answer increasingly complex questions.

So what questions should you ask yourself?

When you’re building your first RAG system, you need to be very specific about the workflow you want to automate.

So, first things first, figure out the types of questions your target user group will ask and where the information is stored.

But here’s the thing: some industries and workflows keep their answers separated in different documents. For example, in the legal world, contracts can be split into many sub-documents that reference each other. This means that many legal questions require multi-hop reasoning, which can be a pain to organize.

And let’s not forget that what seems like a simple question might require multi-hop reasoning.
For instance, if someone asks about business hours for a store on public holidays, the info needed to answer that question might not be in the same document.

One more thing to keep in mind: people don’t always ask questions the right way. They forget to add context or use poorly phrased queries that end up requiring multi-hop reasoning. To tackle these challenges, consider building multi-hop-capable RAG systems from the get-go. This will help you handle the range of questions, data sources, and use cases that come up as you automate more and more complex workflows with LLMs and RAG.

That’s a wrap! We’ll dive deeper into complex RAG systems in the next blog post. Stay tuned for a more in-depth exploration!
