RAG or Retrieval Augmented Generation Simplified

Shailey Dash
9 min read · Mar 13, 2024


RAG is an acronym you may have been hearing quite a bit lately, often alongside terms such as Generative AI and ChatGPT. So, what is RAG, and when do we use it?

RAG (author created with DALL·E)

RAG needs to be understood in the context of the larger space of LLMs where it is applied. To understand RAG, we first need to understand, at a high level, what an LLM such as ChatGPT is and how it functions. Then we need to identify the gap that RAG comes in to bridge.

Why is corporate uptake of LLMs slower than expected?

There is an interesting dichotomy in the uptake of LLMs such as ChatGPT, Gemini, or Claude. People, as in the general internet audience, have really embraced these apps, with many making them a core part of their lives. On the other hand, corporate usage of these models has been much slower. Only a few companies have explicitly launched an LLM as a public-facing application, and even those that have are mainly tech-oriented or internet-based companies. For example, Microsoft has integrated GPT-4 into its search engine Bing. Language education specialist Duolingo is also using ChatGPT-powered apps. Online education platform Udacity is using ChatGPT to power an online tutor that provides personalized guidance and feedback. Grammarly and Quora are other examples of early adopters. According to a BCG report, most companies are carrying out cautious experiments or pilots.

However, given the impressive linguistic understanding and text generating capabilities of ChatGPT or other LLMs, we would have expected corporate usage to have been far more aggressive. Why the slow pace?

There are several reasons for this. Let’s dive into them. First, what are the typical use cases for these models?

  1. Chatbots and Virtual Assistants: This is probably the most compelling use case from a corporate perspective. A 24x7 chatbot that can understand customer or employee queries and answer them appropriately would hugely improve customer experience, and it would probably also be a lot cheaper than a call center!
  2. Personalized marketing: This has been the holy grail of marketing for many years and has been difficult to achieve. For a marketer, it is an enticing prospect to be able to customize emails, website content, and offers according to customers’ needs or search histories. With LLMs able to generate automated content, this is looking far more achievable.
  3. Content creation, particularly marketing content such as emails, social media posts, etc.

These are just three; there are actually many more, particularly in internal process optimization. However, from a customer-facing perspective, these are probably the most important and impactful.

Since these are customer facing applications, there is a need for absolute accuracy and consistency in the outputs.

Key issues behind lower corporate usage of LLMs

What are the key issues that a corporate might face when deploying an LLM such as ChatGPT, particularly in a customer facing role?

Broadly, we can group the possible issues into two buckets:

  1. Data quality issues:

ChatGPT and Gemini are giant models trained on hundreds of gigabytes of data. This is all public data coming from Common Crawl, Reddit forums, Wikipedia, etc. According to some estimates, ChatGPT was trained on approximately 570 GB of data, and its underlying model (GPT-3) has approximately 175 billion parameters. This is a massive model, and Gemini is even bigger.

As you can imagine, it’s impossible to fully check and curate a dataset of this size for accuracy, though OpenAI, Google, and every other company in the LLM business do attempt to select and clean the data.

The problem with data quality issues is that they can lead to hallucinations, or factually incorrect responses. This is a huge issue in any corporate application. No manager wants to be associated with an application that has a non-trivial probability of giving an incorrect response!

2. Data Privacy:

A challenge for businesses looking to use ChatGPT in their processes is how to make the model respond on the basis of data that is specific to the product or company, and that is private. They also want to ensure that their data, a very valuable asset, remains private and is not inadvertently shared by the model via its parameters.

How would a deployment of a chatbot using an LLM work?

Let’s take the case of a chatbot and walk through how it would work and what the likely problems are.

HR chatbot deployment (source: RAG Explained, AIeconomics)

This slide from my YouTube video, “Retrieval Augmented Generation or RAG Simplified for Management,” explains how a simple HR chatbot powered by an LLM such as ChatGPT can be deployed. An HR chatbot is a simple application that most companies would benefit from. It would give employees clarity on rules and regulations when they need the information, rather than making them wait for busy HR executives to get back to them.

Now, suppose we have a query such as “How many days of privilege leave do I have?” As the image shows, this query is typed into the HR chatbot, which then connects with the LLM, probably via an API. The LLM understands the query and generates an answer. That answer will be based on the kind of information the model was trained on. For example, it may have seen a figure such as 40 days in a Reddit forum and another number, such as 37 days, on Wikipedia. Based on this kind of information, it will generate a response, say 38 days, which happens to be incorrect. This will immediately lead users to lose trust in the LLM’s responses. Clearly, deploying an LLM in this way is too risky for many managers to take up.

What are the workarounds for this?

There are a few ways to get around this problem. The first is fine-tuning.

What is fine-tuning?

One way to improve the model’s response quality is to train it on our HR policy documents. This is known as fine-tuning, and it is very expensive and time consuming. It also means your company’s private information gets encoded into ChatGPT’s parameters, and there is no way to un-encode it right now!

Essentially, what we are doing is taking the last few layers of the LLM and retraining them on your specific corpus; in this case, the company’s HR policy documents.

So, how does finetuning work? Basically the earlier layers in an LLM capture the general understanding of language acquired through the massive pre-training process. The final layers are responsible for the model’s specific outputs and decisions. Retraining these layers allows the LLM to adapt its understanding and responses to the new task or domain.
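The idea above, keeping the early layers fixed and retraining only the final, task-specific layers, can be sketched in a few lines. This is a toy PyTorch model standing in for an LLM (the layer sizes and structure are illustrative assumptions, not a real architecture):

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM: a small stack of linear layers.
model = nn.Sequential(
    nn.Linear(16, 16),  # "early" layers: general language understanding
    nn.ReLU(),
    nn.Linear(16, 16),
    nn.ReLU(),
    nn.Linear(16, 4),   # "final" layer: task-specific, the part we retrain
)

# Freeze every parameter, then unfreeze only the final layer.
for param in model.parameters():
    param.requires_grad = False
for param in model[-1].parameters():
    param.requires_grad = True

# Only the unfrozen parameters are handed to the optimizer,
# so gradient updates touch just the final layer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

The same pattern (set `requires_grad = False` on frozen parameters) is what full-scale fine-tuning setups use, just on far larger models and corpora.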

The main benefit of fine-tuning is that it leverages the pre-trained language knowledge of the LLM while adding more domain-specific knowledge, so the LLM can generate responses that are more relevant to your company. Further, since fine-tuning only retrains the last few layers, the LLM can effectively learn the nuances of the new task, as these layers tend to be the most task-specific.

Fine-tuning is a slow and expensive process that requires significant computational power. It also requires a fairly expert team to carry out, which adds expensive manpower to the cost equation.

Also, what happens when information changes? You go back and repeat the whole expensive, slow process again?

To get around all this, we have the next solution: RAG.

How does Retrieval Augmented Generation or RAG work?

Let’s look at how RAG works visually. The slide below illustrates the process very simply. We start with our document corpus. This is preprocessed using the same kind of text preprocessing used to build a language model, including steps such as tokenization. The documents are then converted into numerical vectors (embeddings) that capture the meaning of, and relationships within, the information. These vectors are stored in a vector database.
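The indexing step can be sketched in miniature. Here the "embedding" is a crude bag-of-words count over a tiny fixed vocabulary, a stand-in for a real learned embedding model, and the documents and vocabulary are hypothetical HR snippets invented for illustration:

```python
from collections import Counter

# Toy vocabulary; a real embedding model needs no fixed word list.
VOCAB = ["privilege", "leave", "days", "sick", "notice", "resignation"]

def embed(text: str) -> list[float]:
    """Toy embedding: count vocabulary words in the text."""
    counts = Counter(text.lower().replace("?", "").split())
    return [float(counts[word]) for word in VOCAB]

# Hypothetical HR policy chunks standing in for a real document corpus.
documents = [
    "Employees are entitled to 30 days of privilege leave per year.",
    "Sick leave is capped at 12 days per year.",
    "Resignation requires 60 days of notice.",
]

# "Vector database": each chunk stored alongside its embedding.
vector_store = [(doc, embed(doc)) for doc in documents]
```

In practice the chunks come from splitting real documents, the embeddings come from a sentence-encoder model, and the store is a purpose-built vector database rather than a Python list.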

When a user submits a query, for example, “How many days of PL do I have?”, the RAG system leverages the vector database. The user’s query is also converted to a numerical vector, and the system performs a similarity search, comparing the query vector with the stored document vectors. It then identifies and retrieves the most similar documents. Essentially, all documents or information relating to privilege leave would be retrieved.

Then both the query and the retrieved documents are sent to the LLM for a response. The retrieved documents act as additional context to the query. The LLM understands the query using its linguistic capabilities. It also uses the context documents for further enhancing the relevance of the answer generated.
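The query side, similarity search plus prompt assembly, can be sketched as below. It is self-contained: the toy bag-of-words embedding, the hypothetical HR snippets, and the prompt wording are all illustrative assumptions, with cosine similarity as the (common) choice of similarity measure:

```python
import math
from collections import Counter

VOCAB = ["privilege", "leave", "days", "sick", "notice"]

def embed(text: str) -> list[float]:
    """Toy embedding: count vocabulary words in the text."""
    counts = Counter(text.lower().replace("?", "").split())
    return [float(counts[w]) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

vector_store = [
    (doc, embed(doc))
    for doc in [
        "Employees are entitled to 30 days of privilege leave per year.",
        "Sick leave is capped at 12 days per year.",
    ]
]

query = "How many days of privilege leave do I have?"
query_vec = embed(query)

# Similarity search: rank stored chunks against the query vector.
ranked = sorted(vector_store, key=lambda item: cosine(query_vec, item[1]),
                reverse=True)
context = ranked[0][0]

# The retrieved chunk is prepended to the query as grounding context,
# and this combined prompt is what gets sent to the LLM.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

With a real embedding model and vector database the mechanics are the same: embed the query, retrieve the nearest chunks, and stuff them into the prompt so the LLM answers from company data rather than its training corpus.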

RAG Strengths and Limitations

RAG (Retrieval-Augmented Generation) shows promising effectiveness in providing relevant responses. However, it is an evolving technology and has both strengths and limitations.

Strengths:

  • Factual grounding: RAG integrates information retrieval from a dedicated knowledge base, reducing the risk of the LLM offering purely fabricated information or responses irrelevant to the user’s query.
  • Contextual relevance: By leveraging retrieved information, RAG allows the LLM to consider the broader context of the user’s request. This leads to more comprehensive and informative responses that address the user’s intent more effectively.
  • Domain expertise: When the knowledge base is tailored to a specific domain (e.g., legal documents for a law firm), RAG equips the LLM with relevant information and terminology, improving the quality and accuracy of its responses within that domain.

Evidence of Effectiveness:

However, it is important to note that this technology is state of the art and is developing as I write. It looks promising, but research into its effectiveness is ongoing. Some preliminary research does suggest that RAG can significantly outperform plain LLM outputs in terms of relevance and factual accuracy.

Limitations of RAG

  • Knowledge-base dependence: RAG relies heavily on the quality, accuracy, and comprehensiveness of the information stored in the knowledge base. Incomplete or inaccurate information can lead to misleading or irrelevant retrieved data.
  • Limited reasoning: While RAG improves factual grounding, LLMs still lack true reasoning capabilities. They retrieve information and respond well to simpler queries, but they are less effective with complex queries requiring deeper understanding or analysis.

The success of RAG hinges on effective integration with existing workflows and on ensuring that the LLM and knowledge base are relevant to the types of queries they will receive.

How is RAG implemented?

There is no single RAG library for organizational use; there are multiple players out there:

Open source

  • LLMware: basic piloting, user-friendliness, and an active open-source community. This is the go-to library if an organization wants to experiment with RAG, and it is good for small pilots. The library is easy to use and does not require specialized skills.

  • Verba and Neum: aimed at advanced users, with efficient retrieval and scalability for handling large datasets or performance-sensitive workloads. These libraries definitely require heavier lifting. They are relevant for larger organizations with more data that are looking for more optimized solutions, and they require people with stronger NLP and coding skills.

Commercial Solutions:

  • Pinecone: its cloud-based approach simplifies deployment and management, making it attractive for organizations seeking a managed solution without infrastructure concerns.

  • Cohere: this platform offers pre-trained models and off-the-shelf modules designed for tasks like question answering and text generation.

Both commercial solutions are much more “plug and play,” so they reduce the need for trial and error in deployment. On the con side, these solutions can be expensive if used at large scale.

If you liked this article, check out my YouTube video, which provides a simple and intuitive explanation of RAG for management folk. Also, please subscribe to my YouTube channel AIeconomics, which aims to decode the latest developments in AI in terms of their impact on business and applications.

Follow me on Medium for more such articles and tutorials on AI and its applications.


Shailey Dash

AI Researcher, Writer and Teacher passionate about making AI accessible to everyone