For More Context, Embrace RAG

Alexander Stahl
7 min read · Feb 11, 2024


https://www.linkedin.com/pulse/more-context-embrace-rag-alexander-stahl-qezof/

Have you ever received an answer from ChatGPT that sounded plausible but was completely wrong? There are usually two reasons for this:

1. The LLM's training data was not up to date.

2. It had no context for the topic and was therefore “hallucinating.”

How can you solve this problem without building your own LLM (be prepared to dig deep into your pockets), fine-tuning an LLM, or relying on prompt engineering alone, which won’t be enough?

Let’s first evaluate the common challenges we encounter with LLMs.

The Problem with LLMs

As groundbreaking as LLMs are, they do have some limitations:

  1. They’re limited to the data they were trained on. For example, GPT-4 has a cutoff date for its training data, which means it doesn’t have access to information beyond that date. This limitation affects the model’s ability to generate timely and accurate responses.
  2. They’re generic and lack domain expertise. LLMs are trained on a large dataset that covers a wide range of topics, but they don’t have expertise in any particular area. This leads to hallucinations or inaccurate information when asked about specific subject areas.
  3. Citations are difficult. LLMs don’t have a reliable way of returning the exact location in the text where they found the information. This exacerbates the problem of hallucination, as they may not be able to provide proper attribution or verify the accuracy of their answers. In addition, the lack of specific citations makes it difficult for users to fact-check or delve deeper into the information provided by the models.

Creating your own foundation model may not be a good idea.

According to OpenAI’s Sam Altman, it cost approximately $100 million to train the foundation model behind ChatGPT.

While not every company or model will require such a significant investment, ChatGPT’s price tag highlights the challenge of producing sophisticated models with today’s techniques.

In addition to the cost of computing, you will also face the challenge of finding specialized teams of machine learning PhDs, top-notch systems engineers, and highly skilled operations personnel to tackle the many technical challenges of producing such a model. Every other AI company in the world is also competing for the same rare talent.

Another challenge is obtaining, sanitizing, and labeling the datasets required to produce a capable foundation model. For instance, if you are a legal discovery company planning to train your model to answer legal document-related questions, you will require legal experts to spend several hours labeling training data.

Even with sufficient capital, the right team, adequate datasets, and overcoming technical hurdles to host your model in production, success is not guaranteed. The AI industry has witnessed numerous ambitious startups that have failed, and we anticipate more to come.

Fine-tuning is an outdated method of improving LLM outputs

Fine-tuning is a great way to retrain a foundation model on new data without having to build a model from scratch. However, it still requires rare expertise and sufficient data, and hosting the model in production remains technically complex and costly.

It is important to note that fine-tuning may not be practical now that LLMs can be paired with vector databases for context retrieval. Some LLM providers, like OpenAI, have stopped supporting fine-tuning for their latest-generation models.

Fine-tuning also requires recurring, costly, and time-intensive labeling work by subject-matter experts, as well as constant monitoring for quality drift caused by infrequent updates or shifts in the data distribution.

If your data changes, even a well-tuned model’s accuracy can degrade, which means more expensive and time-consuming labeling, continuous quality monitoring, and repeated rounds of fine-tuning.

Prompt engineering alone is not enough to reduce hallucinations.

Testing and adjusting the instructions you give the model (prompt engineering) is a cost-effective way to improve the accuracy of your GenAI application. On its own, however, it is rarely enough to eliminate hallucinations, so it should be combined with other methods.

Even once better prompts make your LLM’s responses more accurate, the model still cannot take new or changing context into account. As a result, your GenAI application may continue to generate irrelevant responses.

For More Context, Embrace RAG

Researchers at Meta (formerly Facebook) published a paper on a technique called Retrieval Augmented Generation (RAG), which adds an information retrieval component to the text generation that LLMs are already good at. This allows the model’s answers to draw on external knowledge that can be adapted and updated without retraining the model itself.

Retrieval-augmented generation (RAG) is a framework designed to enhance the accuracy and relevance of large language models (LLMs) by combining retrieval of relevant content with generative text response.

RAG tackles two common issues with LLMs: outdated information and lack of proper sourcing. It ensures responses are grounded in up-to-date information from reliable sources, reducing the likelihood of misinformation.

In the RAG framework, the system first retrieves relevant content from a designated content store based on the user’s query. It then combines this retrieved information with the user’s question and passes both to the LLM, which generates a response grounded in that evidence.
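To make the “combine” step concrete, here is a minimal Python sketch of how retrieved passages and the user’s question might be packed into a single prompt. The example passages, the prompt wording, and the commented-out LLM call are assumptions for illustration, not any specific provider’s API.

```python
# Minimal sketch: combining retrieved passages with the user's question.
# The passages and prompt wording are illustrative; the actual LLM call
# (left as a comment) depends on whichever model provider you use.

def build_prompt(question: str, passages: list[str]) -> str:
    """Pack numbered context passages and the question into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the numbered context below "
        "and cite the passage numbers you used.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

passages = [  # in a real system these would come from the retriever
    "Orders can be returned within 30 days of delivery.",
    "Refunds are issued to the original payment method within 5 business days.",
]
prompt = build_prompt("How long do customers have to return an order?", passages)
print(prompt)
# answer = your_llm_client.generate(prompt)  # hypothetical call to your LLM of choice
```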

RAG facilitates easy updates of information without the need for model retraining, ensuring responses remain current. It also emphasizes the importance of sourcing information from reliable primary sources, enhancing credibility and reducing the risk of misinformation.

https://aws.amazon.com/de/what-is/retrieval-augmented-generation/

This is how RAG works

  1. RAG pairs a pre-trained system that finds relevant information (the retriever) with another system that generates text (the generator).
  2. When the user enters a question (query), the retriever uses a technique called Maximum Inner Product Search (MIPS) to find the most relevant documents (a toy version of this search is sketched after this list).
  3. The information from these documents is then fed into the generator to create the final answer. This also enables citations, so the end user can verify the sources and delve deeper into the information provided.
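Step 2 can be illustrated with a toy, brute-force version of MIPS: score every document embedding against the query embedding with an inner product and keep the top-k. The vectors below are made up for the example; a production system would use a vector database with an approximate nearest-neighbor index instead of this exhaustive comparison.

```python
# Toy illustration of Maximum Inner Product Search (MIPS): score each
# document embedding against the query embedding and keep the top-k.
import numpy as np

doc_embeddings = np.array([   # one row per document (made-up 4-dim vectors)
    [0.1, 0.9, 0.0, 0.2],     # doc 0: "refund policy"
    [0.8, 0.1, 0.3, 0.0],     # doc 1: "shipping times"
    [0.2, 0.7, 0.1, 0.1],     # doc 2: "return window"
])
query_embedding = np.array([0.15, 0.85, 0.05, 0.15])  # embedding of the user query

scores = doc_embeddings @ query_embedding   # inner product per document
top_k = np.argsort(scores)[::-1][:2]        # indices of the 2 best matches
print(top_k, scores[top_k])                 # e.g. docs 0 and 2 with their scores
```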

Think of RAG as a powerful search engine with a built-in content writer.

The retriever in RAG is like a database index. When you enter a query, it doesn’t scan the entire database (or in this case, the corpus of documents). Instead, it uses a vector index, conceptually similar to a B-tree or hash index, to quickly find the most relevant documents, just as a database index retrieves records efficiently without scanning every row.

Once the retriever has found the relevant documents, it’s like retrieving raw data from a database. But raw data isn’t always useful or easy to understand. That’s where the generator comes in. It’s like a built-in application layer that takes the raw data and transforms it into a user-friendly format. In this case, it generates a coherent and contextually relevant answer to the query.

Finally, the citations are like metadata about the source of the data, allowing for traceability and further exploration if needed. In this way, RAG is like a search engine with a built-in content writer that provides efficient, relevant, and user-friendly answers.
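Putting the analogy together, here is a rough sketch of a retriever (the “index”), a generator (the “application layer”), and returned sources (the “metadata”). Both the word-overlap retriever and the stub generator are stand-ins for illustration; a real pipeline would use embeddings and an actual LLM.

```python
# Rough sketch of the analogy: retriever as "index", generator as
# "application layer", returned sources as "metadata". The word-overlap
# retriever and the stub generator are illustrative stand-ins only.
import re
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str   # where the text came from (the "metadata" used for citations)
    text: str

def tokens(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))

def retrieve(question: str, corpus: list[Passage], k: int = 2) -> list[Passage]:
    # Toy retriever: rank passages by word overlap instead of a real vector index.
    q = tokens(question)
    return sorted(corpus, key=lambda p: len(q & tokens(p.text)), reverse=True)[:k]

def generate(question: str, passages: list[Passage]) -> str:
    # Stub generator: a real system would prompt an LLM with these passages.
    return f"Answer to '{question}' grounded in: " + ", ".join(p.doc_id for p in passages)

corpus = [
    Passage("faq.md#returns", "Items can be returned within 30 days of delivery."),
    Passage("faq.md#shipping", "Standard shipping takes 3 to 5 business days."),
]
hits = retrieve("Within how many days can items be returned?", corpus)
print(generate("Within how many days can items be returned?", hits))
print("Sources:", [p.doc_id for p in hits])   # citations the user can verify
```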

Benefits of using RAG in business applications

RAG is industry and domain-agnostic, making it a versatile solution for businesses across various sectors.

It can help reduce the need for expensive fine-tuning (as mentioned above) of LLMs, as they can access the necessary information dynamically.

RAG can facilitate the efficient sharing of internal business knowledge and insights among teams, enhancing collaboration. It can also connect to many external data sources, enabling more in-depth and efficient research.

I personally use RAG for my applications because it can provide accurate, context-specific answers swiftly, making it a cost-effective and hassle-free approach compared to building models from scratch or fine-tuning.

Applications of RAG in Business

1. Customer Service: RAG can be used to provide customer service representatives with quick, relevant, and comprehensive answers to complex questions about products, thereby enhancing the customer experience.

2. Legal Industry: In the legal industry, RAG can be utilized for tasks such as automating legal document analysis and research.

3. Healthcare: RAG can assist in retrieving and generating relevant information for healthcare professionals, potentially improving decision-making and patient care.

4. Manufacturing: In manufacturing, RAG can be employed for tasks such as automating quality control processes and providing relevant information for maintenance and repair activities.

5. Finance: RAG can be used in finance for tasks such as automating data retrieval for risk assessment and providing relevant information for customer inquiries.

Do you already know how you would apply RAG in the context of your business processes?

The best way to get started in your own company is by analyzing your potential for AI-driven solutions based on your data.

https://www.simpleai.at/

I’m offering business owners a free 30-minute consultation to understand how AI-based solutions can lead to more effectiveness, cost savings, and profit maximization in your industry.

Take advantage of this opportunity and book a consultation with Simple AI: https://calendly.com/alexanderstahl/30min?back=1&month=2023-10

Thank you for taking the time to read this article and I look forward to your feedback.

x Alexander Stahl
