RAG and Long-Context Windows: Why You Need Both
Combining RAG and Long-Context Windows delivers strong performance at a lower cost
How can AI analyze my data if the model is not trained on my data?
If an AI model is not trained with your data, then the model does not know about your data. Training a model is time-consuming and expensive, but there are ways to ground a model on your data without training it. Two common approaches are Retrieval Augmented Generation (RAG) and Long Context (LC) Windows.
What are these two approaches?
Retrieval Augmented Generation (RAG)
To understand RAG, break down the acronym.
Retrieval: Search and retrieve data from a trusted knowledge base such as a database, a shared drive, or an internal wiki.
Augmented: Augment the AI model with this data.
Generation: The AI model generates a response using the data you provided.
RAG combines traditional information retrieval systems with AI models so your data can be analyzed.
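To make the three steps concrete, here is a minimal sketch in Python. The toy knowledge base, the keyword-overlap retriever, and the generate() stub are hypothetical stand-ins for a real search index and a real model call:

```python
# Minimal RAG sketch with a toy in-memory "knowledge base".
# In practice, retrieve() would query a search index or vector store,
# and generate() would call a real AI model; both are stand-ins here.

KNOWLEDGE_BASE = [
    "Our refund policy allows returns within 30 days.",
    "Support hours are 9am to 5pm ET, Monday through Friday.",
    "Enterprise plans include a dedicated account manager.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Retrieval: rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def generate(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"[model response grounded in a {len(prompt)}-character prompt]"

def answer_with_rag(query: str) -> str:
    context = "\n".join(retrieve(query))    # Retrieval
    prompt = (f"Use only this context:\n{context}\n\n"  # Augmented
              f"Question: {query}")
    return generate(prompt)                 # Generation

print(answer_with_rag("What is the refund policy?"))
```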
Long Context Windows
Another way to augment a model with your data is to provide the data in your prompt.
The bigger a model’s context window, the more information (context) you can send in a prompt. Models with 1–2M-token context windows, such as Gemini, are currently considered long. With Long Context Windows, you send your data directly to the model so it can analyze the data without you having to build a RAG solution.
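As an illustrative sketch, this is what that looks like with the google-generativeai Python SDK; the file name, document, and task are placeholders:

```python
# Long-context sketch using the google-generativeai SDK
# (pip install google-generativeai). The file name and task are
# placeholders; the point is that the raw data goes straight into
# the prompt instead of through a retrieval pipeline.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")  # up to a 2M-token window

with open("annual_report.txt") as f:  # hypothetical document
    document = f.read()

prompt = f"Here is a document:\n{document}\n\nSummarize the key risks it describes."
response = model.generate_content(prompt)
print(response.text)
```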
If the two approaches do the same thing, which one should you choose?
RAG vs Long Context Large Language Models (LLMs)
A recent study (linked in the Resources below) compared RAG and Long Context LLMs.
Long Context Strengths
- Long Context consistently outperformed RAG in almost all settings.
Long Context Weaknesses
- There is a maximum amount of data that can be passed to the model, so scalability can be an issue.
- Higher costs, since more data is processed with every query.
RAG Strengths
- RAG doesn't have the same ceiling as Long Context, since only the retrieved subset of your data must fit in the prompt.
- Lower costs, since RAG sends only a subset of your data (see the back-of-the-envelope sketch after this list).
RAG Weaknesses
- RAG had generally lower performance than Long Context.
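To make the cost difference concrete, here is a back-of-the-envelope comparison. The per-token price and token counts are hypothetical placeholders; substitute your provider's actual rates:

```python
# Back-of-the-envelope cost comparison. The price and token counts are
# hypothetical placeholders; plug in your provider's real rates.
PRICE_PER_1K_INPUT_TOKENS = 0.005  # hypothetical $/1K tokens

long_context_tokens = 1_000_000   # stuff the whole corpus into the prompt
rag_tokens = 4_000                # send only the top retrieved passages

lc_cost = long_context_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS
rag_cost = rag_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS

print(f"Long context: ${lc_cost:.2f} per query")   # $5.00
print(f"RAG:          ${rag_cost:.4f} per query")  # $0.0200
print(f"Ratio: {lc_cost / rag_cost:.0f}x")         # 250x
```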
The study concluded:
“While LC demonstrate superior performance in long-context understanding, RAG remains a viable option due to its lower cost and advantages when the input considerably exceeds the model’s context window size.”
Interestingly, the study found that LC and RAG produced identical predictions for roughly 60% of queries.
This finding led the researchers to recommend RAG for the majority of queries and Long Context for the small subset of queries where its higher performance is required. This hybrid approach offers a balance between performance and cost; a sketch of one possible router follows.
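The sketch below answers with RAG first and falls back to Long Context only when the model signals that the retrieved context is insufficient. Both answer_* helpers are hypothetical stubs standing in for real model calls:

```python
# Hybrid routing sketch: answer with cheap RAG first and fall back to
# expensive Long Context only when RAG cannot answer. The two answer_*
# helpers are hypothetical stubs standing in for real model calls.

UNANSWERABLE = "unanswerable"  # sentinel the RAG prompt asks the model to emit

def answer_with_rag(query: str) -> str:
    # Real version: retrieve passages, then prompt the model with
    # "reply 'unanswerable' if the context does not contain the answer".
    return UNANSWERABLE if "obscure" in query else f"RAG answer to: {query}"

def answer_with_long_context(query: str) -> str:
    # Real version: place the entire corpus in a long-context prompt.
    return f"Long-context answer to: {query}"

def hybrid_answer(query: str) -> str:
    answer = answer_with_rag(query)                 # cheap path first
    if answer.strip().lower() == UNANSWERABLE:
        answer = answer_with_long_context(query)    # costly fallback
    return answer

print(hybrid_answer("What is the refund policy?"))
print(hybrid_answer("An obscure detail buried mid-document?"))
```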
RAG and Long Context complement instead of compete with each other.
Summary
RAG and Long Context Windows augment a model with your data.
Long Context offers better performance (as measured by the researchers) while RAG offers lower cost. A suggested strategy is to use RAG for most queries and to use Long Context for a subset of queries where performance is required.
The two approaches complement each other.
Gemini Long Context Competition
On the subject of long context, Google is sponsoring a competition to find novel use cases for Gemini 1.5’s long context window.
This competition is an open-ended call to action to share public Kaggle Notebooks and YouTube videos demonstrating interesting use cases for Gemini 1.5’s long context window. There are four $25,000 prizes for the top four teams. Details are on the Kaggle website.
The contest ends on December 1, 2024.
Resources
- Cloud Skills Boost: Retrieval Augmented Generation
- What is a long context window?
- Gemini 1.5 Flash Long Context Window
- Gemini 1.5 Pro 2M context window, code execution capabilities, and Gemma 2 are available today
- Gemini Pro
- Our next-generation model: Gemini 1.5
- Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach