RAG and Long-Context Windows: Why You Need Both

Combining RAG and long-context windows balances performance and cost

Allan Alfonso
Google Cloud - Community
4 min read · Nov 8, 2024



How can AI analyze my data if the model is not trained on my data?

If an AI model is not trained on your data, then the model does not know about your data. Training a model is time-consuming and expensive, but there are ways to ground a model in your data without training it. Two common approaches are Retrieval-Augmented Generation (RAG) and Long Context (LC) Windows.

What are these two approaches?

Retrieval Augmented Generation (RAG)

Source: https://www.cloudskillsboost.google/paths/17/course_templates/1120/documents/507354

To understand RAG, break down the acronym.

Retrieval: Search and retrieve data from a trusted knowledge base such as a database, shared drive, or internal wiki.

Augmented: Augment the AI model with this data.

Generation: The AI model generates a response using the data you provided.

RAG combines traditional information retrieval systems with AI models so your data can be analyzed.
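
As a rough illustration, here is a minimal RAG loop in Python. The sample knowledge base, the keyword-overlap retriever, and the commented-out model call are placeholders rather than any particular product's API; what matters is the retrieve → augment → generate structure.

```python
# Minimal RAG sketch: retrieve -> augment -> generate.
# The retriever and the model call below are placeholders, not a specific product API.

KNOWLEDGE_BASE = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am-5pm PST, Monday through Friday.",
    "Enterprise plans include a dedicated account manager.",
]

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Score documents by naive keyword overlap and return the best matches."""
    query_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_terms & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user's question with the retrieved context."""
    context_block = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

query = "What is the refund policy?"
prompt = build_prompt(query, retrieve(query, KNOWLEDGE_BASE))
# response = model.generate_content(prompt)  # generation step: send the augmented prompt to your model
print(prompt)
```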

Long Context Windows

Long Context Windows allow for more information to be sent to the model

Another way to augment a model with your data is to provide the data in your prompt.

The bigger a model’s context window, the more information (context) you can send in a prompt. Currently, models with 1–2M-token context windows, such as Gemini, are considered long. With a long context window, you send your data directly in the prompt so the model can analyze it without you having to build a RAG solution.
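
In code, the long-context approach skips retrieval entirely and puts the whole corpus in the prompt, as long as it fits within the model’s token limit. The sketch below assumes the google-generativeai Python SDK and a Gemini 1.5 model name; the API key, file path, and question are made up for illustration.

```python
# Long-context sketch: send the whole corpus in the prompt (no retrieval step).
# Assumes the google-generativeai SDK; the key, path, and question are illustrative.
import google.generativeai as genai
from pathlib import Path

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-1.5-pro")  # long-context Gemini model

# Concatenate every document into a single prompt instead of retrieving a subset.
documents = [p.read_text() for p in Path("my_data").glob("*.txt")]
prompt = (
    "Using the documents below, answer the question.\n\n"
    + "\n\n---\n\n".join(documents)
    + "\n\nQuestion: What were the key decisions made in Q3?"
)

response = model.generate_content(prompt)
print(response.text)
```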

If we have two approaches that do the same thing, which one do you choose?

RAG vs Long Context Large Language Models (LLMs)?

RAG vs LC LLMs Study

A recent study compared RAG with long-context (LC) LLMs.

Long Context Strengths

  • Long Context consistently outperformed RAG in almost all settings.

Long Context Weaknesses

  • There is a maximum amount of data that can be passed to the model, so scalability can become an issue.
  • Higher costs, since the model processes more data per query.

RAG Strengths

  • RAG doesn't have the same context-size ceiling as Long Context.
  • Lower costs, since RAG sends only a relevant subset of your data (see the rough cost sketch after these lists).

RAG Weaknesses

  • RAG had generally lower performance than Long Context.
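
To see why the cost gap matters, here is a rough back-of-the-envelope comparison. The token counts and per-token price below are hypothetical, not real Gemini pricing; the point is simply that input cost scales with prompt size, and RAG prompts are much smaller.

```python
# Back-of-the-envelope prompt-cost comparison (illustrative prices, not real pricing).
PRICE_PER_1M_INPUT_TOKENS = 1.25  # hypothetical $ per 1M input tokens

def prompt_cost(tokens: int) -> float:
    return tokens / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS

long_context_tokens = 1_000_000   # entire corpus stuffed into the prompt
rag_tokens = 5_000                # only the retrieved chunks plus the question

print(f"Long context: ${prompt_cost(long_context_tokens):.4f} per query")
print(f"RAG:          ${prompt_cost(rag_tokens):.4f} per query")
# RAG is ~200x cheaper here simply because the prompt is ~200x smaller.
```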

The study concluded:

While LC demonstrate superior performance in long-context understanding, RAG remains a viable option due to its lower cost and advantages when the input considerably exceeds the model’s context window size.

Interestingly, the study found that LC and RAG produced identical predictions for about 60% of queries.

This finding led the researchers to recommend RAG for the majority of queries and Long Context only for the small subset of queries where the extra performance is needed. This hybrid approach balances performance and cost.
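
One simple way to implement this hybrid idea, sketched below, is to try RAG first and fall back to a full long-context prompt only when the model signals that the retrieved chunks aren't enough. The rag_answer and long_context_answer helpers are hypothetical stand-ins for the two approaches sketched earlier; this is an illustration, not the study's exact method.

```python
# Hybrid routing sketch: RAG first, long context only when retrieval falls short.
# rag_answer() and long_context_answer() are hypothetical stand-ins for the
# two approaches sketched earlier.

def rag_answer(query: str) -> str:
    """Answer from retrieved chunks; the prompt instructs the model to reply
    'unanswerable' when the retrieved context is insufficient."""
    # Placeholder: wire in the RAG sketch from above.
    return "unanswerable"

def long_context_answer(query: str) -> str:
    """Answer by sending the full corpus in a single long-context prompt."""
    # Placeholder: wire in the long-context sketch from above.
    return "Answer derived from the full corpus."

def answer(query: str) -> str:
    response = rag_answer(query)            # cheap path: most queries stop here
    if "unanswerable" in response.lower():  # escalate only when RAG can't answer
        response = long_context_answer(query)
    return response

print(answer("What were the key decisions made in Q3?"))
```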

RAG and Long Context complement each other rather than compete.

Summary

A Medieval RAG System

RAG and Long Context Windows augment a model with your data.

Long Context offers better performance (as measured by the researchers) while RAG offers lower cost. A suggested strategy is to use RAG for most queries and to use Long Context for a subset of queries where performance is required.

The two approaches complement each other.

Gemini Long Context Competition

On the subject of long context, Google is sponsoring a competition to find novel use cases for Gemini 1.5’s long context window.

This competition is an open-ended call to action to share public Kaggle Notebooks and YouTube videos demonstrating interesting use cases for Gemini 1.5’s long context window. Four prizes of $25,000 will be awarded to the top four teams. Details are on the Kaggle website.

The contest ends on December 1, 2024.

