Getting started with RAG

Learn about the methods available to ground LLMs with facts

David Mezzetti
NeuML
9 min read · Jun 22, 2024


Large Language Models (LLMs) have captured the public’s attention with their impressive capabilities. The Generative AI era has reached a fever pitch with some predicting the coming rise of superintelligence.

LLMs are far from perfect though, and we're still a ways away from true AI. The biggest challenge is hallucinations. A hallucination is when an LLM generates output that is factually incorrect. The alarming part is that, at a cursory glance, it actually sounds like factual content. The default behavior of LLMs is to produce plausible answers even when no plausible answer exists. LLMs are not great at saying "I don't know."

Retrieval Augmented Generation (RAG) helps reduce the risk of hallucinations by limiting the context in which an LLM can generate answers. This is typically done with a search query that hydrates a prompt with relevant context. RAG has been one of the most practical use cases of the Generative AI era.

This article gives an overview of popular RAG methods currently available. Examples are backed by txtai, an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows. See the links below for more on txtai.

RAG Pipeline Overview

RAG is really quite a simple concept. It's just plugging knowledge into an LLM prompt. Frameworks shouldn't over-engineer this; it should be straightforward.

The example below loads a Wikipedia Embeddings index and uses it for RAG.

from txtai import Embeddings, LLM

# Load Wikipedia Embeddings database
embeddings = Embeddings()
embeddings.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")

# Create LLM
llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")

# Prompt template
prompt = """<|im_start|>system
You are a friendly assistant. You answer questions from users.<|im_end|>
<|im_start|>user
Answer the following question using only the context below. Only include
information specifically discussed.

question: {question}
context: {context} <|im_end|>
<|im_start|>assistant
"""

# Generate context
question = "How do you make beer?"
context = "\n".join([x["text"] for x in embeddings.search(question)])

# Run RAG
llm(prompt.format(question=question, context=context))

The code above generates the following:

To make beer, you need to steep a starch source, such as malted cereal grains (commonly barley), in water. This process creates a sweet liquid called wort. Then, yeast is added to the wort, which ferments the liquid and produces alcohol and carbon dioxide. The beer is then aged, filtered, and packaged for consumption. This process has been used since around the 6th millennium BC and has been a part of most western economies since the 19th century.

As we can see, all RAG does is take the results of a search query (in this case a vector search, but it can be any search) and plug them into an LLM prompt.
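To make that point concrete, here is a minimal sketch that reuses the prompt and LLM from above but fills the context from a hand-picked list of passages instead of a vector query. The passages are hypothetical stand-ins, not results from the Wikipedia index.

# The context can come from any retrieval method. These passages are
# hypothetical stand-ins for the results of a keyword or other search.
passages = [
    "Beer is made by steeping a starch source, such as malted barley, in water.",
    "Yeast is added to the resulting wort to ferment sugars into alcohol."
]

# Same prompt template and LLM as before, different source of context
llm(prompt.format(question="How do you make beer?", context="\n".join(passages)))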

RAG made simple

The previous example was verbose to illustrate how a RAG pipeline works. txtai has a defined RAG pipeline that makes this even easier.

from txtai import RAG

# Create RAG pipeline using existing components. LLM parameter can
# also be a model path.
rag = RAG(embeddings, llm, template=prompt)

rag("How do you make wine?", maxlength=2048)["answer"]

To make wine, follow these steps:

1. Select the fruit: Choose high-quality grapes or other fruit for wine production.

2. Fermentation: Introduce yeast to the fruit, which will consume the sugar present in the juice and convert it into ethanol and carbon dioxide.

3. Monitor temperature and oxygen levels: Control the temperature and speed of fermentation, as well as the levels of oxygen present in the must at the start of fermentation.

4. Primary fermentation: This stage lasts from 5 to 14 days, during which the yeast consumes the sugar and produces alcohol and carbon dioxide.

5. Secondary fermentation (optional): If desired, allow the wine to undergo a secondary fermentation, which can last another 5 to 10 days.

6. Fermentation location: Choose the appropriate fermentation vessel, such as stainless steel tanks, open wooden vats, wine barrels, or wine bottles for sparkling wines.

7. Bottle and age the wine: Transfer the finished wine into bottles and allow it to age, if desired, to develop flavors and complexity.

Remember that wine can be made from various fruits, but grapes are most commonly used, and the term “wine” generally refers to grape wine when used without a qualifier.

As we can see, much less code is needed than in the previous example.
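As the comment in the code notes, the LLM parameter can also be a model path. Here is a minimal sketch of that variant, reusing the embeddings index and prompt template defined earlier.

# Build the RAG pipeline directly from a model path instead of an LLM instance
rag = RAG(embeddings, "TheBloke/Mistral-7B-OpenOrca-AWQ", template=prompt)
rag("How do you make beer?", maxlength=2048)["answer"]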

Check out the following articles for additional ways to run RAG with txtai. These articles cover how to set up RAG pipelines as API services with Docker, making it even simpler to get started.

Graph RAG

A 🔥 topic right now is using knowledge graphs to power RAG pipelines. The idea behind this is that a knowledge graph can provide a deeper context to the LLM prompt.

Instead of running a vector query, we'll define a graph path traversal. The example below uses openCypher to define the query. This query pulls multiple concepts into the context.

Note: There are models for creating openCypher queries from natural language.

g = embeddings.graph.search("""
MATCH P=({id: "Roman Empire"})-[*1..3]->({id: "Saxons"})-[*1..3]->
({id: "Vikings"})-[*1..3]->({id: "Battle of Hastings"})
RETURN P
LIMIT 20
""", graph=True)

We’ll pass this rich context to an LLM and use it to generate a short history book.

from txtai import LLM

llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")

def rag(question, text):
    prompt = f"""<|im_start|>system
You are a friendly assistant. You answer questions from users.<|im_end|>
<|im_start|>user
Answer the following question using only the context below. Only include
information specifically discussed.

question: {question}
context: {text} <|im_end|>
<|im_start|>assistant
"""

    return llm(prompt, maxlength=4096)

context = "\n".join(g.attribute(node, "text") for node in list(g.scan()))

rag("""Write a book covering the end of Roman Britain, Saxons, Vikings and
Norman conquest of England""", context))

This is very similar to the vector-based RAG shown above, except this time the entire graph of search results is used as the context.

Title: “The Fall and Rise of Britain: From Roman Rule to the Norman Conquest”

Chapter 1: The Fall of the Western Roman Empire The Western Roman Empire fell due to a combination of factors, including the effectiveness and numbers of the army, the health and numbers of the Roman population, the strength of the economy, the competence of the emperors, internal struggles for power, religious changes, and the efficiency of the civil administration. Barbarian invasions and climatic changes also played a significant role in the collapse.

Chapter 2: Roman Britain Roman Britain was the territory that became the Roman province of Britannia after the Roman conquest of Britain, consisting of a large part of the island of Great Britain. The occupation lasted from AD 43 to AD 410.

Chapter 3: The End of Roman Rule in Britain Roman rule in Britain ended in different parts of the country at different times and under different circumstances. The recall of Roman troops to Gaul by Constantine III in 407 left Britain vulnerable to barbarian attacks. Around 410, the Romano-British expelled the Roman magistrates from Britain, leading to the fall of Roman rule.

Chapter 4: Sub-Roman Britain Sub-Roman Britain is the period of late antiquity in Great Britain between the end of Roman rule and the Anglo-Saxon settlement. This period saw the decay of locally made wares from a previous higher standard under the Roman Empire.

Chapter 5: The Saxon Settlement The Saxons were a group of Germanic peoples who played a major role in the fall of the Western Roman Empire and the establishment of the post-Roman kingdoms. They settled in Britain, contributing to the decline of Romano-British culture and the rise of a new Germanic culture.

Chapter 6: The Viking Age The Viking Age was a period during the Middle Ages when Norsemen, known as Vikings, undertook large-scale raiding, colonizing, conquest, and trading throughout Europe. They also voyaged as far as the Mediterranean, North Africa, the Middle East, Greenland, and North America.

Chapter 7: The Norman Conquest of England The Battle of Hastings, fought on 14 October 1066, marked the beginning of the Norman Conquest of England. The Normans, led by William, Duke of Normandy, defeated the English army under King Harold Godwinson, leading to a decisive Norman victory and the end of Anglo-Saxon rule in England.

Chapter 8: The Impact of the Norman Conquest on England The Norman Conquest of England had a profound impact on the country’s culture, language, and governance. The Normans introduced feudalism, the use of the Norman French language, and a new system of land ownership and administration.

Chapter 9: The Legacy of the Saxons, Vikings, and Normans on Modern Britain The Saxons, Vikings, and Normans left a lasting legacy on modern Britain, shaping its language, culture, and political landscape. Their influence can still be seen in modern British society, language, and institutions.

Quite interesting! All from a knowledge graph. More details on this method can be found in the articles below.

Structured RAG

What if we want our RAG process to generate answers as structured output? Well that is possible with great libraries such as Outlines.

Let’s give it a try!

from typing import List

from outlines.integrations.transformers import JSONPrefixAllowedTokens
from pydantic import BaseModel
from txtai import LLM

class Response(BaseModel):
    answers: List[str]
    citations: List[str]

# Define the LLM
llm = LLM("TheBloke/Mistral-7B-OpenOrca-AWQ")

# Define method that guides LLM generation
prefix_allowed_tokens_fn = JSONPrefixAllowedTokens(
    schema=Response,
    tokenizer_or_pipe=llm.generator.llm.pipeline.tokenizer,
    whitespace_pattern=r" ?"
)

def rag(question, text):
    prompt = f"""<|im_start|>system
You are a friendly assistant. You answer questions from users.<|im_end|>
<|im_start|>user
Answer the following question using only the context below. Only
include information specifically discussed.

question: {question}
context: {text} <|im_end|>
<|im_start|>assistant
"""

    return llm(prompt, maxlength=4096,
               prefix_allowed_tokens_fn=prefix_allowed_tokens_fn)

# Manually generated context
context = """
England's terrain chiefly consists of low hills and plains, especially
in the centre and south. The Battle of Hastings was fought on
14 October 1066 between the Norman army of William, the Duke of Normandy,
and an English army under the Anglo-Saxon King Harold Godwinson.
Bounded by the Atlantic Ocean on the east, Brazil has a coastline of
7,491 kilometers (4,655 mi). Spain pioneered the exploration of the
New World and the first circumnavigation of the globe. Christopher Columbus
lands in the Caribbean in 1492.
"""

print(rag("List the countries discussed", context))
{'answers': ['England', 'Brazil', 'Spain'],
'citations': ["England's terrain chiefly consists of low hills and plains, especially in the centre and south.",
'The Battle of Hastings was fought on 14 October 1066 between the Norman army of William, the Duke of Normandy, and an English army under the Anglo-Saxon King Harold Godwinson.',
'Bounded by the Atlantic Ocean on the east, Brazil has a coastline of 7,491 kilometers (4,655 mi).',
'Spain pioneered the exploration of the New World and the first circumnavigation of the globe.',
'Christopher Columbus lands in the Caribbean in 1492.']}

As we can see, this RAG process returns data as structured JSON! This opens up a number of different ways to extract information with LLMs.
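The same setup can guide generation toward any schema. Here is a minimal sketch with a hypothetical Facts model and question, reusing the context and components defined above.

# Hypothetical schema for a different extraction task
class Facts(BaseModel):
    dates: List[str]
    people: List[str]

# Guide generation with the new schema
facts_fn = JSONPrefixAllowedTokens(
    schema=Facts,
    tokenizer_or_pipe=llm.generator.llm.pipeline.tokenizer,
    whitespace_pattern=r" ?"
)

# Same prompt format as before, swapping in the new guidance function
prompt = f"""<|im_start|>system
You are a friendly assistant. You answer questions from users.<|im_end|>
<|im_start|>user
Answer the following question using only the context below. Only
include information specifically discussed.

question: List the dates and people discussed
context: {context} <|im_end|>
<|im_start|>assistant
"""

print(llm(prompt, maxlength=4096, prefix_allowed_tokens_fn=facts_fn))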

See the article below for a deeper dive into this concept.

Wrapping up

This article covered various ways to run retrieval augmented generation (RAG). The main advantage of RAG is that it factually grounds LLM prompts and helps prevent hallucinations. This space is rapidly evolving; stay tuned for new methods coming soon!

See the links below for RAG applications with Wikipedia and ArXiv packaged as Docker images.

Wikipedia | ArXiv

David Mezzetti

Founder/CEO at NeuML. Building easy-to-use semantic search and workflow applications with txtai.