Advanced RAG: Recursive Retrieval with llamaindex

Sascha Gstir
Published in Pondhouse Data
9 min read · May 13, 2024

With recursive retrieval, RAG can generate more coherent and contextually relevant responses. This guide introduces you to the concept of recursive retrieval and how to implement it with llamaindex.

When it comes to Retrieval Augmented Generation (RAG), the quality of both the created document index and the retrieval process is crucial for getting good and consistent answers based on your documentation. One especially challenging aspect is how to model relationships between text chunks of your documents.

As a quick reminder, in text-based RAG, documentation is first parsed into text, which is then divided into smaller chunks. We need to divide the full text into smaller portions, as LLMs have a maximum input length. Additionally, they are charged per token, so we want to keep the input as short as possible. These chunks are then indexed and used for retrieval during the generation process.
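Conceptually, the chunking step can be as simple as the following sketch: a naive character-based splitter with some overlap. Real pipelines (including llamaindex, which we use later) split on sentence and token boundaries instead, so treat this purely as an illustration.

def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split a long text into overlapping, fixed-size character chunks (toy example)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks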

Herein lies the challenge: If we divide the text into smaller chunks of texts, then use these chunks for retrieval, how can we make sure to retrieve all relevant information, which might be scattered across multiple chunks? This challenge is even more pronounced when the text contains tables and complex structures — as tables mostly need different handling than flowing text. How to capture the relationship between a table and accompanying text?

That’s where recursive retrieval comes into play. Recursive retrieval allows RAG to generate more coherent and contextually relevant responses by recursively retrieving and incorporating relevant information from retrieved document nodes.

In this guide, we will introduce you to the concept of recursive retrieval and demonstrate it hands-on by using llamaindex.

Note: This guide is heavily influenced by this very good tutorial from llamaindex. We’ll add some additional context and explanation to make it more accessible — but full credit goes to the llamaindex team.

What is Recursive Retrieval?

To understand why recursive retrieval is such a powerful concept, let’s look at it in detail. During normal retrieval, we use the user query to find potentially relevant documents — the documents our LLM needs to answer that query. This is mostly done by comparing the semantic meaning of the user query with the semantic meaning of the documents in our index. (This is not 100% accurate, as we actually compare embeddings of the query and the documents — which is not exactly their semantic meaning — but it is a good enough approximation for now.)
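As a toy illustration of that comparison, here is the similarity computation in isolation. The vectors below are placeholders; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors standing in for a query embedding and a chunk embedding
query_embedding = np.array([0.1, 0.3, 0.7])
chunk_embedding = np.array([0.2, 0.25, 0.8])
print(cosine_similarity(query_embedding, chunk_embedding))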

When looking at how we create these documents, we can see that we divide the full texts of our source documents into smaller chunks, which we then index. This is done to make sure that we can retrieve relevant information from our documents, even if the full document is too long to be processed by our LLM model.

However, this approach has a downside: If the relevant information is spread across multiple chunks, we might not be able to retrieve all relevant information with a single retrieval. If we look at tables, for example, oftentimes the ‘semantic meaning’ of a table is not captured by the table itself, but by the text surrounding it.

Recursive retrieval solves this problem by recursively looking at not only the semantically most similar documents, but also document chunks which might be related to these documents. This way, we can make sure to capture all relevant information, even if it is spread across multiple chunks.

This means recursive retrieval consists of two main components:

  • A way to identify relationships between document chunks
  • A way to recursively retrieve related document chunks

While there are multiple ways to implement recursive retrieval, we will focus on how to implement it with llamaindex, as it provides a proven implementation of recursive retrieval (and is great for RAG in general).

What is llamaindex?

Llamaindex is a Python and TypeScript library for building LLM applications in the area of “Context Augmentation” (which basically means RAG). It provides tools for indexing documents, retrieving relevant documents and document chunks, and generating answers based on the retrieved documents.

More specifically, llamaindex provides these main components:

  • Data connectors to ingest existing data from their native source and format. These could be APIs, PDFs, SQL, and (much) more.
  • Data indexes to structure your data in intermediate representations that are easy and performant for LLMs to consume.
  • Engines provide natural language access to your data. For example:
    • Query engines are powerful retrieval interfaces for knowledge-augmented output.
    • Chat engines are conversational interfaces for multi-message, “back and forth” interactions with your data.
  • Data agents are LLM-powered knowledge workers augmented by tools, from simple helper functions to API integrations and more.
  • Application integrations tie llamaindex back into the rest of your ecosystem. This could be LangChain, Flask, Docker, ChatGPT, or many others.

More information about llamaindex can be found in their absolutely brilliant documentation.
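To give you a feel for how these components fit together, here is a minimal, illustrative example of the basic llamaindex workflow. The ./data folder and the question are placeholders, and an OpenAI API key is assumed to be configured.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load documents from a (hypothetical) folder, index them, and ask a question
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
print(query_engine.query("What does this document say about topic X?"))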

How to Implement Recursive Retrieval with llamaindex

The main tools required to implement recursive retrieval with llamaindex are Data Indexes and Query Engines. Rather than getting stuck in theory, let's jump straight into a hands-on example.

Before getting started, you can download the example data from our website.

To use camelot to extract tables from PDFs, we first need the following system dependencies:

apt install ghostscript python3-tk
# or on macOS: brew install ghostscript tcl-tk

Then, we need to install llamaindex and its dependencies:

pip install llama-index
pip install pymupdf
pip install pandas
pip install opencv-python
pip install camelot-py
pip install ghostscript

Note: There was quite a big update from llamaindex 0.9 to 0.10. Best to remove the old version and then install the latest version.
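If you are upgrading from 0.9.x, a clean reinstall along these lines usually does the trick:

pip uninstall llama-index
pip install --upgrade llama-index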

Next, we can import the required libraries and define which OpenAI models we want to use. Change the OPENAI_API_KEY to your own API key.

import camelot
import os

from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import PandasQueryEngine
from llama_index.core.schema import IndexNode
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.readers.file import PyMuPDFReader
from llama_index.core import Settings
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import get_response_synthesizer

os.environ["OPENAI_API_KEY"] = "Your-api-key"

# Setup the OpenAI LLM
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

Optionally, you can set up debug logging to see exactly which prompts llamaindex is sending to the LLM and which responses it gets back.

# Optional: Set up debug logging
import logging
import sys

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

Now we are ready to extract the text from our PDFs. Note that this does not load the tables in the file as really tabular data, but just as plain text. We’ll see how to handle tables better in the next step.

# Load the document
file = "./recursiveRetrieval/world_billionaires.pdf"
reader = PyMuPDFReader()
docs = reader.load(file)

The docs object now contains the text of the PDF as well as some metadata like page numbers. As you can see, llamaindex makes it really easy to load documents and extract the text from them.
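If you want to verify what was loaded, you can inspect the returned documents. The exact metadata keys depend on the reader, but typically include the source file and page number.

print(len(docs))           # typically one Document per PDF page
print(docs[0].metadata)    # e.g. file path and page number (reader-dependent)
print(docs[0].text[:300])  # first few hundred characters of the first page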

As mentioned above, while we already got the text from the tables in the PDF, this method of simply parsing tables as text is not ideal and often misses important information in the tables — mainly because the PDF standard does not define a table as a specific object; it is just text and lines. Normal text parsers have a hard time extracting this information.

However, there is a tool called camelot which is specifically designed to recognize tables in PDFs and extract them as tabular data - like a pandas dataframe.

# Use camelot to get the tables from the pdf
tables = []
pages_to_extract_from = [3, 24]  # Define the PDF pages where tables are located
for page in pages_to_extract_from:
    cam_table = camelot.read_pdf(file, pages=str(page))

    # Get dataframe from camelot extracted tables
    table = cam_table[0].df

    # Rename columns with the first row and drop the first row
    table = (
        table.rename(columns=table.iloc[0]).drop(table.index[0]).reset_index(drop=True)
    )

    tables.append(table)

The above snippet extracts the tables from the PDF and stores them in a list of pandas dataframes. We can now use these dataframes to create a more structured representation of the tables in our index.
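A quick sanity check is worthwhile here, since camelot's results vary a lot with the PDF layout. The column names and values you see depend entirely on your source tables.

for table in tables:
    print(table.shape)   # rows x columns camelot detected on that page
    print(table.head())  # first rows, to verify headers and values look sane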

Ok, so far we have extracted the information from our source document, but how can we make it accessible to our LLM? Meaning: how can we search for relevant information during query time?

This is where llamaindex’ QueryEngine comes into play. It abstracts the data and provides an interface to “connect” these data to an LLM. Using our parsed documents and asking questions against them is as easy as the following lines of code:

# Define query engines over these tables
df_query_engines = [
    PandasQueryEngine(table_df, llm=Settings.llm) for table_df in tables
]

response = df_query_engines[0].query(
    "What's the net worth of the second richest billionaire in 2023?"
)
print(str(response))

There are multiple query engines for various data sources, like SQL, CSV, and more. The PandasQueryEngine is specifically designed to work with pandas dataframes. It works as follows:

  • During query time, the query engine sends the user query along with the output of df.head() to the LLM. The LLM is asked to return Python code that answers the user's question; this code is then executed against the dataframe, and the result is used to form the answer.

This is quite powerful, as the LLM can therefore indirectly work with the data in the dataframe — without needing to see the whole dataframe.
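To illustrate, for the question above the LLM might come back with a pandas expression roughly like the one below. The column name is an assumption about the extracted table, not something guaranteed by the engine.

# Hypothetical expression the LLM could return for our example question;
# the query engine executes it against the dataframe and uses the result
# as the basis for the answer. "Net worth (USD)" is an assumed column name.
df.iloc[1]["Net worth (USD)"]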

Now that we know how to query the tabular data, we can link it to the flowing text. For that, we are going to build a VectorStoreIndex, a special index that can store and retrieve document chunks based on their semantic similarity. Before diving into the code, let's outline the strategy.

Llamaindex uses “Nodes” to represent the data in the index. These nodes can have relationships to other nodes. For example, a node representing the full text of a document can have relationships to nodes representing the tables in the document. Therefore, we can do something like this:

  1. Create a node for each of the tables, with either a short description or — better — related text, so that we can retrieve them based on the user query.
  2. Create nodes from the textual data of the PDF.
  3. Combine the nodes of the tables and the nodes of the textual data into one index.

# Define index nodes for the tables
summaries = [
    (
        "This node provides information about the world's richest billionaires"
        " in 2023"
    ),
    (
        "This node provides information on the number of billionaires and"
        " their combined net worth from 2000 to 2023."
    ),
]

df_nodes = [
    IndexNode(text=summary, index_id=f"pandas{idx}")
    for idx, summary in enumerate(summaries)
]

df_id_query_engine_mapping = {
    f"pandas{idx}": df_query_engine
    for idx, df_query_engine in enumerate(df_query_engines)
}

# Construct top-level vector index + query engine
doc_nodes = Settings.node_parser.get_nodes_from_documents(docs)
vector_index = VectorStoreIndex(doc_nodes + df_nodes)
vector_retriever = vector_index.as_retriever(similarity_top_k=1)

Note: In the example above, we manually describe the table nodes. In a real-world scenario, you would probably want to extract this information automatically by sending parts of the tables to an LLM and asking it to describe the table. Or alternatively, use the table surrounding text to describe the table.
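A simple way to automate this is to ask the configured LLM to summarize each dataframe, for example along these lines (a sketch that assumes the first few rows are representative of the table):

# Sketch: generate the node descriptions automatically instead of writing them by hand
auto_summaries = []
for table in tables:
    prompt = (
        "Describe in one sentence what the following table contains:\n"
        + table.head(10).to_string()
    )
    auto_summaries.append(Settings.llm.complete(prompt).text)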

Now we have a VectorStoreIndex which contains the nodes of the tables and the nodes of the textual data. We can now use this index to create a RecursiveRetriever and a RetrieverQueryEngine to query the index. Using the latter, we again get a handy interface to ask questions via LLM.

recursive_retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": vector_retriever},
    query_engine_dict=df_id_query_engine_mapping,  # type: ignore
    verbose=True,
)

response_synthesizer = get_response_synthesizer(response_mode="compact")

query_engine = RetrieverQueryEngine.from_args(
    recursive_retriever, response_synthesizer=response_synthesizer
)

Note: In the example above, we use a response_synthesizer to make the response of the LLM nicer. This is optional and can be omitted. More information about the response synthesizer can be found in the llamaindex documentation.
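Other response modes are available as well; for example, "tree_summarize" builds the answer by recursively summarizing the retrieved chunks, while "refine" iterates over them one by one.

# Alternative: let the LLM summarize the retrieved chunks hierarchically
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")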

To use the interface, we just call the query method.

response = query_engine.query(
    "What's the net worth of the second richest billionaire in 2023?"
)

print(str(response))
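If you want to see which chunks (or table nodes) the recursive retriever actually used for the answer, the response object exposes the retrieved source nodes:

# Inspect which nodes contributed to the answer
for source_node in response.source_nodes:
    print(source_node.score, source_node.node.get_content()[:200])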

And that’s it! We have now implemented recursive retrieval with llamaindex.

Conclusion

In conclusion, RAG enhanced by recursive retrieval and llamaindex offers a significant leap forward in how we approach information retrieval and generation tasks. This guide has walked you through the complexities and intricacies of breaking down documentation into manageable chunks, the challenges in ensuring comprehensive information retrieval, and the innovative solution that recursive retrieval presents. By implementing this with llamaindex, we demonstrated not just a theoretical concept but a practical application that can be integrated into your projects to enhance the accuracy and contextuality of responses.

The journey from understanding the limitations of traditional retrieval methods to executing a hands-on example with llamaindex highlights the transformative potential of recursive retrieval in AI-driven applications. This technology allows us to capture and utilize scattered information across multiple document chunks, ensuring that even the most complex queries are answered with the highest degree of relevance and completeness.

As we continue to push the boundaries of what’s possible with AI and machine learning, the integration of recursive retrieval and llamaindex into RAG processes represents a significant step towards more intelligent, efficient, and context-aware systems. Whether you’re a developer, a researcher, or an enthusiast, the advancements discussed in this guide open new avenues for exploration and innovation in the field of artificial intelligence.

We encourage you to dive deeper into the concepts, experiment with the code samples provided, and consider how recursive retrieval can be applied to your own projects. The possibilities are as limitless as the knowledge that fuels them. With tools like llamaindex and the power of recursive retrieval, the future of AI looks more promising and exciting than ever.
