Multi-document Agentic RAG using Llama-Index and Mistral

Plaban Nayak · Published in The AI Forum · May 12, 2024

Multi-Document Agentic RAG workflow

Introduction

Large language models (LLMs) have revolutionized the way we extract insights from vast amounts of text data. In the domain of financial analysis, LLM applications are being designed to assist analysts in answering complex questions about company performance, earnings reports, and market trends.

One such application involves the use of a retrieval augmented generation (RAG) pipeline to facilitate the extraction of information from financial statements and other sources.

Consider a scenario where a financial analyst wants to understand the key takeaways from a company’s Q2 earnings call, specifically focusing on the technological moats the company is building. This type of question goes beyond simple lookup and requires a more sophisticated approach. This is where the concept of an LLM Agent comes into play.

What is an Agent?

According to Llama-Index, an “agent” is an automated reasoning and decision engine. It takes in a user input/query and can make internal decisions about how to execute that query in order to return the correct result. The key agent components can include, but are not limited to:

  • Breaking down a complex question into smaller ones
  • Choosing an external Tool to use + coming up with parameters for calling the Tool
  • Planning out a set of tasks
  • Storing previously completed tasks in a memory module

An LLM Agent is a system that combines various techniques such as planning, tailored focus, memory utilization, and the use of different tools to answer complex questions.

Let’s break down how an LLM Agent can be developed to answer the aforementioned question:

  • Planning: The LLM Agent first needs to understand the nature of the question and create a plan to extract relevant information. This involves identifying key terms like “Q2 earnings call” and “technological moats” and determining the best sources to gather this information from.
  • Tailored Focus: The LLM Agent then focuses its attention on the specific aspects of the question related to technological moats. This involves filtering out irrelevant information and honing in on the details that are most pertinent to the analyst’s inquiry.
  • Memory: The LLM Agent leverages its memory to recall relevant information from past earnings calls, company reports, and other sources. This helps provide context and background information to support its analysis.
  • Using Different Tools: The LLM Agent utilizes a range of tools and techniques to extract and analyze information. This may include natural language processing (NLP) algorithms, sentiment analysis, and topic modeling to gain deeper insights into the earnings call.
  • Breaking Down Complex Questions: Finally, the LLM Agent breaks down the complex question into simpler sub-parts, making it easier to extract relevant information and provide a coherent answer.
Source: General Components of an Agent

Tool Calling

In standard RAG, the LLM is used mainly to synthesize information from the retrieved context.

Tool Calling, on the other hand, adds a layer of query understanding on top of a RAG pipeline, enabling users to ask complex queries and get back more precise results. It allows the LLM to figure out how to use a vector database instead of merely consuming its outputs.

Tool Calling lets the LLM interact with external environments through a dynamic interface: the model not only chooses the appropriate tool but also infers the arguments needed to execute it. The result is a better understanding of the ask and better responses than standard RAG.
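To make this concrete, here is a minimal, hypothetical sketch of tool calling with LlamaIndex’s FunctionTool and Mistral’s native function calling. The add and multiply tools are illustrative only and are not part of this project; the point is that the LLM both chooses the tool and infers its arguments from the natural-language query.

# Hypothetical sketch of tool calling (illustrative tools, not part of this project)
from llama_index.core.tools import FunctionTool
from llama_index.llms.mistralai import MistralAI

def add(x: int, y: int) -> int:
    """Add two integers and return the result."""
    return x + y

def multiply(x: int, y: int) -> int:
    """Multiply two integers and return the result."""
    return x * y

add_tool = FunctionTool.from_defaults(fn=add)
multiply_tool = FunctionTool.from_defaults(fn=multiply)

llm = MistralAI(model="mistral-large-latest")  # assumes MISTRAL_API_KEY is set in the environment
# The LLM selects the right tool and fills in x and y itself
response = llm.predict_and_call([add_tool, multiply_tool],
                                "What is 121 multiplied by 3?",
                                verbose=True)
print(str(response))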

Agent Reasoning Loop

What if the user asks a complex question that involves multiple steps, or a vague question that needs clarification? This is where the agent reasoning loop comes into the picture. Instead of answering in a single-shot setting, the agent is able to reason over tools across multiple steps.

Source: LlamaIndex
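As a rough sketch of what this loop looks like in code, the snippet below uses LlamaIndex’s lower-level task API (create_task / run_step / finalize_response). It assumes the agent_worker object built later in this post; treat it as an illustration of the multi-step loop rather than production code.

# Sketch of the agent reasoning loop using the lower-level task API
# (assumes `agent_worker` has been created from a set of tools, as shown later in this post)
from llama_index.core.agent import AgentRunner

agent = AgentRunner(agent_worker)

# Create a task instead of answering in a single shot
task = agent.create_task("Compare and contrast Self-RAG and Corrective RAG, then summarize the key differences.")

# Run the loop step by step; each step may call a tool and reason over its output
step_output = agent.run_step(task.task_id)
while not step_output.is_last:
    step_output = agent.run_step(task.task_id)

# Once the agent decides it is done, collect the final answer
response = agent.finalize_response(task.task_id)
print(str(response))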

Agent Architecture

In LlamaIndex, an agent consists of two components:

  • AgentRunner
  • AgentWorker

The AgentRunner object interfaces with one or more AgentWorkers.

AgentRunners are orchestrators that:

  • Store state
  • Store conversational memory
  • Create tasks
  • Maintain tasks
  • Run steps for each task
  • Present a user-facing, high-level interface

AgentWorkers take care of:

  • Selecting and using tools
  • Selecting the LLM that makes use of the tools
Source: Llama-Index

Calling the agent’s query method lets you query the agent in a one-off manner, but it does not preserve state. This is where memory comes into the picture to maintain the conversation history. The agent keeps the chat history in a conversational memory buffer; by default this is a flat, rolling list of items whose size depends on the context window of the LLM. So when the agent decides to use a tool, it draws not only on the current message but also on the previous conversation history to decide the next set of actions.
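A minimal sketch of the difference, assuming the agent_worker built in the code section below: agent.query() answers one-off, while agent.chat() writes each turn into the conversational memory buffer, so follow-up questions can refer back to earlier ones. The 3000-token limit below is an illustrative value, not something used later in this post.

from llama_index.core.agent import AgentRunner
from llama_index.core.memory import ChatMemoryBuffer

# Rolling conversation buffer bounded by a token limit (illustrative value)
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
agent = AgentRunner(agent_worker, memory=memory)

agent.query("Summarize the paper Corrective RAG.")           # one-off, no state is kept
agent.chat("What are the key ideas of Self-RAG?")            # this turn is stored in memory
agent.chat("How do they compare with the previous paper?")   # "previous paper" is resolved from memory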

Here we will build a multi-document agent that can handle several documents at once. We implement Agentic RAG on a small set of research papers; the same approach can be extended to more documents as well.

Technology Stack Used

  • Llama-Index: LlamaIndex is the Data Framework for Context-Augmented LLM Apps.
  • Mistral API: Developers can interact with Mistral through its API, which offers an experience similar to OpenAI’s API system.

Mistral Large comes with new capabilities and strengths:

  • It is natively fluent in English, French, Spanish, German, and Italian, with a nuanced understanding of grammar and cultural context.
  • Its 32K-token context window allows precise information recall from large documents.
  • Its precise instruction-following enables developers to design their moderation policies — we used it to set up the system-level moderation of le Chat.
  • It is natively capable of function calling.

Code Implementation

The code was implemented using Google Colab.

Install required dependencies

%%writefile requirements.txt
llama-index
llama-index-llms-huggingface
llama-index-embeddings-fastembed
fastembed
Unstructured[md]
chromadb
llama-index-vector-stores-chroma
llama-index-llms-groq
einops
accelerate
sentence-transformers
llama-index-llms-mistralai
llama-index-llms-openai
!pip install -r requirements.txt

####################################################################
Successfully installed Unstructured-0.13.7 accelerate-0.30.1 asgiref-3.8.1 backoff-2.2.1 bcrypt-4.1.3 chroma-hnswlib-0.7.3 chromadb-0.5.0 coloredlogs-15.0.1 dataclasses-json-0.6.6 deepdiff-7.0.1 deprecated-1.2.14 dirtyjson-1.0.8 dnspython-2.6.1 einops-0.8.0 email_validator-2.1.1 emoji-2.11.1 fastapi-0.111.0 fastapi-cli-0.0.3 fastembed-0.2.7 filetype-1.2.0 h11-0.14.0 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 humanfriendly-10.0 importlib-metadata-7.0.0 jsonpath-python-1.0.6 kubernetes-29.0.0 langdetect-1.0.9 llama-index-0.10.36 llama-index-agent-openai-0.2.4 llama-index-cli-0.1.12 llama-index-core-0.10.36 llama-index-embeddings-fastembed-0.1.4 llama-index-embeddings-openai-0.1.9 llama-index-indices-managed-llama-cloud-0.1.6 llama-index-legacy-0.9.48 llama-index-llms-groq-0.1.3 llama-index-llms-huggingface-0.1.5 llama-index-llms-openai-0.1.18 llama-index-llms-openai-like-0.1.3 llama-index-multi-modal-llms-openai-0.1.5 llama-index-program-openai-0.1.6 llama-index-question-gen-openai-0.1.3 llama-index-readers-file-0.1.22 llama-index-readers-llama-parse-0.1.4 llama-index-vector-stores-chroma-0.1.8 llama-parse-0.4.2 llamaindex-py-client-0.1.19 loguru-0.7.2 marshmallow-3.21.2 mmh3-4.1.0 monotonic-1.6 mypy-extensions-1.0.0 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.1.105 onnx-1.16.0 onnxruntime-1.17.3 openai-1.28.1 opentelemetry-api-1.24.0 opentelemetry-exporter-otlp-proto-common-1.24.0 opentelemetry-exporter-otlp-proto-grpc-1.24.0 opentelemetry-instrumentation-0.45b0 opentelemetry-instrumentation-asgi-0.45b0 opentelemetry-instrumentation-fastapi-0.45b0 opentelemetry-proto-1.24.0 opentelemetry-sdk-1.24.0 opentelemetry-semantic-conventions-0.45b0 opentelemetry-util-http-0.45b0 ordered-set-4.1.0 orjson-3.10.3 overrides-7.7.0 posthog-3.5.0 pypdf-4.2.0 pypika-0.48.9 python-dotenv-1.0.1 python-iso639-2024.4.27 python-magic-0.4.27 python-multipart-0.0.9 rapidfuzz-3.9.0 sentence-transformers-2.7.0 shellingham-1.5.4 starlette-0.37.2 striprtf-0.0.26 text-generation-0.7.0 tiktoken-0.6.0 tokenizers-0.15.2 transformers-4.39.3 typer-0.12.3 typing-inspect-0.9.0 ujson-5.9.0 unstructured-client-0.22.0 uvicorn-0.29.0 uvloop-0.19.0 watchfiles-0.21.0 websockets-12.0

Download Documents to be processed

!mkdir data
#
! wget "https://arxiv.org/pdf/1810.04805.pdf" -O ./data/BERT_arxiv.pdf
! wget "https://arxiv.org/pdf/2005.11401" -O ./data/RAG_arxiv.pdf
! wget "https://arxiv.org/pdf/2310.11511" -O ./data/self_rag_arxiv.pdf
! wget "https://arxiv.org/pdf/2401.15884" -O ./data/crag_arxiv.pdf

Import Required Dependencies

from llama_index.core import SimpleDirectoryReader,VectorStoreIndex,SummaryIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool,QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters,FilterCondition
from typing import List,Optional
import  nest_asyncio
nest_asyncio.apply()

Read the Documents

documents = SimpleDirectoryReader(input_files = ['./data/self_rag_arxiv.pdf']).load_data()
print(len(documents))
print(f"Document Metadata: {documents[0].metadata}")

Split the documents into chunks/nodes

splitter = SentenceSplitter(chunk_size=1024,chunk_overlap=100)
nodes = splitter.get_nodes_from_documents(documents)
print(f"Length of nodes : {len(nodes)}")
print(f"get the content for node 0 :{nodes[0].get_content(metadata_mode='all')}")

###########################RESPONSE ################################
Length of nodes : 43
get the content for node 0 :page_label: 1
file_name: self_rag_arxiv.pdf
file_path: data/self_rag_arxiv.pdf
file_type: application/pdf
file_size: 1405127
creation_date: 2024-05-11
last_modified_date: 2023-10-19

Preprint.
SELF-RAG: LEARNING TO RETRIEVE , GENERATE ,AND
CRITIQUE THROUGH SELF-REFLECTION
Akari Asai†, Zeqiu Wu†, Yizhong Wang†§, Avirup Sil‡, Hannaneh Hajishirzi†§
†University of Washington§Allen Institute for AI‡IBM Research AI
{akari,zeqiuwu,yizhongw,hannaneh }@cs.washington.edu ,avi@us.ibm.com
ABSTRACT
Despite their remarkable capabilities, large language models (LLMs) often produce
responses containing factual inaccuracies due to their sole reliance on the paramet-
ric knowledge they encapsulate. Retrieval-Augmented Generation (RAG), an ad
hoc approach that augments LMs with retrieval of relevant knowledge, decreases
such issues. However, indiscriminately retrieving and incorporating a fixed number
of retrieved passages, regardless of whether retrieval is necessary, or passages are
relevant, diminishes LM versatility or can lead to unhelpful response generation.
We introduce a new framework called Self-Reflective Retrieval-Augmented Gen-
eration ( SELF-RAG)that enhances an LM’s quality and factuality through retrieval
and self-reflection. Our framework trains a single arbitrary LM that adaptively
retrieves passages on-demand, and generates and reflects on retrieved passages
and its own generations using special tokens, called reflection tokens. Generating
reflection tokens makes the LM controllable during the inference phase, enabling it
to tailor its behavior to diverse task requirements. Experiments show that SELF-
RAG(7B and 13B parameters) significantly outperforms state-of-the-art LLMs
and retrieval-augmented models on a diverse set of tasks. Specifically, SELF-RAG
outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA,
reasoning and fact verification tasks, and it shows significant gains in improving
factuality and citation accuracy for long-form generations relative to these models.1
1 I NTRODUCTION
State-of-the-art LLMs continue to struggle with factual errors (Mallen et al., 2023; Min et al., 2023)
despite their increased model and data scale (Ouyang et al., 2022). Retrieval-Augmented Generation
(RAG) methods (Figure 1 left; Lewis et al. 2020; Guu et al. 2020) augment the input of LLMs
with relevant retrieved passages, reducing factual errors in knowledge-intensive tasks (Ram et al.,
2023; Asai et al., 2023a). However, these methods may hinder the versatility of LLMs or introduce
unnecessary or off-topic passages that lead to low-quality generations (Shi et al., 2023) since they
retrieve passages indiscriminately regardless of whether the factual grounding is helpful. Moreover,
the output is not guaranteed to be consistent with retrieved relevant passages (Gao et al., 2023) since
the models are not explicitly trained to leverage and follow facts from provided passages. This
work introduces Self-Reflective Retrieval-augmented Generation ( SELF-RAG)to improve an
LLM’s generation quality, including its factual accuracy without hurting its versatility, via on-demand
retrieval and self-reflection. We train an arbitrary LM in an end-to-end manner to learn to reflect on
its own generation process given a task input by generating both task output and intermittent special
tokens (i.e., reflection tokens ). Reflection tokens are categorized into retrieval andcritique tokens to
indicate the need for retrieval and its generation quality respectively (Figure 1 right). In particular,
given an input prompt and preceding generations, SELF-RAGfirst determines if augmenting the
continued generation with retrieved passages would be helpful. If so, it outputs a retrieval token that
calls a retriever model on demand (Step 1). Subsequently, SELF-RAGconcurrently processes multiple
retrieved passages, evaluating their relevance and then generating corresponding task outputs (Step
2). It then generates critique tokens to criticize its own output and choose best one (Step 3) in terms
of factuality and overall quality. This process differs from conventional RAG (Figure 1 left), which
1Our code and trained models are available at https://selfrag.github.io/ .
1arXiv:2310.11511v1 [cs.CL] 17 Oct 2023

Instantiate the vectorstore

import chromadb
db = chromadb.PersistentClient(path="./chroma_db_mistral")
chroma_collection = db.get_or_create_collection("multidocument-agent")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Instantiate the embedding model

from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.core import Settings
#
embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")
#
Settings.embed_model = embed_model
#
Settings.chunk_size = 1024
#

Instantiate the LLM

import os
from google.colab import userdata  # Colab secrets store holding the API key
from llama_index.llms.mistralai import MistralAI

os.environ["MISTRAL_API_KEY"] = userdata.get("MISTRAL_API_KEY")
llm = MistralAI(model="mistral-large-latest")

Instantiate the Vector Query tool and Summary tool for a specific document

LlamaIndex Data Agents process natural language input to perform actions rather than generating responses. The key to creating effective data agents lies in abstracting tools. But what exactly is meant by a tool in this context? Think of tools as API interfaces designed for agent interactions rather than human interfaces.

Core Concepts:

  • Tool: Essentially, a Tool includes a generic interface and fundamental metadata such as name, description, and function schema.
  • Tool Spec: This delves into the API specifics, presenting a comprehensive service API specification that can be translated into various Tools.

There are several types of Tools available:

  • FunctionTool: Converts any user-defined function into a Tool, with the ability to infer the function’s schema.
  • QueryEngineTool: Wraps around an existing query engine. Since our agent abstractions are derived from BaseQueryEngine, this tool can also accommodate agents.

# Instantiate the vector store index
name = "BERT_arxiv"
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
vector_index.storage_context.vector_store.persist(persist_path="/content/chroma_db")

# Define the vector store auto-retrieval tool
def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
    '''
    Perform a vector search over the index.
    query (str): the query string to be embedded
    page_numbers (List[str]): list of page numbers to restrict retrieval to;
    leave blank to perform a vector search over all pages
    '''
    page_numbers = page_numbers or []
    metadata_dict = [{"key": "page_label", "value": p} for p in page_numbers]
    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(metadata_dict,
                                           condition=FilterCondition.OR),
    )
    response = query_engine.query(query)
    return response

# LlamaIndex's FunctionTool wraps any Python function we feed it
vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}",
                                               fn=vector_query)

# Prepare the summary tool
summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize",
                                                     use_async=True)
summary_query_tool = QueryEngineTool.from_defaults(
    name=f"summary_tool_{name}",
    query_engine=summary_query_engine,
    description=("Use ONLY IF you want to get a holistic summary of the documents."
                 "DO NOT USE if you have specified questions over the documents."))

Test the LLM

response = llm.predict_and_call([vector_query_tool],
"Summarize the content in page number 2",
verbose=True)
######################RESPONSE###########################
=== Calling Function ===
Calling function: vector_tool_BERT_arxiv with args: {"query": "summarize content", "page_numbers": ["2"]}
=== Function Output ===
The content discusses the use of RAG models for knowledge-intensive generation tasks, such as MS-MARCO and Jeopardy question generation, showing that the models produce more factual, specific, and diverse responses compared to a BART baseline. The models also perform well in FEVER fact verification, achieving results close to state-of-the-art pipeline models. Additionally, the models demonstrate the ability to update their knowledge as the world changes by replacing the non-parametric memory.

Helper function to generate Vectorstore Tool and Summary tool for all the documents

def get_doc_tools(file_path: str, name: str):
    '''
    Build a vector query tool and a summary query tool for a document.
    '''
    # Load the document
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=100)
    nodes = splitter.get_nodes_from_documents(documents)
    print(f"Length of nodes : {len(nodes)}")

    # Instantiate the vector store index
    vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
    vector_index.storage_context.vector_store.persist(persist_path="/content/chroma_db")

    # Define the vector store auto-retrieval tool
    def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
        '''
        Perform a vector search over the index.
        query (str): the query string to be embedded
        page_numbers (List[str]): list of page numbers to restrict retrieval to;
        leave blank to perform a vector search over all pages
        '''
        page_numbers = page_numbers or []
        metadata_dict = [{"key": "page_label", "value": p} for p in page_numbers]
        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(metadata_dict,
                                               condition=FilterCondition.OR),
        )
        response = query_engine.query(query)
        return response

    # LlamaIndex's FunctionTool wraps any Python function we feed it
    vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}",
                                                   fn=vector_query)

    # Prepare the summary tool
    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize",
                                                         use_async=True)
    summary_query_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=("Use ONLY IF you want to get a holistic summary of the documents."
                     "DO NOT USE if you have specified questions over the documents."))
    return vector_query_tool, summary_query_tool

Prepare an input list with the specified document names

import os

root_path = "/content/data"
file_name = []
file_path = []
for file in os.listdir(root_path):
    if file.endswith(".pdf"):
        file_name.append(file.split(".")[0])
        file_path.append(os.path.join(root_path, file))
#
print(file_name)
print(file_path)

################################RESPONSE###############################
['self_rag_arxiv', 'crag_arxiv', 'RAG_arxiv', '', 'BERT_arxiv']
['/content/data/BERT_arxiv.pdf',
'/content/data/BERT_arxiv.pdf',
'/content/data/BERT_arxiv.pdf',
'/content/data/BERT_arxiv.pdf',
'/content/data/BERT_arxiv.pdf']

Note: FunctionTool expects the tool name to match the pattern ^[a-zA-Z0-9_-]+$
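Because of that constraint, a small helper (not part of the original code, shown here as a hedge against odd file names) can sanitize document names before they are used as tool names:

import re

def sanitize_tool_name(name: str) -> str:
    """Replace any character outside a-z, A-Z, 0-9, underscore, or hyphen with an underscore."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", name)

print(sanitize_tool_name("self_rag_arxiv (v2)"))  # -> self_rag_arxiv__v2_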

Generate the vector tool and summary tool for each document

papers_to_tools_dict = {}
for name, filename in zip(file_name, file_path):
    vector_query_tool, summary_query_tool = get_doc_tools(filename, name)
    papers_to_tools_dict[name] = [vector_query_tool, summary_query_tool]

####################RESPONSE###########################
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28

Get the tools into a flat list

initial_tools = [t for f in file_name for t in papers_to_tools_dict[f]]
initial_tools

Stuffing too many tool selections into the LLM prompt will lead to the following issues:

  • The tools might not all fit into the prompt, especially if the number of documents is large, since we model each document as a separate tool.
  • Cost and latency will spike owing to the increase in the number of tokens.
  • The prompt can also become confusing, resulting in the LLM not performing as instructed.

A solution here is to perform RAG at the level of the tools themselves. To do this we use the ObjectIndex class of Llama-Index.

The ObjectIndex class allows for the indexing of arbitrary Python objects, so it is quite flexible and applicable to a wide range of use cases.

The VectorStoreIndex is a critical component of LlamaIndex, facilitating the storage and retrieval of data. It works by:

  • Accepting a list of Node objects and building an index from them.
  • Using different vector stores as the storage backend, enhancing the flexibility and scalability of applications.
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex
#
obj_index = ObjectIndex.from_objects(initial_tools,index_cls=VectorStoreIndex)
#

Set up the ObjectIndex as retriever

obj_retriever = obj_index.as_retriever(similarity_top_k=2)
tools = obj_retriever.retrieve("compare and contrast the papers self rag and corrective rag")
#
print(tools[0].metadata)
print(tools[1].metadata)

###################################RESPONSE###########################
ToolMetadata(description='Use ONLY IF you want to get a holistic summary of the documents.DO NOT USE if you have specified questions over the documents.', name='summary_tool_self_rag_arxiv', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

ToolMetadata(description='vector_tool_self_rag_arxiv(query: str, page_numbers: Optional[List[str]] = None) -> str\n\n perform vector search over index on\n query(str): query string needs to be embedded\n page_numbers(List[str]): list of page numbers to be retrieved,\n leave blank if we want to perform a vector search over all pages\n ', name='vector_tool_self_rag_arxiv', fn_schema=<class 'pydantic.v1.main.vector_tool_self_rag_arxiv'>, return_direct=False)

Setup the RAG Agent

from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner
#
agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm,
    system_prompt="""You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.""",
    verbose=True,
)
agent = AgentRunner(agent_worker)

Ask Query 1

#
response = agent.query("Compare and contrast self rag and crag.")
print(str(response))

##############################RESPONSE###################################
Added user message to memory: Compare and contrast self rag and crag.
=== LLM Response ===
Sure, I'd be happy to help you understand the differences between Self RAG and CRAG, based on the functions provided to me.

Self RAG (Retrieval-Augmented Generation) is a method where the model generates a holistic summary of the documents provided as input. It's important to note that this method should only be used if you want a general summary of the documents, and not if you have specific questions over the documents.

On the other hand, CRAG (Contrastive Retrieval-Augmented Generation) is also a method for generating a holistic summary of the documents. The key difference between CRAG and Self RAG is not explicitly clear from the functions provided. However, the name suggests that CRAG might use a contrastive approach in its retrieval process, which could potentially lead to a summary that highlights the differences and similarities between the documents more effectively.

Again, it's crucial to remember that both of these methods should only be used for a holistic summary, and not for answering specific questions over the documents.

Ask Query 2

response = agent.query("Summarize the paper corrective RAG.")
print(str(response))
###############################RESPONSE#######################
Added user message to memory: Summarize the paper corrective RAG.
=== Calling Function ===
Calling function: summary_tool_RAG_arxiv with args: {"input": "corrective RAG"}
=== Function Output ===
The corrective RAG approach is a method used to address issues or errors in a system by categorizing them into three levels: Red, Amber, and Green. Red signifies critical problems that need immediate attention, Amber indicates issues that require monitoring or action in the near future, and Green represents no significant concerns. This approach helps prioritize and manage corrective actions effectively based on the severity of the identified issues.
=== LLM Response ===
The corrective RAG approach categorizes issues into Red, Amber, and Green levels to prioritize and manage corrective actions effectively based on severity. Red signifies critical problems needing immediate attention, Amber requires monitoring or action soon, and Green indicates no significant concerns.
assistant: The corrective RAG approach categorizes issues into Red, Amber, and Green levels to prioritize and manage corrective actions effectively based on severity. Red signifies critical problems needing immediate attention, Amber requires monitoring or action soon, and Green indicates no significant concerns.

Conclusion

Unlike the standard RAG pipeline, which is suited to simple queries over a few documents, this agentic approach adapts based on its initial findings to improve further retrieval. Here we have built an autonomous research agent that enhances our ability to engage with and analyze our data comprehensively.
