Combining Langchain and LlamaIndex to build your first Agentic RAG System

Chirag Daryani
AMA Technology Blog
4 min read · May 20, 2024
Source: Langchain & LlamaIndex

Building Large Language Model (LLM) applications can be tricky, especially when we are deciding between different frameworks such as Langchain and LlamaIndex.

While LlamaIndex excels at intelligent search and data retrieval, LangChain is a more versatile and general-purpose framework for LLM applications, offering greater compatibility with a wide range of platforms.

But what if we could combine them, leveraging the unique strengths of both frameworks to address our specific needs? This blog is a starting guide on how to integrate LlamaIndex and LangChain to create a scalable and customizable Agentic RAG application.

Before we move forward with the implementation, let’s take a moment to understand the essence of an Agentic RAG.

An Agentic RAG refers to an Agent-based RAG implementation. It is an advancement over the Naive RAG approach, adding autonomous behavior and enhancing decision-making capabilities. In this setup, we create a reasoning loop by granting the LLM access to multiple RAG query engines, each serving as a tool that the LLM can invoke as required. This enables complex decision-making, broadening the system’s capacity to answer diverse queries and deliver the most appropriate responses to users.

Code Implementation:

Step 1: Define the base LLM and the embedding model

# Imports
from langchain_openai import ChatOpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

# LLM
llm = ChatOpenAI(model_name="gpt-4-1106-preview", temperature=0, streaming=True)

# Embedding Model
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small", embed_batch_size=100
)

# Set LlamaIndex Configs
Settings.llm = llm
Settings.embed_model = embed_model
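Note that both the LangChain chat model and the LlamaIndex embedding model above call the OpenAI API, so your OpenAI API key must be available before this step. A minimal way to set it from within the script (the key value is a placeholder):

import os

# Make the OpenAI API key available to both frameworks
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with your own key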

Step 2: We leverage the indexing and retrieval functionalities of LlamaIndex to define individual query engines for our documents.

# Building Indexes for each of the Documents
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
)

try:
    # Try to load previously persisted indexes from local storage
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft"
    )
    lyft_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/uber"
    )
    uber_index = load_index_from_storage(storage_context)

    index_loaded = True
    print("Index was already created. We just loaded it from the local storage.")

except Exception:
    index_loaded = False
    print("Index is not present. We need to create it again.")

if not index_loaded:
    print("Creating Index..")

    # load data
    lyft_docs = SimpleDirectoryReader(
        input_files=["./data/10k/lyft_2021.pdf"]
    ).load_data()
    uber_docs = SimpleDirectoryReader(
        input_files=["./data/10k/uber_2021.pdf"]
    ).load_data()

    # build index
    lyft_index = VectorStoreIndex.from_documents(lyft_docs)
    uber_index = VectorStoreIndex.from_documents(uber_docs)

    # persist index
    lyft_index.storage_context.persist(persist_dir="./storage/lyft")
    uber_index.storage_context.persist(persist_dir="./storage/uber")

    index_loaded = True

# Creating Query engines on top of the indexes
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)

print("LlamaIndex Query Engines created successfully.")

Step 3: We now use the LlamaIndex QueryEngineTool abstraction to transform these query engines into Tools, which would later be provided to the LLM.

# Creating tools for each of our query engines
from llama_index.core.tools import QueryEngineTool, ToolMetadata

query_engine_tools = [
    QueryEngineTool(
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="lyft_10k",
            description=(
                "Provides information about Lyft financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    QueryEngineTool(
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="uber_10k",
            description=(
                "Provides information about Uber financials for year 2021. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]

Step 4: We convert the LlamaIndex Tools into a format compatible with Langchain Agents.

#convert to langchain format
llamaindex_to_langchain_converted_tools = [t.to_langchain_tool() for t in query_engine_tools]

We also define an additional Langchain Tool with Web Search functionality. After that, we combine all tools.

# Another Langchain Tool: web search
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import Tool

search = DuckDuckGoSearchRun()

duckduckgo_tool = Tool(
    name="DuckDuckGoSearch",
    func=search.run,
    description="Use when you need to perform an internet search to find information that the other tools cannot provide.",
)

langchain_tools = [duckduckgo_tool]

# Combine to create the final list of tools
tools = llamaindex_to_langchain_converted_tools + langchain_tools
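As an optional check, we can print the name and description of every tool in the final list; these descriptions are what the LLM reads when deciding which tool to call:

# Inspect the final toolset the agent will receive
for t in tools:
    print(t.name, "->", t.description)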

Step 5: We’ll initialize Langchain’s latest Tool Calling Agent.

The tool-calling agent is the most recent and most versatile agent implementation in Langchain, supporting a wide variety of LLM providers such as OpenAI, Anthropic, Google Gemini, Mistral, and others.

system_context = "You are a stock market expert.\
You will answer questions about Uber and Lyft companies as in the persona of a veteran stock market investor."

prompt = ChatPromptTemplate.from_messages(
[
(
"system",
system_context,
),
("placeholder", "{chat_history}"),
("human", "{input}"),
("placeholder", "{agent_scratchpad}"),
]
)

# Construct the Tools agent
agent = create_tool_calling_agent(llm, tools, prompt,)

# Create an agent executor by passing in the agent and tools
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True, return_intermediate_steps=True, handle_parsing_errors=True, max_iterations=10)

Step 6: Next, we’ll put the agent to the test with our queries.

Query 1:

question = "What was Lyft's revenue growth in 2021?"

response = agent_executor.invoke({"input": question})
print("\nFinal Response:", response['output'])

Output:

The agent correctly invoked the lyft_10k query engine tool.

Query 2:

question = "Is Uber profitable?"

response = agent_executor.invoke({"input": question})
print("\nFinal Response:", response['output'])

The agent correctly invoked the uber_10k query engine tool.

Query 3:

question = "List me the names of Uber's board of directors."

response = agent_executor.invoke({"input": question})
print("\nFinal Response:", response['output'])

Since this information is outside the scope of both retriever tools, the agent correctly decided to invoke the external search tool.

Points to Note:

It’s worth noting that this setup represents a fundamental Agentic RAG configuration, with potential for increased complexity.

For instance, rather than equipping a single agent with multiple tools, we can introduce multiple agents, each specialized in a subset of documents within the same domain. At the helm, a top-level agent would serve as a supervisor for these agents.
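As an illustrative sketch of this idea (not part of the notebook above), one option with the same Langchain primitives is to wrap each document-specialized agent executor as a plain Tool and hand those tools to a top-level supervisor agent. The names lyft_agent_executor and uber_agent_executor below are hypothetical sub-agents, each built the same way as agent_executor in Step 5 but with only its own subset of tools; the supervisor reuses the same prompt structure for brevity.

# Hypothetical sketch: per-document agent executors wrapped as tools for a supervisor agent
lyft_agent_tool = Tool(
    name="lyft_analyst",
    func=lambda q: lyft_agent_executor.invoke({"input": q})["output"],
    description="Delegates any question about Lyft to the Lyft-specialized agent.",
)
uber_agent_tool = Tool(
    name="uber_analyst",
    func=lambda q: uber_agent_executor.invoke({"input": q})["output"],
    description="Delegates any question about Uber to the Uber-specialized agent.",
)

# The supervisor is just another tool-calling agent whose tools are the sub-agents
supervisor_tools = [lyft_agent_tool, uber_agent_tool]
supervisor_agent = create_tool_calling_agent(llm, supervisor_tools, prompt)
supervisor_executor = AgentExecutor(agent=supervisor_agent, tools=supervisor_tools, verbose=True)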

You can find the complete implementation in my notebook here.

Let me know if you liked this post, and share any questions or suggestions in the response box below.

Connect with me at: https://www.linkedin.com/in/chiragdaryani/
