A Primer on AI Agents with LangGraph

Shrish
14 min read · Apr 25, 2024


AI agents have captivated computer scientists and scriptwriters for decades. Remember Agent Smith from The Matrix or the ever-helpful Jarvis? Countless movies depict intelligent programs rebelling against humanity. Today, we will dive deep into the fundamentals of AI agents, exploring their components and terminology from the ground up. We will even build a (somewhat) intelligent agent capable of performing tasks, a feat that would be incredibly challenging with traditional rule-based programming.
Before we start, let's see what Andrej Karpathy, former director of AI at Tesla and researcher at OpenAI, has to say about agents.

Now, I hope you are inspired enough to build agents.

The Era of Large Language Models (LLMs)

Large language models (LLMs) are powerful deep-learning models trained to predict the next word based on the surrounding context. By being trained on massive amounts of text data, LLMs have shown remarkable capabilities, even exhibiting what some consider emergent intelligence.

While true artificial intelligence might still be a distant horizon, advancements in LLMs represent a significant leap forward. However, it’s important to remember that LLMs are currently specialized in a few tasks, like conversation, and struggle with generalization.

One key limitation of LLMs is their stateless nature. They function as single API calls, processing all context and history within that specific request. LLMs don't retain information from previous interactions. For example, ChatGPT's apparent ability to remember past conversations likely stems from appending historical chats (or summaries) to each new prompt, creating the illusion of statefulness. Keep this in mind; we will come back to it later in this article.

For the purposes of this discussion, let’s consider LLMs as black boxes that take questions and provide natural language responses. It’s crucial to reiterate that these models don’t retain information from past queries. Let us also appreciate that replicating such a black box using traditional rule-based programming would be virtually impossible.
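To see what this history-appending trick looks like in practice, here is a minimal sketch using the OpenAI Python SDK (v1.x is assumed, along with gpt-3.5-turbo as an illustrative model and an OPENAI_API_KEY in the environment). The model itself remembers nothing; every call re-sends the accumulated conversation.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    # Append the new user turn, then send the entire history every time
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=history,
    )
    reply = response.choices[0].message.content
    # Store the assistant turn so the next call carries the full context
    history.append({"role": "assistant", "content": reply})
    return reply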

Building AI Agents

There are four fundamental design patterns that form the building blocks of most agents. These patterns can be used independently, combined, or interwoven depending on the specific use case.

  1. Planning: Effective planning is critical for all agents. It involves breaking down complex problems into smaller, more manageable steps that can be reasoned about using available tools. This requires the agent to have a working understanding of its environment and the tools at its disposal.
  2. Reflection: Agents learn and adapt by reflecting on the results and feedback received from their interactions with the environment. By analyzing these outcomes, the agent can adjust its approach or problem-solving strategies for future encounters.
  3. Tool Use: Unlike humans who can manipulate the physical world directly, LLMs (Large Language Models) often rely on external tools to interact with their environment. These tools allow them to perform actions beyond simple communication, much like how humans use tools to accomplish tasks in the real world.
  4. Multi-agent Collaboration: This emerging pattern involves dividing tasks among different types of agents, each specialized for a specific function. This collaborative approach mirrors human organizations where teams of experts in areas like HR, finance, and technology work together to achieve a common goal.

An AI agent is built upon four key components:

  1. Brain (Decision-Making): This component, often powered by Large Language Models (LLMs), acts as the agent’s “brain.” It analyzes the environment, interprets information, and formulates plans to achieve the agent’s goals.
  2. Memory: The agent’s memory stores crucial information gathered during operation. This data can include past experiences, environmental details, and learned patterns, all of which are used to inform future decisions.
  3. Workflow (Action Management): This component dictates the order and flow of the agent’s actions. Some constraints or rules are introduced within the workflow to ensure the agent operates reliably and achieves its goals efficiently.
  4. Tools: These are the external capabilities the agent can leverage to interact with the environment beyond simple communication. Tools allow the agent to perform actions and complete tasks in the real world.

As we move forward, we'll explore how each of these components can be implemented using LangGraph.

For now, memory management and tool execution (for example, via OpenAI function calling) are handled well, but workflow control remains a challenge. There are two main approaches:

  • Free-flowing LLM control: This lets the agent determine its own control flow, similar to function-calling and ReAct agents in LangChain. However, this approach has limitations, as the agent can deviate from the task and typically takes more time.
  • Predefined control flow: This involves defining the workflow beforehand using structures like DAGs (Directed Acyclic Graphs) or cycles. While LangChain's LCEL can handle DAGs with conditionals, it cannot implement cycles. In LangGraph we can predefine the complete workflow with both conditional and cyclic components, giving agents much more flexibility.

More details on the topic can be found in this blog post.

Hands-on Agents with LangGraph

LangGraph simplifies AI agent development by focusing on three key components:

  1. State: This represents the agent’s current information, often stored as a dictionary or managed through a database.
  2. Nodes: These are the building blocks that execute computations. They can be LLM-based, Python code, or even other LangGraph units (subgraphs). Nodes take the state as input, perform modifications, and return the updated state.
  3. Edges: Edges define the agent's control flow. They dictate the specific path the agent follows, but LangGraph injects flexibility through conditional edges and cycles. These allow the agent to adapt its course based on conditions met within the shared state.

LangGraph's strength lies in its ability to balance control flow with adaptability. Conditional edges and the allowance of cycles empower agents to be both flexible and reliable. Similar to a finite state machine, each node takes the current state, performs modifications, and returns the updated state. Other nodes can then adapt their behavior based on the changes. However, a key advantage of LangGraph is that nodes can be "relatively intelligent" thanks to LLM integration. This reduces reliance on extensive rule-based programming, making agent development more efficient.

An Agent that can do math

Large Language Models (LLMs) are known to struggle with complex math problems. Their strength lies in predicting the next word in a sequence, not performing calculations directly. To address this limitation, we’ll utilize LangGraph to equip our agent with custom math tools, extending its capabilities.

These tools and the agent itself serve as a demonstration of LangGraph’s power in building intelligent agents.

The agent's workflow follows a logical loop. It starts with planning, where it figures out the tools and arguments needed to solve the problem. Then, it executes those tools using LangGraph nodes. After execution, a conditional edge decides whether more planning is required based on the outcome. If so, the loop restarts. If not, the agent generates a final response for the user through a responder node. This design highlights LangGraph's ability to build adaptable agents that can iteratively plan and execute tasks.
import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables (credentials for the LLM) from a local .env file
dotenv_path = Path(".env")
load_dotenv(dotenv_path=dotenv_path)

This code snippet sets up the essential environment variables for this notebook: the Azure OpenAI deployment name, endpoint, API key, and API version. We save all of these variables in a file named .env.
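For reference, the .env file might look like the following (placeholder values; the variable names match the ones we read later when constructing the LLM client):

AZURE_DEPLOYMENT=your-deployment-name
AZURE_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_API_KEY=your-api-key
API_VERSION=2024-02-01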


from langchain.tools import tool

@tool
def addition(x: float, y: float) -> float:
    """Addition of two numbers
    :param x: The first number to be added
    :param y: The second number to be added"""
    return x + y

@tool
def subtraction(x: float, y: float) -> float:
    """Subtraction of two numbers
    :param x: The number to subtract from
    :param y: The number to be subtracted"""
    return x - y

@tool
def multiplication(x: float, y: float) -> float:
    """Multiplication of two numbers
    :param x: The first number to be multiplied
    :param y: The second number to be multiplied"""
    return x * y

@tool
def division(x: float, y: float) -> float:
    """Division of two numbers
    :param x: The dividend (numerator)
    :param y: The divisor (denominator)"""
    return x / y

tools = [addition, subtraction, multiplication, division]

# Map each tool's name to the tool object; this is required during tool execution
tool_dict = {}
for t in tools:  # loop variable renamed to avoid shadowing the imported `tool` decorator
    tool_dict[t.name] = t

The code we’ve seen defines a set of simple tools using Python functions. Each tool’s docstring explains its purpose, and the parameters section details the expected inputs. These tools will empower the LLM (Large Language Model) to make informed decisions as it strives to answer user queries.

To simplify tool execution, a dictionary named tool_dict has been created. This dictionary allows us to call the tool functions by their names during the execution phase. Notably, we're leveraging LangChain's tool decorator, which seamlessly integrates these custom tools into the LLM's available functions.
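Before wiring these tools into a graph, you can sanity-check one directly. The @tool decorator turns each function into a LangChain tool object with a name, a description derived from the docstring, and an invoke method that accepts a dictionary of arguments:

print(addition.name)         # "addition"
print(addition.description)  # description derived from the docstring
print(addition.invoke({"x": 2, "y": 3}))  # 5.0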

import json
import operator
import os
from typing import Annotated, List, TypedDict

from langchain.output_parsers.openai_tools import JsonOutputToolsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import AzureChatOpenAI
from langgraph.graph import StateGraph, END
from loguru import logger

class StrategyAgentState(TypedDict):
    user_query: str
    steps: Annotated[List, operator.add]
    step_no: int
    results: dict
    final_response: str
    end: bool

As discussed earlier, LangGraph operates on graphs consisting of nodes and edges. Nodes act as the computational units. They receive the current state as input, perform modifications, and return the updated state. Think of the state as an equivalent to a Python dictionary.

Careful state definition is crucial. Include fields that will be valuable during orchestration, anticipating future needs. Here’s a breakdown of some key state variables:

  • user_query: This field stores the question the user has posed to the agent.
  • steps: This list holds the tasks identified by the planner node that need to be executed by the tool execution node.
  • step_no: This counter, maintained by the tool executor node, prevents redundant task execution.
  • results: This field acts as a temporary storage for intermediate results generated by the tools.
  • final_response: As the name suggests, this variable holds the final answer that will be delivered to the user.
  • end: This field is updated by the planner or tool executor nodes. It indicates whether all steps have been completed, signifying the end of the workflow.

Above is my simple design for this toy agent. You can get really creative in designing states and using them during orchestration.

deployment_name = os.environ["AZURE_DEPLOYMENT"]
azure_endpoint = os.environ["AZURE_ENDPOINT"]
api_key = os.environ["AZURE_API_KEY"]
api_version = os.environ["API_VERSION"]

llm = AzureChatOpenAI(azure_deployment=deployment_name,
                      azure_endpoint=azure_endpoint,
                      api_key=api_key,
                      api_version=api_version,
                      temperature=0.0)

def plan(state: StrategyAgentState):
    """The planner node; this is the brain of the system"""
    user_question = state["user_query"]
    steps = state["steps"]
    results = state["results"]
    end = state["end"]

    if results is None:  # Results have not been populated yet, so start planning
        SYSTEM_PROMPT = "You are a helpful assistant who is good at mathematics. \
Do not calculate yourself, let the tool do the calculation. Call one tool at a time."
        prompt_template = ChatPromptTemplate.from_messages(
            [("system", SYSTEM_PROMPT),
             ("user", "{user_question}")])

        planner = prompt_template | llm.bind_tools(tools) | JsonOutputToolsParser()

        invoke_inputs = {"user_question": user_question}
        steps = planner.invoke(invoke_inputs)

        logger.info(f"Generated plans : {steps}")

        return {"steps": steps}

    elif results and not end:  # Results are populated and end is not yet declared: run the end detector
        SYSTEM_PROMPT = "You need to decide whether a problem is solved or not. Just return ##YES if the problem is solved and ##NO \
if the problem is not solved. Please explain your reasoning as well. Make sure you use the same template of ##YES and ##NO in the final answer. \
Do not calculate yourself, let the tool do the calculation."
        prompt_template = ChatPromptTemplate.from_messages(
            [("system", SYSTEM_PROMPT),
             ("user", "{user_question}"),
             ("user", "{results}"),
             ("user", "{steps}")])

        planner = prompt_template | llm

        invoke_inputs = {"user_question": user_question, "steps": json.dumps(steps), "results": json.dumps(results)}
        response = planner.invoke(invoke_inputs)

        logger.info(f"End detector response : {response.content}")

        if "##YES" in response.content:
            return {"end": True}
        elif "##NO" in response.content:
            return {"end": False}

    else:  # Otherwise, replan the remaining steps from previous steps and results
        SYSTEM_PROMPT = "You are a helpful assistant who is good at mathematics. \
You are a replanner assistant. \
If you are given previous steps and previous results, do not start again. Call one function at a time. \
Do not calculate yourself, let the tool do the calculation."
        prompt_template = ChatPromptTemplate.from_messages(
            [("system", SYSTEM_PROMPT),
             ("user", "{user_question}"),
             ("user", "{steps}"),
             ("user", "{results}")])

        planner = prompt_template | llm.bind_tools(tools) | JsonOutputToolsParser()

        invoke_inputs = {"user_question": user_question, "steps": json.dumps(steps), "results": json.dumps(results)}
        steps = planner.invoke(invoke_inputs)

        logger.info(f"Pending plans : {steps}")

        return {"steps": steps}

The Planner Node: Orchestration Efficiency

While the planner node appears complex, its core functionalities are designed for efficiency. It acts as the maestro of the orchestration process, taking on three key roles:

  1. Initiation Planning: At the beginning of the workflow, the planner crafts an initial plan outlining the steps the agent needs to take to address the user's query. This plan is stored in the steps field of the state, which is a list.
  2. Adaptive Planning: As the workflow progresses, the planner can dynamically update the plan based on two factors: intermediate results and past plans. This adaptability allows the agent to react to unforeseen circumstances and refine its approach as needed. When the planner returns the steps list, it efficiently adds new steps to the existing list instead of overwriting everything.
  3. Completion Signal: Finally, the planner plays a crucial role in declaring the end of the plan. It analyzes intermediate results and past plans to determine if all necessary steps have been completed. If so, the planner simply updates the end field in the state to True, signaling the workflow's completion.

An important aspect of LangGraph's efficiency is its ability to selectively update the state. Instead of returning the entire state after each modification, the node can focus on just the specific fields that have been changed.

LangGraph achieves this through annotations within the state definition. These annotations specify how updated data should be incorporated into the existing state. For example, the steps field is annotated with operator.add. This instructs LangGraph to efficiently append new steps to the existing list, rather than overwriting the entire list with each update.

In contrast, for fields without annotations like operator.add, returning a dictionary with the updated value will simply overwrite the existing data in that field.
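To make the merge semantics concrete, here is a small illustration in plain Python (this mimics the reducer behavior; it is not LangGraph internals):

import operator

# State before a node runs
existing_steps = [{"type": "multiplication", "args": {"x": 3, "y": 9}}]
existing_results = {"multiplication_step_0": 27}

# Partial update returned by a node
new_steps = [{"type": "addition", "args": {"x": 27, "y": 45}}]
new_results = {"addition_step_1": 72}

# steps is Annotated[List, operator.add]: the reducer concatenates old and new
merged_steps = operator.add(existing_steps, new_steps)  # both plans are kept

# results has no annotation: the returned value simply replaces the old one
merged_results = new_results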




def tool_execution(state: StrategyAgentState):
    """Worker node that executes the tools of a given plan. Each step carries JSON
    arguments which can be sent to the tools directly."""
    steps = state["steps"]
    step_no = state["step_no"] or 0

    _results = state["results"] or {}
    j = 0
    for tool_call in steps[step_no:]:
        tool_name = tool_call["type"]
        args = tool_call["args"]
        # Look up the tool by name and invoke it with the parsed arguments
        _results[tool_name + "_step_" + str(step_no + j)] = tool_dict[tool_name].invoke(args)
        logger.info(f"{tool_name} is called with arguments {args}")
        j = j + 1

    return {"results": _results, "step_no": step_no + j}

def responder(state: StrategyAgentState):
    """Generates the final user-facing response from the accumulated results."""
    user_question = state["user_query"]
    results = state["results"]
    SYSTEM_PROMPT = "Generate the final response by looking at the results and the original user question."
    prompt_template = ChatPromptTemplate.from_messages(
        [("system", SYSTEM_PROMPT),
         ("user", "{user_question}"),
         ("user", "{results}")])

    model = prompt_template | llm

    invoke_inputs = {"user_question": user_question, "results": json.dumps(results)}
    response = model.invoke(invoke_inputs)
    return {"final_response": response.content}


These sections introduce two crucial nodes in our LangGraph workflow: the tool executor and the responder. The tool executor takes charge of executing the steps outlined in the steps field of the state. It processes these steps one by one, leveraging the custom tools we defined earlier. The results generated by each tool execution are stored as a dictionary within the results field of the state. It's important to note that this overwrites any existing results dictionary, because the results field has no reducer annotation.

Following the tool execution, the responder node steps in. Its role is to analyze all the intermediate results accumulated in the results field. By processing this data, the responder crafts a clear and informative response for the user. This final response is then placed in the final_response field of the state, ready to be delivered to the user.

def route(state: StrategyAgentState):
    """A conditional route based on the end flag announced by the other nodes;
    this either ends the execution or sends the workflow back for planning."""
    end = state["end"]
    if end:
        # We have executed all tasks
        return "respond"
    else:
        # We are still executing tasks; loop back to the "plan" node
        return "plan"

The route function acts as a critical decision point within the workflow. It evaluates the end field in the state. If end is set to True, signifying that all necessary steps have been completed, it returns the string "respond". If end remains False, indicating that more work is required, it returns "plan". These return values are routing labels rather than answers; we'll explore how LangGraph maps them to nodes in just a while.

graph = StateGraph(StrategyAgentState)
graph.add_node("plan", plan)
graph.add_node("tool_execution", tool_execution)
graph.add_node("responder", responder)
# --------------------------------------------------------
graph.add_edge("plan", "tool_execution")
graph.add_edge("responder", END)
graph.add_conditional_edges("tool_execution", route, {"respond": "responder", "plan": "plan"})
graph.set_entry_point("plan")
agent = graph.compile()

While each node we've discussed acts as an independent computational unit, nodes can even operate concurrently as long as the state dictionary provides the information they need. Now, it's time to assemble these individual nodes into a cohesive workflow. This workflow dictates the order of node execution, ensuring a logical progression.

In the first code block, we defined each node by assigning a name and a corresponding Python function. The add_node method takes two arguments: the node's name and the associated Python function.

The second code block focuses on edge creation, establishing the connections between nodes. The add_edge method takes two arguments: the first node and the second node it connects to. Following this logic, the workflow transitions from the planner node to the tool execution node, as the plan needs to be established before tool execution begins. Similarly, the workflow proceeds from the responder node to the end node, signifying completion.

However, LangGraph's true power lies in its concept of conditional edges. These edges allow us to dynamically route the workflow back to any previous node, potentially creating cycles. In our example, a conditional edge is attached to the tool execution node, with the route function deciding the next hop based on the end field in the state. If end is True, indicating completion, the workflow progresses to the responder node. Conversely, if end is False, signifying more work is required, the workflow loops back to the planner node for further processing.
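Since the figure is not reproduced here, you can render the compiled graph yourself. Recent LangGraph versions expose a drawable graph object (a quick sketch; print_ascii additionally requires the grandalf package, and the available drawing helpers depend on your version):

# Draw the compiled graph in the terminal (requires the grandalf package)
agent.get_graph().print_ascii()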

query = "what is 3 multiplied by 9 added to 45 then devide all by 6"

for s in agent.stream({"user_query": query}):
    print(s)
    print("--------------------")

Now, above is how we run the agent, and below is the output.

2024-04-24 17:09:34.041 | INFO     | __main__:plan:71 - Generated plans : [{'type': 'multiplication', 'args': {'x': 3, 'y': 9}}, {'type': 'addition', 'args': {'x': 27, 'y': 45}}, {'type': 'division', 'args': {'x': 72, 'y': 6}}]
2024-04-24 17:09:34.053 | INFO | __main__:tool_execution:145 - multiplication is called with arguments {'x': 3, 'y': 9}
2024-04-24 17:09:34.054 | INFO | __main__:tool_execution:145 - addition is called with arguments {'x': 27, 'y': 45}
2024-04-24 17:09:34.055 | INFO | __main__:tool_execution:145 - division is called with arguments {'x': 72, 'y': 6}
{'plan': {'steps': [{'type': 'multiplication', 'args': {'x': 3, 'y': 9}}, {'type': 'addition', 'args': {'x': 27, 'y': 45}}, {'type': 'division', 'args': {'x': 72, 'y': 6}}]}}
--------------------
{'tool_execution': {'results': {'multiplication_step_0': 27, 'addition_step_1': 72, 'division_step_2': 12.0}, 'step_no': 3}}
--------------------
2024-04-24 17:09:35.342 | INFO | __main__:plan:93 - End detector response : ##YES

The correct calculation steps have been followed in the given order:
1. 3 multiplied by 9 equals 27
2. 27 added to 45 equals 72
3. 72 divided by 6 equals 12.0

Therefore, the final result is 12.0, which matches the provided result.
{'plan': {'end': True}}
--------------------
{'tool_execution': {'results': {'multiplication_step_0': 27, 'addition_step_1': 72, 'division_step_2': 12.0}, 'step_no': 3}}
--------------------
{'responder': {'final_response': '3 multiplied by 9 is 27. Adding 27 to 45 gives 72. Finally, dividing 72 by 6 results in 12.0.'}}
--------------------
{'__end__': {'user_query': 'what is 3 multiplied by 9 added to 45 then devide all by 6', 'steps': [{'type': 'multiplication', 'args': {'x': 3, 'y': 9}}, {'type': 'addition', 'args': {'x': 27, 'y': 45}}, {'type': 'division', 'args': {'x': 72, 'y': 6}}], 'step_no': 3, 'results': {'multiplication_step_0': 27, 'addition_step_1': 72, 'division_step_2': 12.0}, 'final_response': '3 multiplied by 9 is 27. Adding 27 to 45 gives 72. Finally, dividing 72 by 6 results in 12.0.', 'end': True}}
--------------------
4.006105899810791

Thanks for your time. I hope this helped you develop an understanding of AI agents in general and of how LangGraph can be used to build them.

If you enjoyed the article, I invite you to follow me for future content. Additionally, feel free to connect with me on LinkedIn.
