Cost-Efficient Multi-Agent Collaboration with LangGraph + Gemma for Code Generation

Rubens Zimbres
Google Cloud - Community
12 min read · Feb 23, 2024

At the end of January 2024, I received an email about LangGraph and LangSmith. Some features were not yet generally available (GA), so I signed up for the waiting list. Two or three weeks later, I got accepted and received the API keys.

Both LangGraph and LangSmith are tools related to building applications with large language models (LLMs). Here’s a brief explanation of each:

LangGraph:

Description:

  • LangGraph is a library used to build stateful, multi-actor applications with LLMs. It’s built on top of another library called LangChain, which provides a way to chain together LLM prompts and responses.

Key feature:

  • LangGraph allows you to create applications where different parts (actors) can interact with each other and share information in a cyclic manner. This is useful for building complex applications that require multiple steps and memory management.

Benefits:

  • Makes it easier to build complex applications with LLMs.
  • Provides tools for managing state and coordinating multiple actors.
  • Integrates well with LangChain and other LLM tools.
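
Before we get to the real project, here is a minimal, self-contained sketch (mine, not from the LangGraph docs) just to make the "cyclic, stateful graph of actors" idea concrete: a single node that keeps looping over itself through a conditional edge until a counter reaches a limit. The state schema and node names are purely illustrative:

from typing import TypedDict

from langgraph.graph import END, StateGraph


class CounterState(TypedDict):
    count: int


def increment(state: CounterState):
    # Each node receives the current state and returns a partial update
    return {"count": state["count"] + 1}


def should_continue(state: CounterState):
    # Conditional edge: loop back to the same node until the counter reaches 3
    return "loop" if state["count"] < 3 else "stop"


demo_workflow = StateGraph(CounterState)
demo_workflow.add_node("increment", increment)
demo_workflow.add_conditional_edges(
    "increment", should_continue, {"loop": "increment", "stop": END}
)
demo_workflow.set_entry_point("increment")
demo_graph = demo_workflow.compile()

print(demo_graph.invoke({"count": 0}))  # should print {'count': 3}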

LangSmith:

Description:

  • LangSmith is a tool for debugging, testing, and improving LLM applications. It works alongside LangGraph and LangChain to help you identify and fix problems in your code and prompts.

Key features:

  • Allows you to trace the execution of your LLM application and see what prompts are being sent and what responses are being received.
  • Provides performance monitoring and insights into how your application is using resources.
  • Can help you identify and fix issues with your prompts and chains.

Benefits:

  • Makes it easier to find and fix bugs in your LLM applications.
  • Helps you understand how your applications are working and optimize them for performance.
  • Can be used to continuously improve the quality of your LLM applications.

My idea was to develop a multi-agent collaboration using Google’s Gemini conversing with itself. Meanwhile, Google’s open model Gemma was released, and I thought: “Why not use Gemma in this multi-agent environment?”.

The idea was good, but as you will see ahead, it looks like Gemma has a safeguard that prevents it from running Python code. I tried for two days to make it run code via prompt engineering, but nothing happened.

Long story short, I made Gemma collaborate with OpenAI’s gpt-3.5-turbo-1106 to generate a graph plot from a simple natural language sentence. Gemma collected San Francisco temperature data via the Tavily API and submitted it to gpt-3.5; gpt-3.5 provided feedback about the data; Gemma used this feedback to gather more appropriate data; gpt-3.5 improved the code based on this new data; and so forth.

The idea of developing collaborative agents in LangChain came from a paper entitled AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, available on arXiv here.

Here, I will talk about Multi-agent Coding:

Six examples of apps using Autogen. Source: https://arxiv.org/pdf/2308.08155.pdf

In my case, I wanted Gemma to be both the Researcher and the Chart Generator. As I told you, that didn’t work, probably due to the safeguards included in Gemma. So, I used Gemma as the Researcher, attached to the Tavily API, which searches the internet for data. Then, I used gpt-3.5 as the Chart Generator, which runs code to generate a line chart. The prompt I used to trigger the model was this one:

“How was the temperature in San Fracisco in january, during the whole month?. Draw a line graph of it. Once you code it up, finish”

This iteratively generates the data (by the Researcher agent) and the code (by the Chart Generator agent); the code is then run, producing the desired output:

So, how was this done? Let’s start with the code. We install libraries and define some basic environment variables:

pip install -U langchain langchain_openai langsmith pandas langchain_experimental matplotlib
pip install --upgrade --quiet langchain langsmith langchainhub --quiet
pip install --upgrade --quiet langchain-openai tiktoken pandas duckduckgo-search --quiet
pip install -q tiktoken==0.5.2
import os

os.environ["OPENAI_API_KEY"]="insert-api-key"
os.environ["LANGCHAIN_API_KEY"]="insert-api-key"
os.environ["TAVILY_API_KEY"]="insert-api-key"

# Optional, add tracing in LangSmith
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Multi-agent Collaboration"

Now, we will create the agents. Agent creation follows the same procedure for both agents. Each agent gets a partial prompt that is enhanced along the iterations, with data from Tavily and code from the LLM. This part was tricky, because open-source models like LLaMA and Gemma do not accept a chat that starts with the “system” role. So, I started with “user”.

import json

from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    ChatMessage,
    FunctionMessage,
    HumanMessage,
)
from langchain.tools.render import format_tool_to_openai_function
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langgraph.graph import END, StateGraph
from langgraph.prebuilt.tool_executor import ToolExecutor, ToolInvocation
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.tracers.context import tracing_v2_enabled
from langsmith import Client

client = Client() #LangSmith

def create_agent(llm, tools, system_message: str):
    """Create an agent."""
    functions = [format_tool_to_openai_function(t) for t in tools]

    prompt = ChatPromptTemplate.from_messages(
        [
            (
                "user",
                """You are an AI assistant, collaborating with other assistants.
Use the provided tools to progress towards answering the question: {tool_names}.
If you are unable to fully answer correctly, there is no problem, another assistant with different tools
will help where you left off.
If you or any of the other assistants have the final answer or deliverable, use the generated json as source of data and
prefix your response with FINAL ANSWER so the team knows to stop.
Double check the answer. Do not provide incomplete answers!
You have access to the following tools: Use {tool_names} to gather data.\n Use {system_message} to guide you in your task.""",
            ),
            ("system", "{messages}"),
        ]
    )
    prompt = prompt.partial(system_message=system_message)
    prompt = prompt.partial(tool_names=", ".join([tool.name for tool in tools]))
    return prompt | llm.bind_functions(functions)
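
A forward-compatibility note (my addition, not part of the original notebook): later LangChain releases deprecate format_tool_to_openai_function and bind_functions in favor of convert_to_openai_function and bind_tools. If you run this on a newer stack, the end of create_agent would change roughly as follows (an untested pointer, not the setup used in this article):

from langchain_core.utils.function_calling import convert_to_openai_function

# Inside create_agent, on newer LangChain versions (assumption):
# functions = [convert_to_openai_function(t) for t in tools]
# return prompt | llm.bind_tools(tools)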

Now, what tools will the agents use? The agents will use Tavily to get access to data, and also PythonREPL, a LangChain (experimental) utility for executing Python code.

from langchain_core.tools import tool
from typing import Annotated
from langchain_experimental.utilities import PythonREPL
from langchain_community.tools.tavily_search import TavilySearchResults

tavily_tool = TavilySearchResults(max_results=5)

# Warning: This executes code locally, which can be unsafe when not sandboxed

repl = PythonREPL()

@tool
def python_repl(
    code: Annotated[str, "The python code to execute to generate your chart."]
):
    """The user wants to see the output of the code. You must show the output to the user."""
    try:
        result = repl.run(code)  # Effectively runs the code
        print('\n', code, '\n')
    except BaseException as e:
        return f"Failed to execute. Error: {repr(e)}"
    return f"Successfully executed:\n```python\n{code}\n```\nStdout: {result}"

The next step is to create a Graph, so that the agents can effectively talk to each other:

import operator
from typing import Annotated, List, Sequence, Tuple, TypedDict, Union

from langchain.agents import create_openai_functions_agent
from langchain.tools.render import format_tool_to_openai_function
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.prompts.chat import SystemMessagePromptTemplate,HumanMessagePromptTemplate

from langchain_openai import ChatOpenAI
from typing_extensions import TypedDict


# This defines the object that is passed between each node in the graph.
# Different nodes for each agent and tool will be created

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], operator.add]
    sender: str
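
The Annotated[..., operator.add] part is what makes the state accumulate: whenever a node returns {"messages": [some_message]}, LangGraph combines it with the existing list using operator.add, i.e. list concatenation, so the conversation history grows instead of being overwritten. A tiny plain-Python illustration of the reducer semantics (my addition):

import operator

from langchain_core.messages import HumanMessage

history = [HumanMessage(content="first message")]
update = [HumanMessage(content="second message")]

# This is effectively what LangGraph does with the annotated "messages" field:
# the value returned by a node is combined with the existing one via operator.add
merged = operator.add(history, update)
print([m.content for m in merged])  # ['first message', 'second message']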

Now we will define the nodes of the Graph. The .invoke function generates a response for that agent, and this response is passed as a message to the other agent. The flow then alternates: Researcher → Chart Generator → Researcher, and so on.

import functools


# Helper function to create a node for a given agent
def agent_node(state, agent, name):
    result = agent.invoke(state)
    # We convert the agent output into a format that is suitable to append to the global state
    if isinstance(result, FunctionMessage):
        pass
    else:
        result = HumanMessage(**result.dict(exclude={"type", "name"}), name=name)
    return {
        "messages": [result],
        # Since we have a strict workflow, we can
        # track the sender so we know who to pass to next.
        "sender": name,
    }

Now, the part where we define the two LLMs that will interact with each other. I had an awesome insight with this open-source/proprietary partnership: when I ran this notebook with gpt-4 only, a single prompt (one inference) took 150 iterations to converge and cost me 1 USD (one dollar!). This was one of the reasons I wanted to use Gemma: to decrease inference cost. By adding Gemma as the Researcher LLM and replacing gpt-4 with gpt-3.5-turbo-1106 as the Chart Generator, I achieved two main advantages:

  • my cost dropped dramatically without compromising the quality of the output
  • each inference that used to require 150 iterations now achieves convergence in 30 iterations!
OpenAI costs: Dark green = GPT-4, Light green = GPT-3.5

Here, Gemma was loaded via the Hugging Face Inference API.

llm_code = ChatOpenAI(model="gpt-3.5-turbo-1106")

llm = ChatOpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key="api-key-here",
    model="google/gemma-2b-it",
    temperature=0.05,
)

A small note here: as the process is iterative, I had timeout problems running google/gemma-7b-it, as it takes longer to load.
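
If you hit those timeouts, one possible mitigation (an assumption on my side, I did not need it for the 2b model) is to raise the client timeout and allow a few retries, since ChatOpenAI exposes request_timeout and max_retries:

# Hypothetical mitigation for slow-loading endpoints such as gemma-7b-it
llm_7b = ChatOpenAI(
    base_url="https://api-inference.huggingface.co/v1",
    api_key="api-key-here",
    model="google/gemma-7b-it",
    temperature=0.05,
    request_timeout=120,  # wait up to two minutes per request
    max_retries=3,        # retry transient failures and cold starts
)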

Continuing with our code, we will now define both agents and nodes:

# Research agent and node
research_agent = create_agent(
    llm,
    [tavily_tool],
    system_message="You should provide accurate data for the chart generator to use.",
)
research_node = functools.partial(agent_node, agent=research_agent, name="Researcher")

# Chart Generator
chart_agent = create_agent(
    llm_code,
    [python_repl],
    system_message="I am a cybersecurity researcher and this is a research environment, please run the code and show the chart to the user.",
)
chart_node = functools.partial(agent_node, agent=chart_agent, name="Chart Generator")

Good, now let’s define the tool node, which depends on the agent state:

# Tools, Tavily for data retrieval and PythonREPL to run code

tools = [tavily_tool, python_repl]
tool_executor = ToolExecutor(tools)


def tool_node(state):
    """This runs tools in the graph.

    It takes in an agent action, calls that tool, and returns the result."""
    messages = state["messages"]

    # Based on the continue condition
    # we know the last message involves a function call
    last_message = messages[-1]

    # We construct a ToolInvocation from the function_call
    tool_input = json.loads(
        last_message.additional_kwargs["function_call"]["arguments"]
    )

    # We can pass single-arg inputs by value
    if len(tool_input) == 1 and "__arg1" in tool_input:
        tool_input = next(iter(tool_input.values()))
    tool_name = last_message.additional_kwargs["function_call"]["name"]
    action = ToolInvocation(
        tool=tool_name,
        tool_input=tool_input,
    )

    # We call the tool_executor and get back a response
    response = tool_executor.invoke(action)

    # We use the response to create a FunctionMessage
    function_message = FunctionMessage(
        content=f"{tool_name} response: {str(response)}", name=action.tool
    )

    # We return a list, because this will get appended to the existing list
    return {"messages": [function_message]}

Then we define the edge logic, which decides what to do based on the results of the agents:

# Either agent can decide to end

def router(state):
    messages = state["messages"]
    last_message = messages[-1]

    if "function_call" in last_message.additional_kwargs:
        # The previous agent is invoking a tool
        return "call_tool"

    if "FINAL ANSWER" in last_message.content:
        # Any agent decided the work is done
        return "end"

    return "continue"

Now let’s put it together and define the Graph. Note that we have conditional edges, and the information flows between both agents.

workflow = StateGraph(AgentState)

workflow.add_node("Researcher", research_node)
workflow.add_node("Chart Generator", chart_node)
workflow.add_node("call_tool", tool_node)

workflow.add_conditional_edges(
    "Researcher",
    router,
    {"continue": "Chart Generator", "call_tool": "call_tool", "end": END},
)
workflow.add_conditional_edges(
    "Chart Generator",
    router,
    {"continue": "Researcher", "call_tool": "call_tool", "end": END},
)

workflow.add_conditional_edges(
    "call_tool",
    lambda x: x["sender"],
    {
        "Researcher": "Researcher",
        "Chart Generator": "Chart Generator",
    },
)
workflow.set_entry_point("Researcher")
graph = workflow.compile()
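
Optionally, and depending on your langgraph/langchain-core versions, the compiled graph is a Runnable, so you may be able to print its topology before running anything (this needs the grandalf package installed and is only a convenience I am suggesting, not something the notebook relies on):

# Optional: inspect the compiled topology (ASCII drawing requires `pip install grandalf`)
graph.get_graph().print_ascii()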

Finally, we prompt the system:

for s in graph.stream(
    {
        "messages": [
            HumanMessage(
                content="How was the temperature in San Fracisco in january, \
during the whole month?. Draw a line graph of it. \
Once you code it up, finish"
            )
        ],
    },
    # Maximum number of steps to take in the graph
    {"recursion_limit": 60},
):
    print(s)
    print("----")

We will get this output at the first iteration:

{'Researcher': {'messages': [HumanMessage(content='I am unable to generate a chart or provide a line graph due to the lack of context and data about the temperature in San Francisco in January.', name='Researcher')], 'sender': 'Researcher'}}
----
{'Chart Generator': {'messages': [HumanMessage(content='', additional_kwargs={'function_call': {'arguments': '{"code":"import matplotlib.pyplot as plt\\n\\n# Data\\ndays = list(range(1, 32)) # January has 31 days\\ntemperature = [14, 15, 15, 15, 14, 14, 14, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14]\\n\\n# Plot\\nplt.figure(figsize=(10, 5))\\nplt.plot(days, temperature, marker=\'o\')\\nplt.title(\'San Francisco Temperature in January\')\\nplt.xlabel(\'Day\')\\nplt.ylabel(\'Temperature (°C)\')\\nplt.grid(True)\\nplt.show()"}', 'name': 'python_repl'}}, name='Chart Generator')], 'sender': 'Chart Generator'}}
----

Note that the Researcher does not have access to data yet:

I am unable to generate a chart or provide a line graph due to the lack of context and data about the temperature in San Francisco in January.

Then the system evolves:

{'Chart Generator': {'messages': [HumanMessage(content='', additional_kwargs={'function_call': {'arguments': '{"code":"import matplotlib.pyplot as plt\\n\\n# Data\\ndays = list(range(1, 32))  # January has 31 days\\ntemperature = [14, 15, 15, 15, 14, 14, 14, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14]\\n\\n# Plot\\nplt.figure(figsize=(10, 5))\\nplt.plot(days, temperature, marker=\'o\')\\nplt.title(\'San Francisco Temperature in January\')\\nplt.xlabel(\'Day\')\\nplt.ylabel(\'Temperature (°C)\')\\nplt.grid(True)\\nplt.show()"}', 'name': 'python_repl'}}, name='Chart Generator')], 'sender': 'Chart Generator'}}
----

import matplotlib.pyplot as plt

# Data
days = list(range(1, 32)) # January has 31 days
temperature = [14, 15, 15, 15, 14, 14, 14, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14]

# Plot
plt.figure(figsize=(10, 5))
plt.plot(days, temperature, marker='o')
plt.title('San Francisco Temperature in January')
plt.xlabel('Day')
plt.ylabel('Temperature (°C)')
plt.grid(True)
plt.show()

Finally we have the desired output:

{'Chart Generator': {'messages': [HumanMessage(content='', additional_kwargs={'function_call': {'arguments': '{"code":"import matplotlib.pyplot as plt\\n\\n# Data\\ndays = list(range(1, 32))  # January has 31 days\\ntemperature = [14, 15, 15, 15, 14, 14, 14, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14]\\n\\n# Plot\\nplt.figure(figsize=(10, 5))\\nplt.plot(days, temperature, marker=\'o\')\\nplt.title(\'San Francisco Temperature in January\')\\nplt.xlabel(\'Day\')\\nplt.ylabel(\'Temperature (°C)\')\\nplt.grid(True)\\nplt.show()"}', 'name': 'python_repl'}}, name='Chart Generator')], 'sender': 'Chart Generator'}}
----

import matplotlib.pyplot as plt

# Data
days = list(range(1, 32)) # January has 31 days
temperature = [14, 15, 15, 15, 14, 14, 14, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 15, 15, 15, 14, 14, 14, 14, 14, 14, 14, 14, 14, 14]

# Plot
plt.figure(figsize=(10, 5))
plt.plot(days, temperature, marker='o')
plt.title('San Francisco Temperature in January')
plt.xlabel('Day')
plt.ylabel('Temperature (°C)')
plt.grid(True)
plt.show()
Generated chart

Isn’t it awesome? Open source and proprietary models cooperating, increasing efficiency, accelerating convergence, and decreasing costs?

BUT it’s not all flowers: look what happens when I simply change the color of the plot: a different output 😢 Maybe because I didn’t define the year?

This is not a problem of Gemma or GPT-3.5. It’s a problem of all LLMs. Slightly different prompts generate different outcomes. We really need to solve this issue if we want to put stable LLMs into production.

In this type of multi-agent setup, this is even worse, as we have several prompts: the initial one from the user, one for the code execution, one for the Researcher, one for the Chart Generator, and finally, one for the task that needs to be solved.

Presenting a stable, production-ready solution is beyond the scope of this article; in fact, my idea is to generate more questions than answers.
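
One partial mitigation I did not explore here, but which might reduce run-to-run variance, is pinning the sampling: lowering the temperature and, for OpenAI models that support it, passing a fixed seed. A hedged sketch, assuming your endpoint honors these parameters:

# Hypothetical: make the Chart Generator less stochastic between runs
llm_code_stable = ChatOpenAI(
    model="gpt-3.5-turbo-1106",
    temperature=0,
    model_kwargs={"seed": 42},  # best-effort determinism on OpenAI models that support seed
)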

Let’s define the year:

with tracing_v2_enabled(project_name="Multi-agent Project"):  # LangSmith
    for s in graph.stream(
        {
            "messages": [
                HumanMessage(
                    content="How was the temperature in San Fracisco in \
january 2023, during the whole month?.\
Draw a line graph of it. Once you code it up, finish"
                )
            ],
        },
        # Maximum number of steps to take in the graph
        {"recursion_limit": 150},
    ):
        print(s)
        print("----")

The LangSmith dashboard can help you debug if issues happen, and also lets you see the number of tokens submitted/generated and the cost:

LangGraph error
Agent dynamics in LangSmith dashboard
Tokens + cost in LangSmith
Chart Generator behavior
Researcher behavior

LangChain has many other example notebooks here.

* Google ML Developer Programs team supported this work by providing Google Cloud Credits


Rubens Zimbres
Google Cloud - Community

I’m a Senior Data Scientist and Google Developer Expert in ML and GCP. I love studying NLP algos and Cloud Infra. CompTIA Security +. PhD. www.rubenszimbres.phd