Navigating Multi-Agentic Workflows for Problem-Solving

Leslie Lim
Published in d*classified

Large Language Models (LLMs) and agentic workflows are useful for tackling more complex problems. Previously, our intern Amanda Koh explored the use of Chain of Thought in an AI Assistant to push LLMs beyond the information they were trained on. Tan Yan Chi, as part of her internship with the C3 Development programme centre, built on that work and explored how multi-agent workflows hold the potential to revolutionise complex problem-solving with AI. This project was supervised by Meo Kok Eng.


Background

Let’s start from the basics. For most non-technical users who have used a Generative AI (GenAI) application like ChatGPT, Zero-Shot Chain of Thought (CoT) prompting (Figure 1) is likely a familiar concept.

Figure 1. Example of a Zero-Shot CoT Prompt

Zero-Shot CoT is a prompting technique in which the LLM agent is given a problem to solve without any examples. The agent relies on its pre-trained knowledge and its understanding of natural language to generate an output, which may or may not answer the question correctly. While this can be effective for simple single-turn tasks, Zero-Shot CoT tends to struggle with situations that require a specific outcome or involve complex reasoning steps, often producing largely hallucinated responses.
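
As a rough sketch of what this looks like in code, the snippet below sends a Zero-Shot CoT prompt using the OpenAI Python SDK; the question, the trailing reasoning cue, and the model name are illustrative placeholders, not a prescribed setup:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Zero-Shot CoT: no worked examples are provided; the trailing cue
# nudges the model to reason step by step before answering.
prompt = (
    "A cafe sold 23 coffees in the morning and twice as many in the "
    "afternoon. How many coffees were sold in total?\n"
    "Let's think step by step."
)

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```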

For users tackling more intricate problems, Few-Shot CoT (Figure 2) offers a more targeted approach.

Figure 2. Example of a Few-Shot CoT Prompt

In Few-Shot CoT, we provide the LLM with a small set of examples alongside the problem itself. This allows the model to leverage its pre-trained knowledge while also learning from the specific context of the examples. However, unlike traditional training methods, Few-Shot CoT does not modify the LLM’s internal parameters. The model remains “frozen” but uses the provided examples to inform its reasoning process. Yet this approach proves ineffective for tasks with limited or irrelevant examples, or those requiring complex, multi-step reasoning.
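
For illustration, a Few-Shot CoT prompt might look like the sketch below; the worked examples are made up, and in practice they should resemble the target task as closely as possible:

```python
# Few-Shot CoT: a handful of worked examples precede the real question,
# demonstrating the reasoning format the model should imitate.
few_shot_prompt = """\
Q: Tom has 3 boxes with 4 apples each. How many apples does he have?
A: Each box has 4 apples and there are 3 boxes, so 3 x 4 = 12. The answer is 12.

Q: A train travels 60 km in 1.5 hours. What is its average speed?
A: Speed is distance over time, so 60 / 1.5 = 40. The answer is 40 km/h.

Q: A cafe sold 23 coffees in the morning and twice as many in the afternoon. \
How many coffees were sold in total?
A:"""
```

The assembled prompt is then sent to the model with the same chat call as in the Zero-Shot sketch; only the prompt text changes.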

For complex, multi-step reasoning problems, multi-agentic frameworks might just be the key to producing quality outputs. But what exactly is a multi-agentic framework?

AutoGen

Imagine a group chat where LLM agents, each with specialized skills, collaborate to solve problems. That’s the core idea behind multi-agentic frameworks.

Microsoft’s AutoGen is a pioneering multi-agentic framework in which users only need to define a set of agents, each with its unique capabilities and role. This intuitive setup allows even non-technical users to build powerful multi-agentic systems. When a user submits a query, the agents work together iteratively until they reach a satisfactory solution or a predefined iteration limit is reached.

Behind the scenes, AutoGen orchestrates the entire process. It optimizes the workflow between the agents for end-users, freeing them to focus on the output. However, a potential hurdle arises when AutoGen gets stuck in a loop. In this “forever loop” scenario, an LLM agent repeatedly receives similar questions and produces similar outputs, essentially making no progress until the limit is reached. This can be especially costly with a paid model like GPT-4, as each agent call adds to the bill.

An effective mitigation is to set a low iteration limit: the loop then terminates automatically once the specified number of iterations is reached, preventing costs from accumulating without yielding any meaningful results.
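
As a rough illustration, a small AutoGen group chat with a low iteration limit might be configured along these lines; the agent roles, the model configuration, and the max_round value are illustrative assumptions, not a prescribed setup:

```python
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

llm_config = {"config_list": [{"model": "gpt-4"}]}  # placeholder; add your own API key or endpoint

# Two specialised agents plus a proxy that relays the user's query.
coder = AssistantAgent("coder", llm_config=llm_config,
                       system_message="You write Python code to solve the task.")
critic = AssistantAgent("critic", llm_config=llm_config,
                        system_message="You review the coder's output for errors.")
user_proxy = UserProxyAgent("user_proxy", human_input_mode="NEVER",
                            code_execution_config=False)

# max_round is the iteration limit: the chat terminates after 6 turns
# even if the agents are stuck in a loop, capping the cost of a run.
group_chat = GroupChat(agents=[user_proxy, coder, critic],
                       messages=[], max_round=6)
manager = GroupChatManager(groupchat=group_chat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="Write and review a fizzbuzz function.")
```

Here max_round caps the total number of turns across all agents, so a runaway exchange between the coder and critic cannot run up an unbounded bill.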

LangGraph

Another approach to address this involves manually specifying each transition within the workflow. Instead of a group chat scenario where any agent can be summoned at any time, end-users designate which agent interacts with which other agent and determine the sequence of these interactions. LangGraph serves as an ideal low-level framework for crafting a customized workflow for a multi-agentic system, offering the capability to define each transition between agents. With LangGraph’s compatibility with LangSmith (Figure 3), it is easy to control and monitor the workflow, allowing users to stop unnecessary loops before they become costly.

Figure 3. Monitoring Progress on LangSmith Web Application
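
For reference, LangSmith tracing is typically switched on through environment variables before running the graph; the key and project name below are placeholders:

```python
import os

# LangSmith tracing is enabled via environment variables; any LangGraph
# run in the same process is then logged to the named project.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "multi-agent-experiments"   # any project name
```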

Before creating a workflow with LangGraph, it is essential to understand its fundamental concepts: state, node, and edge.

State is the core component of the LangGraph framework. Every agent within the system shares and updates a unified state structure throughout the workflow. This shared state is pivotal, as it ensures that all agents operate in a synchronized and cohesive manner.

A node in LangGraph represents an LLM agent in the multi-agentic system. A graph consists of multiple nodes connected together, forming a workflow.

Edges are the connecting lines between nodes. They define the flow from one agent to another. There are two types of edges: normal edges and conditional edges. A normal edge routes the control flow from one node to the next directly, without referencing the graph state. A conditional edge, on the other hand, routes to different nodes depending on the current graph state, providing dynamic control based on the evolving context of the workflow.
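
A minimal sketch ties the three concepts together, assuming LangGraph’s StateGraph API; the node functions below stand in for real LLM agents:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

# State: the shared structure every node reads from and writes to.
class AgentState(TypedDict):
    question: str
    draft: str
    approved: bool

# Nodes: each function is one agent; it receives the state and
# returns a partial update that is merged back into it.
def drafter(state: AgentState) -> dict:
    return {"draft": f"Draft answer to: {state['question']}"}

def reviewer(state: AgentState) -> dict:
    return {"approved": len(state["draft"]) > 0}

# Routing function for the conditional edge: inspects the state.
def route(state: AgentState) -> str:
    return "done" if state["approved"] else "revise"

graph = StateGraph(AgentState)
graph.add_node("drafter", drafter)
graph.add_node("reviewer", reviewer)
graph.set_entry_point("drafter")
graph.add_edge("drafter", "reviewer")           # normal edge
graph.add_conditional_edges("reviewer", route,  # conditional edge
                            {"done": END, "revise": "drafter"})

app = graph.compile()
print(app.invoke({"question": "What is ReWOO?", "draft": "", "approved": False}))
```

Here the conditional edge inspects the shared state to decide whether to finish or loop back, which is precisely what lets a workflow end a loop deliberately rather than by hitting a hard limit.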

Since the launch of LangGraph, numerous workflow patterns have been created, each optimized for specific use cases and operational contexts. I experimented with the “Plan-and-Execute” workflow as well as the “Reasoning Without Observation” (ReWOO) workflow.

Plan-and-Execute Workflow

The Plan-and-Execute workflow is designed to achieve objectives through a systematic approach that involves coming up with an overarching plan, followed by the execution of the specified subtasks in that plan. This workflow is orchestrated by three distinct agents: the main planner, the action agent, and the replanner (Figure 4).

Figure 4. Flowchart of Plan-and-Execute Workflow

When a user submits a query, the main planner devises a comprehensive plan to address the query. Following the formulation of the plan, the action agent undertakes the responsibility of executing each step. This agent employs iterative loops to ensure that each subtask is completed with high-quality outputs.

Upon the completion of each step, both the plan and the outputs generated by the action agent are forwarded to the replanner agent. The replanner agent then reevaluates the current plan and, based on the latest outputs, formulates a revised plan with fewer steps. This iterative process of planning, executing, and replanning continues until all steps are exhausted, and the final output is delivered to the user.

Figure 5. Code Snippet of Graph Creation in LangGraph for Plan-and-Execute Workflow
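
In the same spirit as Figure 5, a minimal sketch of how such a graph might be wired is shown below; the planner, action agent, and replanner are stubbed out with placeholder functions rather than real LLM calls:

```python
from typing import List, TypedDict
from langgraph.graph import StateGraph, END

class PlanExecState(TypedDict):
    input: str
    plan: List[str]        # remaining steps
    past_steps: List[str]  # results of completed steps
    response: str          # final answer, once the replanner decides to stop

def planner(state: PlanExecState) -> dict:
    # Placeholder: a real planner would ask an LLM for a step-by-step plan.
    return {"plan": ["step 1", "step 2"], "past_steps": []}

def action_agent(state: PlanExecState) -> dict:
    # Placeholder: a real action agent would execute the step via an LLM/tool.
    step, *rest = state["plan"]
    return {"plan": rest, "past_steps": state["past_steps"] + [f"did {step}"]}

def replanner(state: PlanExecState) -> dict:
    # Placeholder: a real replanner would revise the plan with an LLM call.
    if not state["plan"]:
        return {"response": "; ".join(state["past_steps"])}
    return {}

def should_end(state: PlanExecState) -> str:
    # Finish once the replanner has produced a final response.
    return "end" if state.get("response") else "continue"

graph = StateGraph(PlanExecState)
graph.add_node("planner", planner)
graph.add_node("agent", action_agent)
graph.add_node("replanner", replanner)
graph.set_entry_point("planner")
graph.add_edge("planner", "agent")
graph.add_edge("agent", "replanner")
graph.add_conditional_edges("replanner", should_end,
                            {"continue": "agent", "end": END})
app = graph.compile()
print(app.invoke({"input": "a complex task", "plan": [],
                  "past_steps": [], "response": ""}))
```

The conditional edge out of the replanner is what eventually ends the plan-execute-replan loop.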

The Mistral-7B model was used to test out this workflow.

The underlying philosophy of this workflow is to keep the LLM on course by fragmenting a complex task into simpler, manageable subtasks. With reference to Figure 4, a single run of this workflow on a complex problem with an N-step plan would take at least 2N + 1 calls (one planner call, plus one action-agent call and one replanner call per step), and potentially more if the action agent performs multiple iterations. While this method enhances precision and keeps the task aligned with the intended objective, it inherently requires a greater number of individual LLM queries, resulting in higher latency and higher cost compared to the ReWOO workflow. Refer to Figure 9 below to compare the cost of runs from the Plan-and-Execute and ReWOO workflows.

ReWOO Workflow

The Reasoning Without Observation workflow, or ReWOO workflow (Figure 6), is an enhanced version of the Plan-and-Execute workflow. It is based on the same idea of breaking down complex tasks into simpler, manageable subtasks. However, in this approach, the prompt to the main planner requests the plan in a format that specifies both the agent to use and the input required for each step. This is done by including an example in the prompt to provide in-context learning for the model (Figure 7).

Figure 6. Flowchart of ReWOO Workflow
Figure 7. Code Snippet of the Planner Prompt for the ReWOO Workflow
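
To illustrate the prompt format described above, a ReWOO-style planner prompt might look like the sketch below; the tools and the worked example are hypothetical, and the #E{n} placeholder convention follows the ReWOO pattern:

```python
# A ReWOO-style planner prompt: the single example teaches the model to
# emit every step up front, naming the tool and wiring outputs together
# with #E placeholders, so no replanner call is needed between steps.
planner_prompt = """\
For the following task, make a plan that solves the problem step by step.
For each step, indicate which tool to use and its input. Store each result
in a variable #E{n} that later steps can reference.

Tools available:
Search[input]: looks up information on the web.
Calculator[input]: evaluates an arithmetic expression.

Example:
Task: What is the population of France divided by the population of Norway?
Plan: Find the population of France. #E1 = Search[population of France]
Plan: Find the population of Norway. #E2 = Search[population of Norway]
Plan: Divide the two figures. #E3 = Calculator[#E1 / #E2]

Task: {task}
"""
```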

By doing so, each action agent can execute its task independently without calling back to a replanner, significantly reducing the time required for task completion. The output from each action agent is then fed into the solver agent, which synthesizes the final output for the user.

Figure 8. Code Snippet of Graph Creation in LangGraph for ReWOO Workflow

With reference to Figure 6, a single run of this workflow on a complex problem with an N-step plan would take at least N + 2 calls (one planner call, one call per step, and one solver call), and potentially more if an action agent performs multiple iterations. This workflow offers a substantial improvement over Plan-and-Execute by eliminating the need for continuous interaction with the replanner LLM, saving both cost and time (Figure 9).

Figure 9. LangSmith Cost Statistics for the Same Number of Runs of the Plan-and-Execute and ReWOO Workflows

From a Wider Perspective

When we zoom out to look at the bigger picture of AI agents, we observe varying degrees of control and independence. On one end, we have LangChain, which allows the construction of AI agents capable of utilizing multiple tools but lacks the ability to coordinate multiple chains or actors across steps. On the other end, there are fully autonomous agents that can handle all planning and tool selection without any user input.

LangGraph and AutoGen occupy the middle ground between these extremes. LangGraph provides structure and state, enabling the creation of agents with controlled and monitored steps. This framework is ideal for production environments where precision and oversight are crucial. AutoGen, positioned between LangGraph and fully autonomous agents, allows agents to make more independent decisions during task execution. While AutoGen is less suitable for production due to its reduced control over specific next steps, it excels in testing and ideation phases. Converting AutoGen workflows into LangGraph for production ensures better control and monitoring, preventing redundant operations and optimizing resource use.

Conclusion

Navigating the complexities of problem-solving using AI requires innovative approaches that go beyond traditional single-agent models. The emergence of multi-agent frameworks such as AutoGen and LangGraph heralds a new era in AI-driven solutions. These frameworks enable more sophisticated, efficient, and cost-effective workflows by leveraging the strengths of multiple specialized agents working in concert. By understanding and leveraging the appropriate level of autonomy in AI frameworks, we can achieve a balance that optimizes both innovation and control, paving the way for more effective and reliable AI Assistant solutions.
