Is tool-calling all you need? Interaction patterns in multi-agent systems. Part I: Introduction.
For all the hype that large language models (LLMs) such as GPT-4 have been getting lately, anyone who has tried to use one knows that they are fairly limited in a number of ways. One of these limitations is that an LLM on its own simply receives a collection of inputs and responds with a single output, based only on those inputs and the data used during its training.
To let an LLM use other data and services, it's common to embed it in a wrapper (often called an "agent executor") that allows the LLM to request data from the outside world and receive the results. This combination is called an AI agent. ChatGPT is such an AI agent: for example, it will search the web if you ask it to, and respond based on the results.
AI agents can do more than naked LLMs, yet even they are limited. One of their main limitations is that while an AI agent can keep requesting new data, it is still always using the same prompt (set of instructions) that it was created with. And while technically these days there is almost no limit to the size of the prompt (and thus the number of distinct instructions) that one can give to an agent, in practice the bigger the prompt, and the larger the number of instructions in it, the higher the chance that the LLM inside the agent will simply ignore some of them, or interpret them wrongly.
The way to deal with this is to split the instructions into many distinct prompts, each used by a separate agent. This can work great, but then a question arises: how do we orchestrate the interaction of those agents with each other? Here, orchestration means organizing the way the agents interact with each other to solve a common task.
As this field is so new, different multi-agent frameworks take very different approaches to orchestration. The purpose of this series is to review these approaches and see which of them are genuinely distinct and which are equivalent to each other. In the final part of the series, we will also explain the orchestration options we chose to offer in motleycrew, and why we believe they strike the right balance between power and simplicity.
Tool calling
The most basic pattern of an AI agent's operation is tool calling. Here a "tool" refers to anything that an agent can make a request to and that responds with information that can be fed back to the agent. The most basic examples are a web search engine that responds to a text query supplied by the LLM, or a database that returns the result of an SQL query. Here is a simplified version of the diagram from the previous section:
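The tool-calling loop described above can be sketched in a few lines of Python. Everything here is a stand-in: `llm()` is a stub that mimics a model's decisions, and `web_search` is a dummy tool; a real agent executor would call an actual LLM API instead.

```python
def web_search(query: str) -> str:
    """Stub tool: a real implementation would call a search engine."""
    return f"results for {query!r}"

TOOLS = {"web_search": web_search}

def llm(messages: list) -> dict:
    """Stub LLM: decides whether to call a tool or give a final answer.
    A real agent would send `messages` to a model such as GPT-4."""
    last = messages[-1]
    if last["role"] == "user":
        # First turn: the model asks to use a tool
        return {"tool": "web_search", "args": {"query": last["content"]}}
    # A tool result came back: produce the final answer
    return {"answer": f"Based on {last['content']}, here is my reply."}

def run_agent(task: str) -> str:
    """The agent executor: feed tool results back to the LLM until it answers."""
    messages = [{"role": "user", "content": task}]
    while True:
        reply = llm(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})

print(run_agent("latest LLM news"))
```

The point of the sketch is the loop structure: the wrapper keeps routing tool requests out and results back in until the LLM produces a final answer.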
However, tools needn't be so simple. In fact, an agent also receives an initial query or task and then returns a result, so it can be used as a tool by another agent! And of course, the agents used as tools could themselves have yet other agents as tools, and so on.
To the best of our knowledge, motleycrew was the first open-source framework to explicitly support the usage of agents as tools for other agents (though of course some earlier frameworks use similar concepts, as we will review in the following parts of this series).
We've found agents-as-tools to be a surprisingly powerful concept, sufficient for a large number of real-life use cases. However, other use cases need different orchestration methods. Before we discuss what we think the missing pieces are, let us review the other popular multi-agent frameworks and their orchestration methods in the next two parts of this series.