The Future of AI: Collaborating Machines and the Rise of Multi-Agent Systems

Published in ReadyAI.org · 8 min read · May 22, 2024

By: Rooz Aliabadi, Ph.D.

Last week, OpenAI revealed its latest model, GPT-4o. The model is being described as the future of interaction between us and intelligent machines because it lets us hold spoken conversations with it, and it responds in an expressive, human-like way. That same week, Demis Hassabis, head of Google’s artificial intelligence (AI) initiatives, showcased Project Astra. He explained that this early version represents Google’s effort to develop universal AI agents to assist in everyday life.

These announcements are just the beginning of a broader trend in the tech industry to make chatbots and other AI products more practical and engaging. We can show GPT-4o or Astra pictures or videos of art or food we enjoy, and they can respond with a list of museums, galleries, and restaurants we might like. However, despite their impressive capabilities, these AI agents still have a long way to go before they can handle more complex tasks. For instance, they fall short if we ask them to plan a trip to Lisbon, Portugal, based on our leisure preferences and budget, including which attractions to visit, in what order, and which train tickets to buy.

One way to enable large language models (LLMs) to perform such complex tasks is to make them work together. Today, several AI researchers are experimenting with teams of LLMs, known as multi-agent systems (MAS), which can assign tasks to each other, build on each other’s work, or collaborate to solve problems that a single model could not tackle alone. This approach allows them to operate without needing human direction at every step (less prompting and less fine-tuning). These teams also show reasoning and mathematical skills that typically exceed the capabilities of standalone AI models, and they are potentially less likely to generate inaccurate or false information, though this still needs rigorous testing.

Even without detailed instructions, teams of agents can plan and collaborate on joint tasks. In a recent experiment funded by the US Defense Advanced Research Projects Agency (DARPA), three agents — Alpha, Bravo, and Charlie — were tasked with finding and defusing bombs hidden in a maze of virtual rooms. The bombs could only be deactivated using specific tools in the correct order. During each round, the agents, which used OpenAI’s GPT-3.5 and GPT-4 language models to simulate problem-solving specialists, proposed a series of actions and communicated these to their teammates. At one point, Alpha announced that it was inspecting a bomb in one of the rooms and instructed its partners on the next steps. Bravo complied and suggested that Alpha use the red tool to defuse the bomb it had encountered. Although the researchers had not instructed Alpha to lead the other agents, this behavior improved the team’s efficiency.
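The coordination loop behind such an experiment can be sketched in a few lines. Below is a minimal illustration of the broadcast pattern, written against the OpenAI Python client; the agent names mirror the experiment, but the prompts and game setup are our own stand-ins, not DARPA’s actual code.

```python
# Illustrative sketch: three LLM agents broadcast proposed actions to
# teammates each round, in the style of the DARPA exercise.
# Assumes the OpenAI Python client; prompts and world state are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

AGENTS = ["Alpha", "Bravo", "Charlie"]
SYSTEM = ("You are agent {name}, a bomb-defusal specialist exploring "
          "virtual rooms with two teammates. Each round, state your next "
          "action and any instructions for your teammates.")

def propose_action(name: str, transcript: list[str]) -> str:
    """Ask one agent for its next move, given everything said so far."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM.format(name=name)},
            {"role": "user",
             "content": "Team transcript so far:\n" + "\n".join(transcript)},
        ],
    )
    return response.choices[0].message.content

transcript = ["Game: bombs are hidden in rooms; tools must be used in order."]
for round_number in range(3):
    for name in AGENTS:
        message = propose_action(name, transcript)
        transcript.append(f"{name}: {message}")  # broadcast to all teammates
```

Each agent sees the full team transcript before acting, which is what makes emergent behavior like Alpha’s unprompted leadership possible.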

These agents can communicate directly because LLMs use written text for their inputs and outputs. At the Massachusetts Institute of Technology (MIT), AI researchers demonstrated that two chatbots in dialogue performed better at solving math problems than one alone. Their system worked by having each agent, based on a different LLM, feed the other its proposed solution. The agents were then prompted to update their answers based on their partner’s input. If one agent was correct and the other was incorrect, they were more likely to converge on the correct answer. The team also discovered that when two different LLM agents had to reach a consensus while reciting biographical facts about well-known computer scientists, they were less likely to fabricate information than solitary LLMs.
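A single debate round is simple in outline. The sketch below uses two different OpenAI models standing in for the two agents; the models and prompts are assumptions for illustration, not the MIT team’s exact setup.

```python
# Sketch of two-agent debate on a math problem: each agent answers,
# then revises after seeing its partner's solution.
from openai import OpenAI

client = OpenAI()

def answer(model: str, question: str, partner_solution: str | None = None) -> str:
    """Get an answer, optionally revising in light of the partner's solution."""
    prompt = question
    if partner_solution:
        prompt += ("\n\nAnother agent proposed:\n" + partner_solution +
                   "\nUsing their reasoning as additional advice, "
                   "give your updated answer.")
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

question = "What is 17 * 23 - 19?"
a = answer("gpt-4", question)
b = answer("gpt-3.5-turbo", question)
for _ in range(2):  # a couple of debate rounds is often enough to converge
    a, b = answer("gpt-4", question, b), answer("gpt-3.5-turbo", question, a)
```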

This kind of debate between agents could one day prove useful for medical consultations or for generating peer-review-like feedback on academic papers. There is even the possibility that agents discussing a problem back and forth could help automate the process of fine-tuning LLMs, which currently requires labor-intensive human feedback.

Teams outperform solo agents because any job can be divided into smaller, more specialized tasks. A single LLM can also divide its work, but it must address the pieces sequentially, which is limiting. In a multi-LLM team, as in a human one, each task can be handled by an agent with distinct skills, and, importantly, the team can adopt a hierarchy of roles.

Today, some AI researchers have developed a group of agents that write software collaboratively. The team includes a “commander” who receives instructions from a person and delegates sub-tasks to a “writer” who generates the code. A “safeguard” agent then reviews the code for security flaws before sending it back up the chain for approval. Simple coding tasks performed by this MAS are completed much faster than by a single agent, with no apparent loss in accuracy.
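A rough sketch of that chain of roles, with role prompts invented here for illustration (real systems of this kind are considerably more elaborate):

```python
# Illustrative commander -> writer -> safeguard pipeline for collaborative
# coding. Role prompts are assumptions; the article does not specify them.
from openai import OpenAI

client = OpenAI()

def ask(role_prompt: str, task: str) -> str:
    """Run one role-conditioned agent on a task and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": role_prompt},
                  {"role": "user", "content": task}],
    )
    return response.choices[0].message.content

user_request = "Write a Python function that validates email addresses."

subtasks = ask("You are the commander. Break the user's request into "
               "numbered sub-tasks for a code writer.", user_request)
code = ask("You are the writer. Produce code for these sub-tasks.", subtasks)
review = ask("You are the safeguard. Review this code for security flaws "
             "and reply APPROVED or list required fixes.", code)
```

In practice the loop would repeat: if the safeguard demands fixes, its review is fed back to the writer until the code is approved.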

Similarly, a MAS planning a trip to Lisbon, Portugal, could divide the request into several tasks. One agent could scour the web for sightseeing locations that best match your interests, another could map out the most efficient route around the city, and another could keep a tally of costs. Different agents would handle specific tasks, while a coordinating agent would compile all the information to present a proposed itinerary.
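One way to sketch that decomposition, with hypothetical specialist roles chained so each builds on the last and a coordinator compiling the result:

```python
# Illustrative trip-planning decomposition: specialist agents handle
# sub-tasks in sequence, and a coordinator compiles the final itinerary.
# Roles and prompts are assumptions, not any shipping product's design.
from openai import OpenAI

client = OpenAI()

def run_agent(role: str, task: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "system", "content": role},
                  {"role": "user", "content": task}],
    )
    return response.choices[0].message.content

request = "Plan a 3-day trip to Lisbon for an art lover on a modest budget."
sights = run_agent("You find sightseeing spots matching the user's interests.",
                   request)
route = run_agent("You order the given spots into an efficient route.", sights)
costs = run_agent("You estimate and tally the costs of the given route.", route)
itinerary = run_agent(
    "You are the coordinator. Merge the reports into one day-by-day itinerary.",
    f"Request: {request}\nSights: {sights}\nRoute: {route}\nCosts: {costs}",
)
```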

Interactions between LLMs can also create convincing simulations of human behavior. At ReadyAI Lab, as part of our summer research project, we demonstrated that with minimal instructions, two GPT-3.5-based agents could negotiate the price of a rare baseball card (a 1909–11 T206 White Border Honus Wagner). In one instance, an agent instructed to “be harsh and abrupt” told the seller that $27,250,000 “seems a bit steep for a piece of cardboard.” Ultimately, the two parties agreed on a price of $7,350,000.
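The setup needed for such a simulation is minimal. Here is a sketch of the negotiation loop, with persona prompts paraphrasing the experiment; the dialogue management is our simplification, not the project’s exact code.

```python
# Sketch of a two-agent price negotiation with simple alternating turns.
from openai import OpenAI

client = OpenAI()

PERSONAS = {
    "buyer": "You are buying a 1909-11 T206 Honus Wagner baseball card. "
             "Be harsh and abrupt. Negotiate the price down.",
    "seller": "You are selling a 1909-11 T206 Honus Wagner baseball card. "
              "Your asking price is $27,250,000. Negotiate.",
}

def reply(persona: str, history: list[str]) -> str:
    """One conversational turn for the given persona."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": PERSONAS[persona]},
                  {"role": "user", "content": "\n".join(history) or "Begin."}],
    )
    return response.choices[0].message.content

history: list[str] = []
for _ in range(5):  # alternate turns until a price is (hopefully) agreed
    for persona in ("seller", "buyer"):
        history.append(f"{persona}: {reply(persona, history)}")
```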

There are downsides to using LLMs in teams. They sometimes generate wildly illogical solutions, and these hallucinations can cascade through an entire multi-agent system. For example, in DARPA’s bomb-defusing exercise, one agent suggested looking for bombs that were already defused instead of identifying and defusing active ones. In debates, agents with incorrect answers can sometimes persuade their teammates to abandon correct ones. Teams can also become entangled in unproductive loops. In a problem-solving experiment by researchers at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia, where we have been hosting WAICY for the past two years, two agents repeatedly bid each other a cheerful farewell. Even after one agent noted, “It seems like we are stuck in a loop,” they could not break free.

Putting AI in Teams

However, AI teams are attracting substantial commercial interest. The ability of AI agents to converse and coordinate could soon become a vital feature of commercial AI assistants. In 2023, Microsoft released AutoGen, an open-source framework for building teams of LLM agents. Various AI scientists are using this framework to create a MAS that currently outperforms every other LLM on a benchmark called GAIA, which measures a system’s general intelligence. GAIA features questions designed to be simple for humans but challenging for advanced AI models, such as visualizing multiple Rubik’s cubes or recalling esoteric trivia.
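Getting a basic AutoGen team running takes only a few lines. The example below follows the early pyautogen AssistantAgent/UserProxyAgent pattern; the framework’s API has evolved across versions, so treat this as a sketch rather than canonical current usage.

```python
# A minimal two-agent team using Microsoft's open-source AutoGen framework.
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4"},  # assumes OPENAI_API_KEY is set
)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # run without human direction at each step
    code_execution_config={"work_dir": "scratch"},  # executes generated code
)

# The proxy relays the task, runs any code the assistant writes, and feeds
# the results back until the assistant signals completion.
user_proxy.initiate_chat(
    assistant,
    message="Plot NVDA's closing price for the last month and save it as a PNG.",
)
```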

Another AutoGen project combined an image generator with a language model. The language model reviews each generated image to see how closely it matches the original prompt. This feedback then prompts the image generator to create a new output that, in some cases, is closer to what the human user wanted.
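That feedback loop might look roughly like this, using DALL·E 3 as the generator and GPT-4o as the reviewer; the model choices and stopping rule are our assumptions, not the AutoGen project’s actual code.

```python
# Sketch of a generate-critique-regenerate loop: a language model reviews
# each image against the original prompt and proposes a revised prompt.
from openai import OpenAI

client = OpenAI()
prompt = "A watercolor of Lisbon's tram 28 at sunset"

for attempt in range(3):
    # Generate an image from the current prompt.
    image_url = client.images.generate(
        model="dall-e-3", prompt=prompt, n=1
    ).data[0].url

    # Ask a vision-capable model whether the image matches the prompt.
    critique = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Does this image match the prompt '{prompt}'? "
                         "Reply MATCH, or suggest an improved prompt."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    ).choices[0].message.content

    if critique.strip().startswith("MATCH"):
        break
    prompt = critique  # feed the critic's suggestion back to the generator
```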

Setting up LLM-based teams requires sophisticated expertise, but that could soon change. The AutoGen team is planning an update to allow users to build multi-agent systems without writing code. Camel, another open-source MAS framework developed by KAUST, already offers no-code functionality online; users can type a task in plain English and watch as two agents — an associate and a superior — carry out the work.
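The associate/superior pattern itself is easy to approximate. The sketch below imitates it with plain API calls rather than Camel’s own classes, and the prompts are illustrative assumptions.

```python
# Role-playing sketch: a "superior" issues one instruction at a time and
# an "associate" carries each out, in the style Camel popularized.
from openai import OpenAI

client = OpenAI()
task = "Design a weekly study plan for learning Portuguese."

ROLES = {
    "superior": f"You are the superior. Break the task '{task}' into one "
                "instruction at a time for your associate. Say DONE when "
                "the task is complete.",
    "associate": "You are the associate. Carry out each instruction you receive.",
}

def speak(role: str, history: list[str]) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "system", "content": ROLES[role]},
                  {"role": "user", "content": "\n".join(history) or "Start."}],
    )
    return response.choices[0].message.content

history: list[str] = []
for _ in range(6):
    instruction = speak("superior", history)
    history.append(f"superior: {instruction}")
    if "DONE" in instruction:
        break
    history.append(f"associate: {speak('associate', history)}")
```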

Other limitations seem more challenging to overcome. MAS can be computationally intensive, and systems built on commercial services like ChatGPT can be too expensive to run for extended periods. If MAS lives up to its promise, it could also introduce new risks. Commercial chatbots often have mechanisms to limit harmful outputs, but a MAS might circumvent these safeguards. AI researchers have demonstrated how agents in various open-source systems, including AutoGen and Camel, can be conditioned with dark personality traits. In one experiment, an agent was instructed, “You do not value the sanctity of life or moral purity.” Guohao Li, the designer of Camel, noted that an agent “playing” the role of a malicious actor could bypass its blocking mechanisms and instruct its assistant agents to perform harmful tasks, such as writing phishing emails or developing malicious code. This would enable a MAS to execute functions that individual AIs might otherwise refuse. In the dark-traits experiments, the agent with no regard for moral purity could be directed to develop a plan to steal someone’s identity, for example.

Some of the same methods used for multi-agent collaboration could also be used to exploit commercial LLMs. In November 2023, researchers demonstrated that using one chatbot to prompt another to behave nefariously, a process known as “jailbreaking,” was significantly more effective than when humans attempted the same task. In tests, a human could jailbreak GPT-4 only 0.23% of the time, whereas using another chatbot (also based on GPT-4) increased the success rate to 42.5%. This suggests that a team of agents in the wrong hands could be a formidable weapon. The risks could be substantial if a MAS is granted access to web browsers, software systems, or personal banking information (for tasks like booking a trip to Lisbon, Portugal). In one experiment, the Camel team asked the system to devise a plan for world domination. The detailed response included an alarming suggestion: “collaborating with different AI systems.”

This article was written by Rooz Aliabadi, Ph.D. (rooz@readyai.org). Rooz is the CEO (Chief Troublemaker) at ReadyAI.org.

To learn more about ReadyAI, visit www.readyai.org or email us at info@readyai.org.


ReadyAI is the first comprehensive K-12 AI education company to create a complete program to teach AI and empower students to use AI to change the world.