AI101: The Multi-Agent Problem
A version of this post originally appeared on VentureBeat.
Personal Assistant Bots Like Siri and Cortana Have a Serious Problem
Assistants are trending big these days. Everyone, from Apple to Amazon, is introducing an assistant or busy developing one. People imagine them becoming the new interface to computers — why bother with apps and searching the web when you can just ask your assistant to do it for you?
But a major challenge stands between the ideal assistant and the current reality. It’s called the multi-agent problem, and most companies are reluctant to talk about it. The solution will ultimately determine how much of an impact assistants will have.
How Assistants Work
Assistants have a dirty little secret: They don’t actually understand you. At least, not in the way you might think they do.
In order to make them easier to deploy and reuse, early developers of assistants designed them to be not just one large program but many small ones — each one specialized in completing a particular task, such as booking an appointment, calling a car, setting an alarm, and so on. They called these dedicated programs “agents.” The assistant itself doesn’t have to know anything specific about these tasks; it just interprets the words of users and picks the best agent for the job.
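To make the pattern concrete, here is a minimal sketch in Python of an assistant that does nothing but dispatch to a registry of small agents. The agent names, keywords, and responses are invented for illustration; this is a caricature of the idea, not any particular company’s implementation.

```python
# A toy assistant built from small, task-specific agents.
# All agent names, keywords, and responses below are hypothetical.

class Agent:
    """A small program specialized in one task."""
    def __init__(self, name, keywords, action):
        self.name = name
        self.keywords = set(keywords)
        self.action = action

    def score(self, words):
        # How many of the user's words this agent claims to understand.
        return len(self.keywords & words)


class Assistant:
    """The assistant itself only picks an agent; the agents do the real work."""
    def __init__(self, agents):
        self.agents = agents

    def handle(self, utterance):
        words = set(utterance.lower().split())
        best = max(self.agents, key=lambda agent: agent.score(words))
        return best.action(utterance)


assistant = Assistant([
    Agent("alarm", ["alarm", "wake", "remind"], lambda u: "Alarm set."),
    Agent("car", ["car", "ride", "taxi"], lambda u: "A car is on the way."),
    Agent("food", ["brunch", "restaurant", "table"], lambda u: "Here are some places to eat."),
])

print(assistant.handle("call me a car"))  # -> "A car is on the way."
```

Notice that the assistant never models what an alarm, a car, or a restaurant actually is; all of that knowledge lives inside the agents.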
Almost every modern assistant on the market today has copied this approach. Some assistants, like Amazon’s Alexa and Microsoft’s Cortana, farm out agent development to third-party developers. (Amazon calls them skills, Microsoft calls them bots.)
There are some serious drawbacks to this approach.
Since all of the expert, task-oriented knowledge is trapped inside the agents, the assistant itself is left with virtually no understanding of the meaning behind your words. Instead, it scans for patterns — keywords and phrases — and produces its best match. If the user’s question contains words that the assistant doesn’t recognize, the assistant simply ignores those words. Words the assistant doesn’t know can’t be important, because the assistant doesn’t know them. This is why assistants are remarkably easy to trip up; they don’t actually understand the meaning of your words. You may have noticed that they’ll occasionally do radically different things when you alter your commands ever so slightly.
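Here is a toy illustration of that keyword-scanning behavior. The vocabulary is made up, and real assistants use far more sophisticated matching, but the failure mode is similar: words outside the vocabulary simply vanish from the request.

```python
# A hypothetical keyword vocabulary mapping words to agents.
KNOWN_KEYWORDS = {
    "brunch": "restaurant-agent",
    "restaurant": "restaurant-agent",
    "car": "car-agent",
    "ride": "car-agent",
    "alarm": "alarm-agent",
}

def route(utterance):
    words = utterance.lower().replace("?", "").split()
    recognized = [w for w in words if w in KNOWN_KEYWORDS]
    ignored = [w for w in words if w not in KNOWN_KEYWORDS]  # silently dropped
    if not recognized:
        return "Sorry, I didn't get that.", ignored
    # Pick whichever agent owns the most recognized keywords.
    votes = {}
    for w in recognized:
        votes[KNOWN_KEYWORDS[w]] = votes.get(KNOWN_KEYWORDS[w], 0) + 1
    return max(votes, key=votes.get), ignored

print(route("book me a table for brunch"))
# -> ('restaurant-agent', ['book', 'me', 'a', 'table', 'for'])
print(route("somewhere quiet and vegan for brunch"))
# -> ('restaurant-agent', ['somewhere', 'quiet', 'and', 'vegan', 'for'])
```

In the second request, “quiet” and “vegan” are the parts the user cares about most, and they never reach any agent.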
It’s really difficult for today’s assistants to handle tasks that require activating more than one agent. For example, if you ask your assistant to help you find a good place for brunch and ask it to call you a car in the same sentence, it’s not clear which agent is best suited to handling the job — the restaurant-finding agent or the car-calling agent.
Things get harder still when a company opens its assistant to outside developers. Now the assistant has to distinguish between dozens of agents, each claiming to be the best at handling a particular task. For instance, if an assistant has agents for Yelp, Foursquare and TripAdvisor, how does it determine which one should help you find a place for that special date?
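A small sketch makes the ambiguity plain. The registrations below are hypothetical, but once several third-party agents claim the same keywords, a keyword-matching assistant has no principled way to rank them:

```python
# Hypothetical agent registrations, all claiming the same restaurant keywords.
registrations = {
    "yelp-agent": {"restaurant", "brunch", "dinner", "reservation"},
    "foursquare-agent": {"restaurant", "brunch", "dinner", "reservation"},
    "tripadvisor-agent": {"restaurant", "brunch", "dinner", "reservation"},
}

def candidates(utterance):
    words = set(utterance.lower().split())
    scores = {name: len(keywords & words) for name, keywords in registrations.items()}
    best = max(scores.values())
    return [name for name, score in scores.items() if score == best and score > 0]

print(candidates("find us a nice restaurant for brunch"))
# -> ['yelp-agent', 'foursquare-agent', 'tripadvisor-agent']  (a three-way tie)
```

Keyword matching alone leaves the assistant with a tie, and nothing in the user’s words helps it break that tie.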
This is the crux of the multi-agent problem: How does an assistant, with limited knowledge of the world, a limited set of isolated agents, and multiple agents competing for the same tasks, choose which one to activate for each command in a way that satisfies users?
Working Toward the Solution
Early assistants, like Siri and the first versions of Alexa, worked around this problem by carefully curating the agents, keywords, and phrases they understood. Just as a magician carefully arranges a trick to make you think you saw something you didn’t, thoughtful designers created the illusion that these assistants were capable of a lot more than they really were.
Now, as people expect more and more from their assistants, there is pressure to open them up to outside developers. This makes the multi-agent problem unavoidable.
Alexa and Cortana both solve the issue in part by forcing the user to decide which agent to use. (“Alexa, ask Dominos to send me a pizza.”) Apple, true to form, is taking a conservative and measured approach by allowing a very limited set of agents. So far, Apple’s agents focus on handling reservation bookings and car hailing.
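The explicit-invocation workaround is easy to sketch. The parsing below is a hypothetical simplification, far cruder than what Alexa or Cortana actually do, but it shows why naming the agent sidesteps the routing problem: the user does the choosing.

```python
import re

def explicit_route(utterance):
    # Look for an "ask <agent> to <command>" pattern; hypothetical grammar.
    match = re.match(r"ask (\w+) to (.+)", utterance.lower())
    if match:
        agent, command = match.groups()
        return agent, command
    return None, utterance  # fall back to ordinary (ambiguous) routing

print(explicit_route("ask dominos to send me a pizza"))
# -> ('dominos', 'send me a pizza')
```

The cost, of course, is that the user now has to remember which agents exist and what each one is called.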
Developers in the AI space are hoping that more sophisticated natural language processing (NLP) or machine learning will bring an answer. Microsoft, Google, Apple, and Viv are all making major investments in these areas. Still others are trying to go further by giving the assistant more knowledge about the world. Ozlo, my own assistant, looks directly at the data inside agents to try to improve its understanding.
It’s not clear what will work, or which solution will ultimately prove the winner. But we can look back to the early days of web search for hints.
Early search engines took an approach similar to that of today’s assistants. Rather than peering directly into each web page, they relied on web page descriptions provided by the authors — so-called metadata. For example, if you were building a website about dogs, you might put keywords in your metadata like “dog,” “canine,” and “pets.” Search engines would show results solely based on the words present in the metadata.
It didn’t take long for a multitude of sites to claim “best in show” for all categories of information. As the web grew, less scrupulous website authors filled their metadata with keywords that had nothing to do with their page, just to draw more traffic.
Eventually, Google solved this problem by taking the additional step of actually reading the contents of the web pages themselves, sometimes ignoring the metadata altogether. Google’s search algorithm values web page content in much the same way people do. Only then did web search begin to approach the universal quality that we’ve grown to expect today.
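To see why reading the content matters, here is a toy contrast between metadata-only ranking and ranking that also looks at the page itself. The pages and scoring functions are invented for illustration and bear no resemblance to Google’s actual algorithm.

```python
# Two hypothetical pages: one honest, one stuffing its metadata with keywords.
pages = [
    {
        "url": "honest-dog-site.example",
        "metadata": ["dog", "canine", "pets"],
        "content": "Advice on training your dog, choosing a breed, and caring for pets.",
    },
    {
        "url": "spammy-site.example",
        "metadata": ["dog", "dog", "dog", "canine", "pets", "dog"],  # keyword stuffing
        "content": "Buy cheap watches now! Limited offer!",
    },
]

def metadata_score(page, query):
    # Trusts whatever the author wrote in the metadata.
    return sum(page["metadata"].count(word) for word in query)

def content_score(page, query):
    # Reads the page itself instead of trusting the author's labels.
    text = page["content"].lower()
    return sum(1 for word in query if word in text)

query = ["dog", "pets"]
print(max(pages, key=lambda p: metadata_score(p, query))["url"])  # spammy-site.example wins
print(max(pages, key=lambda p: content_score(p, query))["url"])   # honest-dog-site.example wins
```

The parallel move for assistants would be to look past each agent’s self-description at the data and tasks behind it, which is the kind of understanding described above.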
While there are parallels, solving the multi-agent problem is not a simple replay of web search. The user requirements, technology, and even the data involved are radically different. But it does seem likely that until assistants begin to understand the tasks they are offering to users, it will be hard for them to meet the high hopes we all have for this category.
Only time will tell. This category is still very young and evolving. But look carefully for the companies solving this problem head-on; they are the ones likely to dominate the next wave of intelligent assistants.