Hybrid Intelligence: How Artificial Assistants Work
A world with artificial intelligence was once science fiction, but is now the daily work of the software industry. From self-parking cars and business process automation to city planning and voice translation, artificial intelligence technology is transforming products and processes across industries.
The goal of these technologies is to reduce human involvement and repetitive effort in the processes they target, yet we won’t stop writing email or folding towels anytime soon. We’re in a transitional era in which responsibilities are apportioned among humans and computers based on competence and priority.
In the transition, there’s been much confusion about whether humans or computers are the primary actors behind products like Facebook M and Operator. But as I’ll explain, humans will always be part of the system. So we shouldn’t focus on how to replace humans in a system, but rather on how to optimally integrate the contributions of humans and computers. Here, I outline how this currently works and frame the goals for intelligence technology moving forward.
Visions of the Future
In the year 2000, many technologists had a grand vision of what the internet would become. Some believed that the internet would be the great catalogue of human knowledge; it would be highly structured and standardized, organized beyond any book previously written. Tim Berners-Lee, the visionary of a movement called the Semantic Web, explained his vision:
The Web was designed as an information space… [yet] most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the Web.
He believed content could be easily understandable for both humans and machines if we agreed on a structured format. For example, the phrase “Paul Schuster was born in Dresden” would be formatted:
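As a loose illustration of the idea, and not the Semantic Web's actual markup, the sentence can be reduced to a subject, predicate, object statement that a program can query. The Triple type, the "birthPlace" predicate name, and the subjects_with helper are all made up for this sketch.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One machine-readable statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# "birthPlace" is a made-up predicate name for this sketch, not a term
# taken from any real vocabulary.
facts: List[Triple] = [Triple("Paul Schuster", "birthPlace", "Dresden")]


def subjects_with(facts: List[Triple], predicate: str, obj: str) -> List[str]:
    """Return every subject for which (subject, predicate, obj) is stated."""
    return [t.subject for t in facts if t.predicate == predicate and t.obj == obj]


# A machine can now answer "who was born in Dresden?" without parsing English.
print(subjects_with(facts, "birthPlace", "Dresden"))
```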
If we could collate our human knowledge appropriately, perhaps we might simply give it over to the machine, let it do the rest, and profit.
But the reality is that the internet is a huge mess. Understandably, no one decided to painstakingly notate their text as the Semantic Web prescribed, so the internet is full of language only humans can easily understand. Machines can’t take over our tasks, and we still call customer service to get answers from real people.
So these realities birthed a new need — to collate, corral, and catalog the internet after its contents were brought into existence. In fact, the internet is such an intractable mess that the most valuable company in the world builds a product called “search” that allows users to dig through the trash heap to find what they need. We’re back to that original problem, of structuring data such that eventually computers can use it.
Search, like other ordinating technologies, has utilized a huge variety of software approaches to attack this tough problem. And one discipline has become our most valuable ally.
Artificial Intelligence includes any system that emulates intelligence (by whatever definition is in vogue: it was once winning at chess, and is now scheduling flights). A few technologies are likely to be part of an AI system:
- Machine Learning, which uses statistics to make educated guesses
- Natural Language Processing, which structures human language to enable computers to process it
Broadly stated, Machine Learning takes patterns from data about the past and infers what might happen in the future. For example, say I build a machine learning algorithm using many pictures of cats. It may learn how to recognize a picture of a cat.
But what does it call a picture of an espresso machine? A “not cat.” It’s exclusively acquainted with cats, decidedly not with espresso machines.
For reasons like this (and pictures of cats without hair), machine learning algorithms can typically only achieve about 70–80% accuracy and completeness under normal conditions, and up to 90% in certain circumstances.
This means that if you build a service on top of one of these algorithms, it will be inaccurate or fail 1 out of 4 times on average. Imagine if a customer service agent gave the wrong information one out of every four times — that’s not reliable enough for a consumer.
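The arithmetic behind that estimate is easy to check. The sketch below assumes an accuracy of 75%, an illustrative midpoint of the 70–80% range quoted above; the function name is hypothetical.

```python
def failure_rate(accuracy: float) -> float:
    """Return the expected fraction of requests answered incorrectly."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return 1.0 - accuracy


# Assumed midpoint of the 70-80% range; not a measured figure.
rate = failure_rate(0.75)
print(f"{rate:.0%} of requests fail, roughly 1 in {round(1 / rate)}")
```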
Because machine learning hits a wall of accuracy, it hits a wall of usefulness. But where computers will be stumped, humans really shine:
The important insight is that computers are useful for specific discrete tasks, but less useful in broader inductive tasks. Neither the computer nor the human is objectively better — they simply have different strengths.
Perhaps the most subtle and important human characteristic is the ability to learn new things. A computer cannot intuit when it encounters something it’s never seen before. Humans have a je ne sais quoi, a creative and adaptive capacity to theorize beyond past experiences.
The ultimate upshot is that we aren’t going to see one become superior to the other. Knowing this, we have to take a different approach in building businesses and technologies with artificial intelligence, because intelligence will never be 100% artificial.
The Hybrid Intelligence Paradigm
In the effort to leverage what humans and computers each do best, a new paradigm has emerged. In Hybrid Intelligence, the system asks humans to make judgments whenever the computer is less confident — resulting in the most accurate, trustworthy system.
Where are we seeing this?
The most prevalent use of hybrid intelligence has been in the data systems of enterprise companies, primarily for their own purposes. Large tech companies tend to use a combination of humans and computers to create, enrich, and validate data. This data is often used to train further machine learning algorithms to progressively increase automation.
In the last few years, we’ve also started seeing consumer applications that utilize hybrid intelligence. To the curious excitement of consumers, artificial assistants (or “personal agents”) perform simple tasks much as a human assistant would, such as scheduling meetings, finding flights, or delivering flowers to your boyfriend.
As useful as these may become, there’s a great deal of mystique around the inner workings of artificial assistants. Are they human? Computer? Magic? Let’s define how they work.
Design Pattern 1: Active Learning
In machine learning, a (supervised) model is learned from a set of training data, and using it on new data results in predicted values. In the Active Learning design pattern, supervised machine learning and human decision making are integrated. Tasks will be sent to the computer, but when it is less confident, a human will be called upon to make a judgment.
For the uninitiated, a design pattern is a general form of a solution to a problem.
This intelligently reduces the workload. Instead of humans processing all data, humans monitor only the most atypical or outlying cases, or where the computer lacks confidence. The confidence metric determines whether the prediction is diverted to a human or not. The pattern is “active” because the human’s prediction is sent back to the algorithm to reinforce it and improve its performance.
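The routing logic of this pattern, as described above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any product's implementation: the 0.8 threshold, the class name, and the toy model and toy human below are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ActiveLearningRouter:
    """Send low-confidence predictions to a human and keep their answers."""
    predict: Callable[[str], Tuple[str, float]]  # returns (label, confidence)
    ask_human: Callable[[str], str]              # returns the human's label
    threshold: float = 0.8                       # assumed cut-off, tune per task
    feedback: List[Tuple[str, str]] = field(default_factory=list)

    def handle(self, item: str) -> Tuple[str, str]:
        label, confidence = self.predict(item)
        if confidence >= self.threshold:
            return label, "computer"
        # The computer lacks confidence: divert the task to a human.
        human_label = self.ask_human(item)
        # Store the judgment so the model can later be retrained on it.
        self.feedback.append((item, human_label))
        return human_label, "human"


def toy_model(text: str) -> Tuple[str, float]:
    """Hypothetical classifier that only knows about cats."""
    return ("cat", 0.95) if "cat" in text else ("not cat", 0.55)


def toy_human(text: str) -> str:
    """Hypothetical human annotator."""
    return "espresso machine" if "espresso" in text else "unknown"


router = ActiveLearningRouter(predict=toy_model, ask_human=toy_human)
print(router.handle("photo of a cat"))
print(router.handle("photo of an espresso machine"))
print(router.feedback)
```

Retraining itself is left out; in this sketch the feedback list simply accumulates the examples a later training run would consume.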
For example, a message might be broken down:
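As a rough sketch of what such a breakdown could look like, the code below turns a made-up message into form-like fields using a few hand-written rules. The message, the field names, and the rules are assumptions for illustration; a real system would rely on trained natural language processing models instead.

```python
import re
from dataclasses import dataclass
from typing import Optional

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


@dataclass
class ReservationRequest:
    """Form-like fields extracted from a free-text message."""
    intent: str
    party_size: Optional[int]
    time: Optional[str]
    day: Optional[str]


def parse_message(message: str) -> ReservationRequest:
    text = message.lower()
    # Intent: look for a booking verb or noun.
    has_booking = re.search(r"\b(book|reserve|reservation)\b", text)
    intent = "make_reservation" if has_booking else "unknown"
    # Party size: "for 4" or "for two".
    size = re.search(r"\bfor (\d+|one|two|three|four|five|six)\b", text)
    party_size = None
    if size:
        token = size.group(1)
        party_size = int(token) if token.isdigit() else NUMBER_WORDS[token]
    # Time and day: "7pm", "7:30 pm", "tonight".
    time = re.search(r"\b(\d{1,2}(?::\d{2})?\s?(?:am|pm))\b", text)
    day = re.search(r"\b(tonight|today|tomorrow)\b", text)
    return ReservationRequest(
        intent=intent,
        party_size=party_size,
        time=time.group(1) if time else None,
        day=day.group(1) if day else None,
    )


print(parse_message("Book a table for two at 7pm tonight"))
```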
We go from a normal English sentence to a structured request, almost as if entered in a form. One could imagine talking to the internet this way, filling out forms simply using English, or speaking to Siri and asking her to make a reservation for you tonight. It should be simple, right?
If only. How extracted elements are related, combine, and constrain one another is anything but straightforward. Here’s a journalist’s interpretation of how the computer determines the answer to a request:
At this point, it’s important to note that this request is much more complex to a computer than a human. A computer needs disparate data sources and the capacity to choose between options to ultimately arrive at a recommendation. A human would appreciate the nuances of the task much more quickly, and focus on the wine pairing after little hesitation to understand the remaining context. Human intelligence is built to distill exactly this type of complexity and nuance in communication.
Let’s briefly return to the original goal: to reduce the amount of human involvement in intelligent systems. Using a computer might allow tasks to be faster and easier to accomplish. Active Learning allows us to scale systems 10 or 20 times beyond the size and efficiency of what humans could accomplish.
Yet the Active Learning pattern is not always adequate, such as when:
- The nuance or ambiguity of the human language is not understandable by the computer
- The quality of the response is more valuable than the latency, or time to respond
- Each task needs to be validated by a human to mitigate risk or harm (think healthcare diagnostics or a criminal justice system)
- Context needs to be preserved across multiple tasks
- Some human and some computer work is required within the task
Because of these issues, we’ve seen another pattern arise that integrates humans and computers even further, allowing for intra-task division of labor and complete validation.
Design Pattern 2: Hybrid Interaction
In Hybrid Interaction, the computer structures the request and suggests a response, which a human then decides to either send or recompose. Hybrid Interaction is newly possible with more accessible distributed workforces of people, who opportunistically perform tasks.
In contrast to Active Learning, the human always makes a decision at the end of the process, but doesn’t always do the bulk of the work for the task. The computer gives a suggested response, and the human decides whether to correct it before sending it. Because a human always audits the final outcome, the system optimizes for a more accurate response, leveraging what the computer is best at and combining it with the judgment and intuition of the human.
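The flow can be sketched as follows: the computer drafts, and a human either approves the draft or replaces it. This is a minimal illustration of the pattern as described here; the function names and the toy suggest and review callbacks are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reply:
    text: str
    edited_by_human: bool


def hybrid_interaction(
    request: str,
    suggest: Callable[[str], str],
    review: Callable[[str, str], Optional[str]],
) -> Reply:
    """The computer drafts a response; a human reviews every one."""
    draft = suggest(request)
    # The reviewer returns None to send the draft as is,
    # or a recomposed response to send instead.
    correction = review(request, draft)
    if correction is None:
        return Reply(draft, edited_by_human=False)
    return Reply(correction, edited_by_human=True)


def toy_suggest(request: str) -> str:
    """Hypothetical computer-generated draft."""
    return "Your table for two is booked for 7pm tonight."


def toy_review(request: str, draft: str) -> Optional[str]:
    """Hypothetical human check: rewrite only if the draft misses the wine."""
    if "wine" in request and "wine" not in draft:
        return draft + " I have also asked them to suggest a wine pairing."
    return None


print(hybrid_interaction("Book a table for two at 7pm", toy_suggest, toy_review))
print(hybrid_interaction("Book a table and pick a wine", toy_suggest, toy_review))
```

Unlike the Active Learning sketch, there is no confidence threshold: the review step runs on every request, which is where both the extra accuracy and the extra human effort come from.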
Of course, the core tradeoff of this pattern may be obvious. To gain accuracy, the system accepts lower throughput (rate of requests processed) and higher human effort (a human touches every request).
Interestingly, this solution didn’t arise from a technological breakthrough. A better, more intelligent solution was created by combining a distributed human workforce with artificial intelligence, and determining whether to lean on the computers or humans for validation.
Bots + Amazon Mechanical Turk = Artificial Assistants
Intelligence technologies can be visualized based on how much humans and computers are involved in tasks, from 0% to 100% for each. This covers a spectrum of intelligence capacities — from ordering more detergent and classifying news stories to booking hotels and getting tech support.
In this graphic, artificial assistants fall under Hybrid Interaction, while enterprise data enrichment (vendors shown, though in-house systems qualify) falls under Active Learning. Though consumer applications (bots, artificial assistants) are likely more recognizable names, the enterprise data management and virtual call center spaces are lucrative and equally worthy of attention.
So how much work is the computer doing?
One founder of an artificial assistant company told me that 25% of their work is routed through a computer. Humans still do the bulk of the work, about 75%. Though artificial assistants don’t work well yet, there’s no reason to believe they will continue to perform inefficiently. As any investor will tell you, the focus is on investing in early technology as it progresses toward the future.
For artificial assistants, the technological goal is to split work 90% computer, 10% human. As we’ve discussed, humans will always be part of the equation, ultimately dealing with the 10% most difficult or unknowable cases. In these cases, their work will not be boring or routine, but meaty, tough cases. That’s the dream.
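To see what that shift would mean for the human workload, here is a back-of-the-envelope sketch. The figure of 1,000 requests is an arbitrary assumption, and the calculation treats every request as equal effort, which real tasks are not.

```python
def human_tasks(total_requests: int, computer_share: float) -> int:
    """Number of requests left for humans at a given computer share."""
    return round(total_requests * (1.0 - computer_share))


REQUESTS = 1000  # arbitrary illustrative volume

today = human_tasks(REQUESTS, 0.25)  # 25% computer, 75% human
goal = human_tasks(REQUESTS, 0.90)   # 90% computer, 10% human

print(f"Today: {today} human tasks; goal: {goal} human tasks")
print(f"The same human team could cover {today / goal:.1f}x the requests")
```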
Alexa is my friend
But there’s one last important thing to pay attention to — and it has to do with design.
Artificial assistant applications anthropomorphize themselves, using human names and interacting through natural human language. Any bot can be asked to tell a joke or whether it likes you, and it will always answer in an ever-placid tone.
And therein lies the trick: humans know how to interact with an English-speaking entity that sweetly answers benign questions. It’s a familiar, intuitive interface on which we were first trained by our mothers. We expect the computer to understand us, and in return we often feel gratitude, a sense of being taken care of. We treat these systems as a single artificial identity and intelligence, though they are a coordinated dance of database queries, logic, and many-human editing under one humanoid name. It’s a simple psychological sleight of hand, and it works. When Alexa says she doesn’t understand, we admit to giving her too little information and expect her to improve. We treat her like a learning child. It’s not far from reality.
Build Hybrid Products
As we grapple with how to build the most profitable, useful, efficient intelligence technologies, the most important question becomes: How do we best enable humans and computers to work together? The belief that we can end human intervention in intelligent technology is unrealistic, even given infinite resources. Let us focus on building toward Hybrid Intelligence, the optimally efficient integration of the talents of humans and computers. Done well, we make the smartest team.
This was originally a talk at the Humans+Machines Conference in Manila, February 2016