Voice assistants and chatbots rely on understanding language, but for now, they can learn meaning and context only with human help. So much for privacy.
By Ben Dickson
In late August, Motherboard broke the news that Microsoft had shared voice recordings of Xbox users with contractors to improve the functionality of its AI-powered voice assistant. Earlier that month, another report revealed that Microsoft was sharing Skype recordings and Cortana voice commands with contractors to improve the chat platform’s services. Some of the recordings contained intimate content.
Microsoft is just one of several companies that employ human workers to listen to and annotate user-generated data. In recent months, similar programs have come to light at Google, Amazon, Facebook, and even Apple, which has established itself as an advocate of user privacy.
As voice-enabled assistants and chatbots become increasingly popular, tech companies face a difficult challenge: Their AI algorithms are not built to handle the complexities of human language, and they often fail to understand the meaning of the commands and sentences their users say.
For the moment, the only solution is to hire humans to steer these AI algorithms in the right direction. And that often requires having those workers listen to the intimate conversations of users to transcribe and annotate them.
The Challenge of Understanding Language
Voice-based assistants such as Alexa, Siri, and Cortana owe their capabilities to advances in deep learning, a branch of artificial intelligence that has become very popular in the past few years. Deep-learning algorithms are especially good at finding patterns and classifying information.
When you provide a deep-learning algorithm with millions of voice recordings and their corresponding transcripts, it can learn to transcribe new audio excerpts with very high accuracy. Deep learning is also good at prediction: When you train an AI algorithm on a large corpus of text, it develops complex mathematical representations of different word sequences and can perform tasks such as automatically completing sentences.
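That autocomplete behavior is easy to demonstrate. Here is a minimal sketch using the open-source Hugging Face transformers library and the GPT-2 model (our choices for illustration; the commercial assistants discussed here run proprietary systems):

```python
# Minimal sentence-completion sketch with an off-the-shelf language model.
# GPT-2 is an illustrative stand-in, not what Alexa, Siri, or Cortana use.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The model continues the prompt with a statistically likely word sequence,
# learned purely from patterns in its training text.
prompt = "Set an alarm for"
result = generator(prompt, max_length=12, num_return_sequences=1)
print(result[0]["generated_text"])
```

Nothing in that completion reflects understanding; the model is simply ranking word sequences by how well they match patterns in its training data.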
But deep learning struggles with understanding the meaning of words and sentences, a task that can’t be accomplished with pure math and statistics.
“Speech recognition and natural language understanding may sound like similar problems, but they’re actually totally different,” says Gary Marcus, cognitive scientist and founder and CEO of Robust.AI. “In speech recognition, you have a limited number of syllables and phonemes in your language, and you’re trying to translate an audio stream into something that belongs to a very small set of categories.”
The English language has tens of thousands of commonly used words, and in the age of big data, you can easily find millions of examples of each with which to train deep-learning models. But parsing sentences and interpreting their meaning is a different challenge altogether. There are countless possible sentences, each with a unique meaning, and the meaning of a word varies depending on where it sits in a sentence and what precedes or follows it.
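To put rough numbers on that contrast (the figures below are illustrative assumptions, not measurements):

```python
# Back-of-the-envelope arithmetic for the gap Marcus describes.
phonemes = 44          # roughly the phoneme inventory of English
vocabulary = 20_000    # a modest estimate of commonly used words
sentence_length = 10   # words in a short sentence

# Speech recognition maps audio into a tiny, fixed set of categories...
print(f"phoneme categories: {phonemes}")

# ...while the space of possible 10-word strings is astronomical, so most
# sentences a system hears never appeared in its training data.
print(f"possible 10-word strings: {vocabulary ** sentence_length:.2e}")
```

Even if only a sliver of those strings are grammatical English, the space dwarfs any training corpus.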
“Except for a few small sentences, almost every sentence you hear is original. You don’t have any data directly on it. And that means you have a problem that is about inference and understanding,” Marcus says. “The techniques that are good for categorizing things, putting them into bins that you already know, simply aren’t appropriate for that. Understanding language is about connecting what you already know about the world with what other people are trying to do with the words they say.”
In his new book, Rebooting AI (coauthored with New York University professor Ernest Davis), Marcus explains some of the challenges that face contemporary AI when it’s deciphering the meaning of human language. One of the things we take for granted is the general knowledge of the world that each of us has and how we use this knowledge to untangle the ambiguities of spoken and written language.
Everyday conversations are filled with such ambiguities. For instance, consider this sentence, which Marcus and Davis examine in their book: “Elsie tried to reach her aunt on the phone, but she didn’t answer.” This is a simple sentence. But it also contains several ambiguities that you, as a human, can easily resolve. Upon hearing the sentence, you will immediately know that “reach” means “to communicate” and not “physically reach out,” “on the phone” means “by using the phone” and not “physically on the phone,” and “she” is a reference to Elsie’s aunt and not to Elsie herself. These are all inferences you can make without a second thought because you know what a phone is, what it’s used for, and how the process of making a phone call works.
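To see why this trips up a statistics-driven system, consider a toy pronoun resolver (our illustration, not an example from the book or any shipping assistant) built on a shallow heuristic: link "she" to the most recent gender-compatible noun. It gets the original sentence right, then fails on a near-identical variant, because picking the right referent depends on knowing how phone calls work:

```python
# Toy pronoun resolver: pick the most recent gender-compatible noun.
# An illustration only, not from the book and not a real system.

def most_recent_candidate(tokens, candidates, pronoun_index):
    """Resolve a pronoun to the closest earlier candidate noun."""
    for i in range(pronoun_index - 1, -1, -1):
        if tokens[i] in candidates:
            return tokens[i]
    return None

candidates = {"elsie", "aunt"}  # the two plausible referents for "she"

s1 = "elsie tried to reach her aunt on the phone but she didn't answer".split()
s2 = "elsie tried to reach her aunt on the phone but she got no answer".split()

for s in (s1, s2):
    print(most_recent_candidate(s, candidates, s.index("she")))

# Prints "aunt" both times: right for s1 (the aunt didn't answer) but
# wrong for s2, where it was Elsie who got no answer. Telling the two
# apart requires knowing how phone calls work, not word statistics.
```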
The Endless Training Cycle
Deep learning's lack of common sense and knowledge of the world leaves tech companies with no option but to keep training their AI models on more and more examples, in the hope of eventually covering every way of saying the things their AI assistants should do. That's why they need the help of human workers, usually remote and underpaid, who evaluate the performance of the AI algorithms or transcribe and annotate the user recordings those algorithms fail to decipher.
But given the endless ways in which humans can express themselves, more training amounts to a Band-Aid. There will always be outliers, scenarios the AI has not been trained to handle, and human language is dynamic and constantly evolving. All of this requires still more training, which means you'll keep hearing stories of remote workers listening to your private conversations.
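Schematically, that cycle looks something like the toy sketch below (a deliberate simplification; the confidence threshold, file names, and routing logic are our assumptions, not any vendor's actual pipeline):

```python
# Toy human-in-the-loop cycle: low-confidence utterances are queued for
# human transcription and fed back into the next round of training.
import random

CONFIDENCE_THRESHOLD = 0.80  # illustrative cutoff, not a real product value
review_queue = []            # recordings a human contractor will hear

def transcribe(audio):
    """Stand-in for a speech model: returns a guess and a confidence."""
    return "set an alarm for seven", random.random()

def handle_utterance(audio):
    text, confidence = transcribe(audio)
    if confidence < CONFIDENCE_THRESHOLD:
        review_queue.append(audio)  # this is where privacy leaks in
    return text

for clip in ["clip_001.wav", "clip_002.wav", "clip_003.wav"]:
    handle_utterance(clip)

# Human transcripts of the queued audio become new training examples,
# the model is retrained, and the cycle repeats.
print(f"{len(review_queue)} of 3 recordings routed to human reviewers")
```

Every recording pushed onto that review queue is a recording a stranger may listen to, and because novel sentences never stop arriving, the queue never empties.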
“The weakness of the current technology is that it’s incredibly data-hungry, particularly in open-ended problems like natural language understanding. So the companies are desperate to try to get that data,” Marcus says. “I don’t think it’s really going to solve their problem anyway. It’ll help a bit, but it won’t solve it.”
Without a way to embed common sense and basic knowledge into deep-learning algorithms, there will be no short-term fix to the problem. As tech companies continue to collect and annotate user data to train their AI algorithms, they'll face a backlash from privacy advocates and possible legal action from data-protection authorities. This has caused these companies to tone down and restructure their data collection and sharing programs, but not to halt them.
In the long term, Marcus believes, we’ll need new perspectives on AI: “We need better research into AI. That means shifting a culture that is mostly about data and math to a culture that also incorporates other ideas from other fields like psychology, philosophy and linguistics, that have thought pretty deeply about how the human mind works, and might lead to a richer set of techniques for building artificial intelligence than we’re really seeing right now.”
Originally published at https://www.pcmag.com on October 3, 2019.