Alexa, Fake Laughter, and Real Fear
On Voice Interfaces and the Importance of Context
It’s late in the evening. You’re home alone, warming up some leftovers, and all of a sudden you hear the sound of maniacal laughter trickling in from your bedroom, making the hairs on your neck stand on end. A gleeful burglar? A deranged murderer? You peek around the corner, heart-in-throat, to see: Alexa. Her blue lights are swirling and she’s pointlessly, oddly, creepily chuckling to herself.
For the past few days, Alexa owners have been reporting untriggered laughter emerging from their devices, documenting their experiences across social media. Of all of the emotions Alexa regularly provokes — from joy to rage — complete and utter fear is a new one. How did this happen?
According to an Amazon spokesperson, the laughter is meant to be triggered exclusively by the command “Alexa, laugh,” but because it’s such a short utterance, it’s easy for the assistant to mistake a wide range of statements for the prompt to chuckle. And unfortunately, Alexa’s unasked-for merriment has come at rather inopportune times. One user reported her laughter erupting in the middle of a confidential conversation he was having about work-related issues.
In response to this debacle, on March 7 Amazon said it is changing the original utterance to “Alexa, can you laugh?”, a prompt less likely to produce false positives. For good measure, it also added the response “Sure, I can laugh” before any actual tee-hee-ing begins, so users (presumably) won’t jump out of their skin.
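To make the false-positive argument concrete, here is a toy sketch in Python. It is not how Alexa actually matches commands: the `fires` helper, its word-level matching rule, and the sample transcripts are all assumptions made up for illustration. It only shows why a one-word command is matched by far more overheard phrases than a three-word one.

```python
def fires(command_words, transcript):
    """Toy matcher (an assumption, not Amazon's logic): the command fires
    if its words appear, in order, somewhere in the transcript."""
    cleaned = transcript.lower().replace("?", "").replace(",", "")
    words = iter(cleaned.split())
    # `w in words` consumes the iterator, so word order is enforced.
    return all(w in words for w in command_words)


OLD_COMMAND = ["laugh"]                # "Alexa, laugh"
NEW_COMMAND = ["can", "you", "laugh"]  # "Alexa, can you laugh?"

# Hypothetical snippets of speech the device might overhear.
overheard = [
    "that made me laugh",
    "don't laugh at me",
    "we had a good laugh about it",
    "can you laugh",
]

old_hits = [t for t in overheard if fires(OLD_COMMAND, t)]
new_hits = [t for t in overheard if fires(NEW_COMMAND, t)]
print(len(old_hits), len(new_hits))
```

In this toy setup the short command fires on every snippet, while the longer one fires only on the phrase that was actually meant as a request.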
The solution seems like a straightforward one. We want the laughter to happen, well, only when we intend it. So making it more difficult for unasked laughter to happen makes sense, right?
Not exactly. Matching user intentions to explicit commands is a complex business, and simply changing the command after identifying a failure in the experience is indicative of a problem in the way we are designing conversational interfaces.
Because ultimately, the issue isn’t that Alexa isn’t hearing us correctly. The issue is that we aren’t designing conversations that people actually want to have with their assistants. And as a result we aren’t providing meaningful experiences for users.
The numbers speak for themselves: although 8.2 million people owned an Amazon Echo device as of November 2017, and although there are currently almost 25,000 skills in the Alexa store, 97% of voice applications go unused after the first two weeks. If we don’t build desirability into these experiences from the bottom up, we can’t expect assistants to become embedded in our day-to-day lives.
So, instead of troubleshooting how we can simply prevent laughter in the midst of a serious work conversation, how can we encourage Alexa to read the room, so to speak, and contribute accordingly? Can we program our devices to track changes in human pitch, volume, and speed (in addition to words and sentences) in order to better deduce the situation? How might these modulations trigger a better, more nuanced type of response? In short, how can we design contextually, so that a conversational user interface’s (CUI’s) responses start aligning closely with user needs?
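One way to picture “reading the room” is as a small gate that sits between recognizing a command and acting on it. The Python sketch below is purely hypothetical: the `ProsodyShift` type, the `read_the_room` and `should_laugh` functions, and every threshold are invented for illustration, and no real assistant is known to work this way. It assumes the device can already measure how a speaker's pitch, volume, and speed differ from that speaker's usual baseline.

```python
from dataclasses import dataclass


@dataclass
class ProsodyShift:
    """Change relative to the speaker's own baseline (1.0 means no change).
    Hypothetical inputs; measuring them is outside this sketch."""
    pitch: float
    volume: float
    speed: float


def read_the_room(shift: ProsodyShift) -> str:
    """Guess the mood from how the voice has changed.
    All thresholds are made-up placeholders, not tuned values."""
    if shift.volume < 0.8 and shift.speed < 0.9:
        return "serious"   # quieter and slower than usual
    if shift.pitch > 1.15 and shift.volume > 1.1:
        return "playful"   # higher and louder than usual
    return "neutral"


def should_laugh(heard_laugh_command: bool, shift: ProsodyShift) -> bool:
    """Laugh only if asked to, and only if the mood is not serious."""
    return heard_laugh_command and read_the_room(shift) != "serious"


hushed_meeting = ProsodyShift(pitch=1.0, volume=0.7, speed=0.85)
lively_dinner = ProsodyShift(pitch=1.3, volume=1.2, speed=1.1)
print(should_laugh(True, hushed_meeting), should_laugh(True, lively_dinner))
```

The point of the design is that the words alone no longer decide the response: the same recognized phrase is suppressed in a hushed meeting and honored at a lively dinner.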
This is a matter of beginning the design process with research and discovery activities—activities geared toward better understanding the human contexts in which various voice features may be of value. Ultimately, we can imagine our sensitivity to context reaching the point that users don’t have to ask for what they want — when you tell a joke, your assistant laughs.
When we think contextually, we begin to design voice assistants that are robust, thoughtful articulations of our expectations: we design assistants that actually assist.
We might be a long way off from designing contextually. We might be stuck in the Alexa-makes-some-tasks-marginally-better phase for a while, or keep treating our assistants as grab-bags of party tricks or as fodder for frustrating anecdotes over the water cooler. But by acknowledging what is holding us back (not the correct or incorrect utterance, but the actual experience itself), we can start to make strides in the right direction. And hopefully our devices will stop laughing at us from the next room over. Hopefully.