Alexa, Do I Understand You?
Using language to make sense of virtual assistants
I was twelve when my dad brought home our first, shiny home computer. It felt like the world was at our fingertips: Surfing the worldwide web! Chatting on AOL! Sweet, sweet Minesweeper! We plugged it in. Fired it up. Stared at the progress bars. And then, of course, it promptly didn’t work.
In this, my first-ever human-computer interaction, I spoke with a computer:
“You sonofa — I did double click!”
At the time, cursing at a machine seemed harmless. But things have changed. Now some machines understand.
And now, some can talk back.
Voice User Interfaces (or VUIs), the most famous of which are virtual assistants like Siri, Google Assistant, and Alexa, are no longer the stuff of science fiction. They’ve become standard issue in your phone and are steadily entering your home. Amazon’s Alexa, for example, sits on the counter and responds to your voice to hire an Uber, make grocery lists, and even tell an occasional joke. And — aside from important caveats — she understands pretty well. So well, in fact, that some people consider her a close personal friend.
But wait. Notice the language that I used there?
Do you describe Alexa as a ‘she’ or as an ‘it’? Does Alexa ‘understand’ your words like a human or ‘process’ them like a computer? And can a virtual assistant really be a ‘friend’? Interacting with VUIs like Alexa is such a novel experience that you don’t have a normal, routinized way of talking or thinking about it yet. This makes your experience an enormous grey area, one that can reveal a great deal about human cognition when interacting with new technologies.
We researchers can use language as a window into your perceptions of these VUIs, examining the ways that language about VUIs influences your thoughts and attitudes, but also how your interactions with VUIs impact your language and everyday behavior.
In other words, we’ve spent the past several decades working to make computers work — from properly booting up Minesweeper to actually communicating with us — but many of us in the field agree:
the time has come to look at the human side of this human-computer interaction.
And that’s exactly what the Cognition, Language, & Interaction with Machines Research Group, made up of myself and an international team of researchers centered here at the University of Basel, is doing.
From shoebox to running
Starting with the IBM Shoebox in the early 1960s, researchers have been developing machines that can process human speech. In more recent years, VUIs like Alexa and Siri have popped up in multiple forms, in everything from dictation tools to language-learning programs. But arguably the most successful VUIs are those that are coupled with some sort of artificial intelligence, or can at least convert what the user says into an actionable response. These VUIs have been given the name virtual assistants to emphasize how they can help with your day-to-day activities.
Apple introduced the world to Siri with the launch of the iPhone 4s in 2011 (having purchased the application from the company that developed it), and Siri remains one of the most widespread VUIs on the market today. However, Google, Microsoft, and Amazon are also in the game.
In 2015, Amazon released the Amazon Echo, a smart speaker for your home or office that houses the virtual assistant Alexa, to the general U.S. public. The Echo is a good example of the growing esteem of VUIs: it has an average 4.5-star rating across nearly 30,000 reviews posted to Amazon.com.
This raises some important questions, like What can this thing possibly do to get such high reviews?, and What, for the love of all that is holy, is reviewer number 30,000 writing a review for? We understand — you like it! (Actually, many of the reviews are quite heartwarming.)
She’s a mystery to me
Let’s start with an even more basic question: What exactly is a virtual assistant? Its function is clear. You ask for directions to the closest coffee shop, and it tells you the answer. Tod’s Coffee is just around the corner. Okay. I get it. But a deeper question, cognitively speaking, is how do you process this interaction? How do you perceive or describe the thing that’s providing this information?
Well, it sounds like a woman, so you call it ‘she’. That, in and of itself, is pretty interesting. Then it followed your request, so it must have ‘understood’ you. But did it really understand you, or did you just ascribe it a cognitive ability because you were already calling it ‘she’?
If you do think of it as understanding, it’s possible that you make other inferences with regard to its ‘cognition’. If it can understand, perhaps it can think. Perhaps it can sympathize. Maybe it can dream of electric sheep.
Of course, the use of personifying language, or applying language that is usually reserved for humans to objects, is nothing new under the sun. People refer to everything from their cars to vacuums as ‘she’, ‘him’, or ‘stupid’. My last car was ‘The General’, and I saluted him.
But, because virtual assistants bear such striking human-like features, such as the apparent ability to communicate, they represent something very different from other everyday objects. With ‘The General’, there was no doubt that he was an object. With virtual assistants, this is not so clear. The big question, then, is the extent to which language of personification reveals your underlying perceptions of virtual assistants and, in turn, whether language influences those perceptions.
Language of personification
To begin answering these kinds of questions, we in the Cognition, Language, and Interaction with Machines Research Group have used natural language processing algorithms to break down the aforementioned 30,000-or-so reviews of the Amazon Echo.
As you can see from the figure below, we’ve found the relative frequencies with which people refer to the device with its personified name, ‘Alexa’, compared to its non-personified name, ‘Echo’. We’ve also found how frequently reviewers use the word ‘she’ or ‘her’ compared to ‘it’ to describe the product, where ‘she’ and ‘her’ are taken as evidence of personification and ‘it’ is not. Since names and pronouns are just two examples of how people might personify the device, we’ve also taken into account more nuanced forms of personification. For example, without going into too much dizzying detail, everything from position in a sentence (Subject vs. Direct Object) to the chosen verb may strongly indicate the extent to which a user is ascribing human-like qualities to the Echo.
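To give a flavor of how such a tally works, here is a deliberately simplified sketch (not our actual pipeline, and the example review is invented): it counts the personified references (‘Alexa’, ‘she’, ‘her’) against the non-personified ones (‘Echo’, ‘it’) in a single review.

```python
import re
from collections import Counter

def personification_counts(review_text):
    """Count personified vs. non-personified references in one review.

    A simplified tally: 'alexa', 'she', and 'her' count as personified;
    'echo' and 'it' as non-personified. A fuller analysis would also
    weigh syntactic role (Subject vs. Direct Object) and verb choice.
    """
    tokens = re.findall(r"[a-z']+", review_text.lower())
    counts = Counter(tokens)
    personified = counts["alexa"] + counts["she"] + counts["her"]
    non_personified = counts["echo"] + counts["it"]
    return personified, non_personified

# An invented example review, for illustration only
review = "Alexa is great. She answers everything I ask her. The Echo looks nice too."
print(personification_counts(review))  # → (3, 1)
```

Run over all 30,000-or-so reviews, counts like these give each reviewer a rough personification score.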
Using these frequencies, and following up on previous research on the topic, we’ve confirmed that there’s a correlation between a reviewer’s use of personification and their star rating of the Amazon Echo, such that people who personify the product by referring to ‘Alexa’ or ‘she’ are more likely to give it a higher rating than those who refer to ‘Echo’ or ‘it’.
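The correlation itself can be sketched in a few lines, assuming each review has already been reduced to a personification score and a star rating. The numbers below are invented purely for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented data: share of personified references per review vs. star rating
personification = [0.9, 0.7, 0.1, 0.4, 0.0, 0.8]
stars           = [5,   5,   2,   4,   1,   4]

# For this toy data the coefficient comes out strongly positive:
# reviewers who personify more also rate higher.
print(pearson(personification, stars))
```

A positive coefficient here mirrors the pattern we found in the real reviews, though of course the actual analysis involves far more data and far more careful measures of personification.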
This is pretty amazing, because it means that people who personify Alexa, as evident in the language that they use, like Alexa more than those who do not personify.
But correlation is not causation! In other words, we don’t know whether people use the word ‘she’ and then subsequently fall in love with the product (à la Joaquin Phoenix in Her), or if they gradually start to like the product and begin to use the word ‘she’.
To start teasing all this apart, we are presently working on a study that we piloted with the Language & the Mind course here at the University of Basel. Disguised as market research, the real purpose of the survey was to determine whether or not language of personification influences the way people think about Siri’s features. Sixty-four Apple product owners were asked to rate the importance of different features of Siri on a scale of 1 to 6, where a rating of 1 meant ‘least important’ and 6 meant ‘most important’.
The trick was, each participant received a different survey, and each survey used different descriptions of Siri. Some people took a survey where Siri was maximally personified with the pronoun ‘she’ and verbs like ‘understand’, and others took a survey where Siri was minimally personified with the pronoun ‘it’ and verbs like ‘process’.
As seen in the figure below, preliminary results indicate that participants judged Siri’s features as more important when Siri was maximally personified. So, for example, participants rated ‘She understands what I say’ as more important than ‘It processes what I say’ (despite these two sentences being, presumably, nearly identical in meaning).
Moreover, participants rated Siri’s features as more important when the product was maximally personified across several different dimensions. Just as a few examples, these dimensions included Siri’s cognition (‘understand’ vs. ‘process’), capabilities (‘ability’ vs. ‘function’), interaction (‘tells’ vs. ‘reports’), language (‘speaks’ vs. ‘uses language’), properties (‘lives on phone’ vs. ‘installed on phone’), and social role (‘obeys commands’ vs. ‘runs commands’).
Though further research is necessary — for example, this pilot study does not resolve the correlation/causation problem — these preliminary results suggest that the language you use to describe virtual assistants may impact how you think about these assistants (and vice versa). In turn, such perceptions may influence your actual behavior.
As you’ve likely guessed, these studies are just the tip of the iceberg, particularly when it comes to questions about how your interactions with virtual assistants have the potential to impact your interactions with other human beings.
If you spend the morning barking rude commands like ‘make a list’ and ‘shut up’ (an actual Alexa command) at your assistant, will you default to this rudeness when you’re in a team meeting later today?
Will your child benefit from your in-home virtual assistant in terms of language and cognitive development, as it offers a no-risk opportunity to communicate?
Will the fact that most assistants are set to a female voice have an impact on your perception or treatment of women?
Like Alexa, will you start responding, “I’m sorry, I don’t know that,” to most questions thrown your way?
With the exception of that last one, the truth is, we don’t really know. After all, this is relatively new territory. Though researchers are currently working to answer such questions, the technical capabilities of computers are progressing more quickly than our ability to examine and understand the human impact.
It’s entirely possible that interacting with virtual assistants could shape everything from your treatment of other human beings to your definition of what makes something ‘alive’. It could help mold your personality and even affect basic, unconscious decisions like the words you choose. Some of these things will be positive. Some may be negative. Regardless, they need to be observed and understood. That’s how we balance the human-computer scale.
A new voice
The future of VUIs is still not clear. Virtual assistants may be a novel fad used by some and ignored by others, or they may represent the beginnings of a complete transition away from keyboards and touchscreens. Given the proliferation of new smart appliances, smart houses, and smart cars, though, it’s a safe bet that they’ll be tough to ignore in the years to come.
This proliferation has huge, largely unexplored repercussions in terms of human cognition, language, and interaction with machines.
After all, language has developed and evolved over hundreds of thousands of years of human struggle and survival. During that time, we’ve really only conversed with humans.
What’s going to happen when there’s a new voice in the conversation?
The University of Basel has an international reputation for outstanding achievements in research and teaching. Founded in 1460, it is the oldest university in Switzerland, with a history of success going back over 550 years.