The frontline in the battle to implement AI throughout society is the interface between us and them, the human-machine interface. Increasingly, our hands are being freed to do other things as we are given interaction options that don’t require touch. Alongside physical gestures, facial expressions and other forms of communication, the key hands-free connection between us and AI is our voice.
When Siri was released on the iPhone 4S in 2011, it straddled that dangerous borderline between ephemeral novelty and useful companion. It was hardly Samantha from Her, or Jarvis from Iron Man. The current generation of voice-controlled assistants is still teetering in that boundary zone. They have a long way to go before they ‘just work’, when we forget we are talking to machines and trust them to know us well.
Nevertheless, the most popular item bought on Amazon in the 2017 holiday season was Amazon’s own Echo Dot smart speaker. Massive consumer demand for AI in the home is evident.
At our company, Sensum, we recently ran a study to compare the emotional connection people have with some of the top brands of what are technically known as IPAs. No, not craft beer: intelligent personal assistants; AKA automated personal assistants; AKA virtual assistants; AKA voice-user interfaces… I don’t think there is one common term for the technology yet.
In the study, we examined users’ emotional responses to the IPAs, and explored the role of empathy in their interactions with the products. Here are some insights and thoughts from what we learned.
The study covered Alexa, Siri and Cortana. In other words, Amazon, Apple and Microsoft. Notably, we didn’t include Google’s assistant this time, even though Google is one of the other top players in the market, but we would include it in future research. Google is also distinct in not having a unique name for the ‘persona’ of its IPA. Google’s assistant acts as a gateway to Google as a whole, rather than attempting to anthropomorphise a separate ‘assistant’ entity such as Siri or Alexa. But I digress.
With participants hooked up to biometric sensors, we could see their near-instantaneous physiological responses to their interactions with the IPAs, and from that infer their emotional responses. The metrics for this study were heart rate and GSR (galvanic skin response, which measures the conductance of the skin). Together they give us the participants’ arousal (excitement/relaxation) and engagement (with the stimulus: in this case the virtual assistant). These are favourite metrics of ours in many scenarios because they can indicate near-instant, and typically nonconscious, emotional responses.
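For readers who like to see the idea in code, here is a minimal sketch of one common way to read a response out of a signal like heart rate or GSR: compare the average level just after an event with the average level just before it. This is an illustration only, not the analysis pipeline used in our study. The function name `baseline_change`, its parameters, and the sample values are all hypothetical.

```python
from statistics import mean


def baseline_change(samples, event_index, baseline_len, response_len):
    """Relative change in the mean signal after an event versus just before it.

    Hypothetical helper for illustration only.
    samples      -- evenly spaced readings (e.g. heart rate in bpm, or skin conductance)
    event_index  -- index of the first sample after the event (e.g. the assistant replies)
    baseline_len -- number of samples before the event used as the baseline
    response_len -- number of samples from the event onwards used as the response
    """
    if event_index < baseline_len:
        raise ValueError("not enough samples before the event for a baseline")
    baseline = samples[event_index - baseline_len:event_index]
    response = samples[event_index:event_index + response_len]
    if not baseline or not response:
        raise ValueError("baseline and response windows must not be empty")
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero, relative change is undefined")
    # Positive values mean the signal rose after the event, negative values mean it fell.
    return (mean(response) - base) / base


# Made-up heart-rate readings: steady at 70 bpm, then 77 bpm after the event.
heart_rate = [70, 70, 70, 70, 77, 77, 77, 77]
change = baseline_change(heart_rate, event_index=4, baseline_len=4, response_len=4)
print(f"Heart rate changed by {change:.0%} after the event")
```

A relative change is used here so that people with different resting levels can be compared on the same scale. Real biometric data would also need cleaning and smoothing before a comparison like this means much.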
The first intriguing finding was that participants showed high levels of arousal when first using the devices — holding a wake-button like the iPhone home button, or hearing a wake-noise like the assistant saying, ‘hi there’. Our first encounter with a novel form of interaction is a key moment for optimising the design of the user experience, and data from our bodies was shown to reflect this.
We saw elevated heart rate when the virtual assistants didn’t give the participants what they wanted. Particularly, participants got frustrated when the assistants spouted long lists of factual information in response to a simple question, or failed to work due to some issue such as requiring another app to be closed first. These are unsurprising results but it is good to see them backed by nonconscious biometric data.
Participants quickly disengaged when their IPAs didn’t understand them. When that happened, the assistant typically took a stab at a misheard instruction. Regional accents are a particular concern in this area. We are headquartered in Northern Ireland so we really feel the pain on this one. When they have to contend with comprehension issues such as an unusual accent, these devices quickly become the entertainment in the room, rather than the help.
In our study, Amazon’s Alexa appeared to generate the greatest emotional connection with participants. Heart rate and GSR levels increased when Alexa offered recommendations and told jokes. When Alexa said there was a 50% chance of rain that day, participants paid attention. When ‘she’ said, ‘what do you call a pig doing karate? …A pork chop’, the punchline elicited genuine laughter and corresponding spikes in the participants’ biometric signals.
We are at an early stage for AI to be truly humorous; to be very human-like. But the value of training artificial systems to understand us better and interact with us in more familiar, intimate ways could be vast. …as long as we do it ethically.
‘The goal is to create automated and targeted mind-reading technologies that tailor communications to a given individual. Technologies like these could be used to generate more engaging interactions, but also material and techniques that could most certainly be abused’. From Laughter and Humour as Conversational Mind-Reading Displays — Dr Gary McKeown, Queen’s University Belfast.
Minding our Ps and Qs
We don’t give virtual assistants the same patience that we afford to our human peers. We expect them to get the job done efficiently and correctly, and we get annoyed when they falter. This is perhaps unfair in the current state of AI’s evolution, but don’t expect users to sympathise. In fact, their interactions can quickly become uncivilised.
To explore the concept of manners between humans and machines, we asked participants to curse and insult the IPAs. They typically felt uncomfortable doing so, which suggests a level of empathy from the humans towards their assistants. But it was the AI response that interested us most. Few assistants retaliated against the abuse. There were some exceptions, such as Siri presenting a cue card with an exclamation mark on it when a curse-word was used.
I admit, when I heard about this issue of manners I thought, ‘who cares? It’s just unfeeling software’. It can be cathartic, even fun, to take out our annoyance on inanimate objects. I am constantly swearing at my laptop and mobile, while remaining very polite with people. But there is a latent moral dilemma in this new form of social etiquette.
For a start, we should consider if being mean to machines will encourage us to become meaner people in general. But whether or not treating Alexa like a disobedient slave will cause us to become bad neighbours, there’s a stickier aspect to this problem. What happens when AI is blended with ourselves? With the adoption of tools such as intelligent prosthetics, the line between human and machine is increasingly blurry. We may have to consider the social consequences of every interaction, between both natural and artificial entities, because it might soon be difficult or unethical to tell the difference.
Our rule-of-thumb is to apply the principle of ‘don’t be a dick’ to our interactions with AI, just as we do with humans.
Looking Forward to Empathic AI
At Sensum, our passion and expertise are focused on understanding human emotion, and teaching machines to be more empathic by sensing and responding to our feelings and behaviour. At this time, the emotional intelligence of systems like Alexa and Siri doesn’t even compete with that of the family dog. But they’re getting better with every datum recorded and processed. A highly empathic AI assistant might recognise the difference between its user abusing it for fun, and venting genuine frustration. It should then know whether to respond with sarcasm or sympathy.
Understanding the relationship and trust we have with IPAs, and being aware that these lie on a spectrum from highly invested to distantly involved, could influence the way this kind of AI is integrated into emerging technologies such as autonomous and semi-autonomous vehicles. For one thing, we could learn from mistakes made in the low-risk environment of our living rooms, with tools like Alexa, before implementing consistent AI in higher-risk interactions such as driving.
For us to benefit from the combined learning of all our human-machine interactions, providers like Amazon, Apple, Microsoft, Google, Sony, Audi, Volkswagen and so on would need to share information through universal tools or data architecture. This feels like a utopian (albeit potentially terrifying) dream, but it could offer great advantages. We might leave our empathic house and climb into an empathic car, then arrive at our empathic office, all with systems installed that are capable of the same level of personalised interaction with us.
With a personalised, smart and empathic virtual assistant that exists in a continuum across all devices, we could enter a future in which waking up on the wrong side of bed means your house, car and phone can all take steps to help you feel the way you want to. The journey from A to B would no longer just be between physical destinations but also between moods — such as from irritable to overjoyed — before your boss even notices.
Two academic papers that informed this story came from our collaborator, Dr Gary McKeown from the School of Psychology across the road from us at Queen’s University Belfast:
- Laughter and Humour as Conversational Mind-Reading Displays.
- Turing’s Menagerie: Talking Lions, Virtual Bats, Electric Sheep and Analogical Peacocks.
And here are some relevant stories on this topic:
- Pretty Please: Politeness in Voice User Interfaces (by Cheryl Platz on Medium).
- Should you say ‘please’ and ‘thank you’ to your Amazon Echo or Google Home? (by Chaim Gartenberg on The Verge).
- Alexa, Where Art Thou? (by M.G. Siegler on Medium).