During my time at Amazon on the Alexa voice design team, we monitored the internal Alexa enthusiasts alias. About once a month, a newcomer to the list would resurrect an evergreen request: “My kids are rude to Alexa. She should only respond if you say please!”
That request, of course, isn’t specific to Amazon employees. In fact, Chaim Gartenberg at The Verge mused about this topic while I was in the midst of writing this piece:
Demanding that my phone turn on and off the lights started feeling weird to say aloud, which got me to wondering: was I being rude to my smartphone?
And we as humans should question the effect our device interactions have on our own humanity, above all our interactions with digital assistants, which leverage our socially wired brains for better effect.
But what are we really asking for when we seek a formal framework for politeness — like requiring “please” and “thank you” — in our voice user interfaces? If deployed coarsely, this seemingly helpful mechanic could unintentionally backfire. Better to take a moment and deconstruct the request. What’s the best way to model politeness in VUIs?
The Importance of Mirror Neurons
These requests for a “politeness mode” are almost universally made in the context of children using voice assistants. But would simply requiring “polite” keywords from children have the desired effect?
Mirror neurons are a relatively recent addition to our understanding of cognitive psychology, but they bear particular importance here. From a Scientific American interview with mirror neuron pioneer Marco Iacoboni:
What makes these cells so interesting is that they are activated both when we perform a certain action — such as smiling or reaching for a cup — and when we observe someone else performing that same action. In other words, they collapse the distinction between seeing and doing. In recent years, Iacoboni has shown that mirror neurons may be an important element of social cognition…
Consider a child interacting with a voice assistant like Alexa. That child is already deprived of the ability to watch their conversational partner (the device) exhibit physical signs of “politeness,” and it’s unlikely mirror neurons will fire just by listening to the device speak.
However, watching a parent interact politely with Alexa would be far more likely to trigger the mirror neurons that ease social cognition and learning. Perhaps it isn’t so much that the device needs to require the child to speak politely. Setting a universal politeness mode on a device would force parents to model this social contract for their children, providing a new sort of in-home finishing school.
But this requires extra effort from parents, and I doubt this is actually what most parents requesting this feature have in mind. Requiring “please” at every turn would likely lead quickly to frustration, and could backfire.
In fact, if adults come to treat the required politeness as frustrating, we might accidentally teach kids that politeness is frustrating: a social tax. In order to develop a humane system that will work for both parents and children, then, we need to better understand what the social contract of “politeness” really entails in the context of these voice assistants.
The Magic Word
Manifestations of “politeness” vary wildly from country to country, so a “politeness mode” is already a slippery concept at best. Furthermore, politeness will be very costly to localize, since it may require a completely different system design from region to region. We must never assume that one country’s politeness norms apply in another region.
In the United States, many children are taught that “please” is the magic word, a word that unlocks action on behalf of “polite” adults. Does this mean that a “politeness mode” should require “please” at every turn?
Ironically, most voice assistants are intentionally neither marketed nor optimized to encourage children’s use, largely because of the (necessary) legislation discouraging data collection from children. But let’s disregard that fact, for the sake of argument. How could we enforce the use of “please”?
Option A: Respond just as a parent would, perhaps a bit pithily.
“Alexa, set an alarm.”
“I didn’t hear the magic word…”
Imagine getting this response at 11 p.m. when you’re preparing for bed. How would you feel? Or, perhaps, when trying to set a time-out timer for your child in the heat of the moment. Would being corrected for lack of politeness when trying to discipline a child really help solve the problem?
Option B: Complete the action, but add some reinforcement.
“Alexa, set an alarm for 7 a.m. tomorrow.”
“Your alarm is set for 7 a.m. tomorrow. By the way, it makes me happy when you say ‘please’.”
Less immediately annoying, but preachy. Can you honestly say this wouldn’t irritate you, and perhaps elicit a negative response in front of the kids we’re supposed to be teaching?
Option C: Swing towards positive reinforcement.
“Alexa, please set an alarm for 7 a.m. tomorrow.”
“Your alarm is set for 7 a.m. tomorrow. Thanks for asking so politely!”
Today’s prompts would remain as is, but successful use of the word “please” in appropriate scenarios could result in a more pleasing exchange. Naturally, we’d want to vary the “pleasing” responses so they don’t get too repetitive, since we’re trying to encourage more frequent use of “please.”
Option D: Go abstract, and mirror the brusqueness of impolite speech.
“Alexa, set an alarm for tomorrow at 7 a.m.”
“Fine, your alarm is set.”
I wouldn’t necessarily recommend this approach, since it’s a bit of a user experience regression, but the idea is that a lack of politeness makes the system less forthcoming with information. If you want the full confirmation (“Your alarm is set for 7 a.m. tomorrow”), you need to be polite about it.
So what’s the right approach? There’s no silver bullet; it probably depends not only on your assistant’s tone and demeanor, but also on your brand and the context of use. And of course, there are likely many other ways to attack this problem. But all four of these options run up against repetitiveness, especially if applied to all requests.
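To make the comparison concrete, here is a minimal Python sketch of the four options as interchangeable response strategies. Everything in it is an assumption for illustration: the `Strategy` enum, the `respond` function, and its parameters are hypothetical names, not part of any real assistant's API, and the second praise variant is invented to show how responses might be varied.

```python
import random
from enum import Enum

class Strategy(Enum):
    GATE = "A"    # withhold the action until "please" is heard
    REMIND = "B"  # complete the action, then ask for "please"
    PRAISE = "C"  # complete the action, and thank polite requests
    MIRROR = "D"  # complete the action, but answer curtly if "please" is missing

GATE_PROMPT = "I didn't hear the magic word..."
REMINDER = "By the way, it makes me happy when you say 'please'."
# Only the first variant comes from the article; the second is invented
# here to illustrate varying the praise.
PRAISE_VARIANTS = [
    "Thanks for asking so politely!",
    "Thank you for asking nicely!",
]

def respond(strategy, polite, full, short, rng=random):
    """Pick the spoken reply for one request.

    full  -- the normal confirmation, e.g. "Your alarm is set for 7 a.m. tomorrow."
    short -- a curt confirmation, used only by Strategy.MIRROR
    """
    if polite:
        if strategy is Strategy.PRAISE:
            return f"{full} {rng.choice(PRAISE_VARIANTS)}"
        return full
    if strategy is Strategy.GATE:
        return GATE_PROMPT  # the caller must also skip the action itself
    if strategy is Strategy.REMIND:
        return f"{full} {REMINDER}"
    if strategy is Strategy.MIRROR:
        return short
    return full  # PRAISE leaves impolite requests unchanged
```

Note that only Option A changes what the assistant does; the other three change only what it says, which is one reason they are less likely to block a time-sensitive request.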
A Time and a Place
If you examine your conversations over time, the fact is that you’re not likely to say “please” every time you make a request. “Please” is more likely in specific situations. For example:
- The first request in a conversation
- Requests made in public
- Requests that require particular effort on behalf of the recipient
- Shorter requests. (For longer asks, the wording itself may be enough to convey politeness, e.g. “May I have a cookie?”)
And in other situations, we tend to forgive our peers when they forget the “magic word.” In stressful situations, or when a task is time-sensitive, we don’t expect politeness. If your oven’s on fire, “Please put the fire out!” is perhaps an undue level of formality for the moment.
This bears interesting implications for assistants reaching into work environments or high cognitive load environments like cars (or even kitchens). To prevent undue frustration, our “politeness mode” should take into consideration our state and environment — after all, that’s the human thing to do.
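As a rough sketch of what taking state and environment into account could look like, the Python below decides whether a “politeness mode” should expect “please” for a given request. The `RequestContext` fields and the `expects_please` rule are hypothetical; how a real assistant would detect urgency or cognitive load is left open, and the rule itself is only one possible reading of the situations listed above.

```python
from dataclasses import dataclass

@dataclass
class RequestContext:
    # All fields are illustrative assumptions, not real device signals.
    time_sensitive: bool = False        # e.g. a timer set in the heat of the moment
    high_cognitive_load: bool = False   # e.g. the user is driving or cooking
    first_in_conversation: bool = True  # "please" is more likely on a first request

def expects_please(ctx: RequestContext) -> bool:
    """Return True only when a human listener would plausibly expect 'please'."""
    # Stress and urgency suspend the expectation entirely.
    if ctx.time_sensitive or ctx.high_cognitive_load:
        return False
    # Otherwise, expect it on the opening request and relax for follow-ups.
    return ctx.first_in_conversation
```

For example, `expects_please(RequestContext(high_cognitive_load=True))` is false, so a request made while driving would never be corrected for a missing “please.”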
Further, requiring “please” means being flexible enough to accept it anywhere it appears in a sentence. If I politely ask “Could you please play Madonna?” it’d be rude to reply “I didn’t hear the magic word” just because “please” appeared in the middle of an utterance. To avoid unduly taxing customers, our voice interfaces need to be prepared for arbitrary placement of these keywords.
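A minimal sketch of position-independent matching, assuming the assistant has a text transcript of the utterance to work with. The function name `contains_please` is hypothetical, and a production system would more likely handle this in its language-understanding model than with a regular expression.

```python
import re

# Match "please" as a whole word anywhere in the utterance, in any case.
# The word boundaries keep "pleased" or "displease" from counting.
PLEASE_PATTERN = re.compile(r"\bplease\b", re.IGNORECASE)

def contains_please(utterance: str) -> bool:
    """Return True if 'please' appears anywhere in the transcript."""
    return PLEASE_PATTERN.search(utterance) is not None
```

With this check, “Could you please play Madonna?” and “Play Madonna, please” are treated the same as “Please play Madonna.”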
But politeness isn’t just about a single word. There are other ways to express oneself politely (using titles and honorifics, etc.). And, of course, there is the concept of thanks. When a job’s well done or appreciated, we thank our conversational partner for their efforts.
Today’s voice interfaces tend to be wake-word activated, which makes thanks somewhat awkward. To thank Alexa, one must formally preface it with the wake word: “Alexa, thank you.” Just blurting out thanks won’t activate the microphone, and the device will never “hear” it.
The alternative — having your smart speaker listen after every interaction for “thank you” — is a privacy risk. I know Alexa devices, for one, have a strict approach to privacy and will not turn on the mic after a request is complete without hearing the wake word. In the end, I think the loss of the more natural “Thank you” is worth our greater privacy overall.
Awkwardness aside, what incentive do we have to give thanks? Why do we do it? In our person-to-person social contract, “thanks” gives our conversational partner a sense of being appreciated: happiness, satisfaction, or even pride. Over time, we build trust in a relationship by showing we see and appreciate the effort expended. And in return, sometimes we are rewarded by seeing our conversational partner’s mood improve.
So what is the mood equivalent for a voice assistant? Could receiving frequent “thanks” make an assistant happier? More familiar with us? Perhaps frequent “thanks” lessens the importance of “please,” since the net respect shown is the same? In the human social contract, people who feel thanked and appreciated often go above and beyond. Could that manifest in our devices?
Gamification, or Lack Thereof
One idea I’ve intentionally stayed away from is gamifying these polite interactions. Of course, we could track a customer’s “politeness” level, awarding points for please/thank you and removing them for frequent omissions. But when we apply that level of abstraction, we dissociate the politeness from its human benefits. Instead of teaching kids (and/or adults) how politeness improves our interactions, we’re simply providing a framework for interacting with digital devices, one which can certainly be manipulated.
In the end, any system that uses politeness as a gate for core functionality may backfire. Reinforcing good behavior also means matching the motivation you’d like to see in the real world. As The Verge put it:
I’m polite to my smart assistants because I want to be polite to people too, and reinforcing rude habits seems like a bad idea.
On the other hand, gamification is often about providing timely, satisfying feedback to reinforce specific behaviors. If we can find a way to make that feedback non-quantitative — that is, if we find a way to make our VUIs respond positively to polite behavior without gamification crutches like points or badges — I do believe we’d stand a better chance of having a genuine and lasting positive impact on social behaviors.
Towards More Humane Voice Experiences
Clearly, we exit this discussion with more questions than answers. I’d love to see more exploration in this space. Perhaps academic research around the lasting effects of “politeness” constructs on longitudinal (i.e., long-term) interaction with digital assistants could help. Well-constructed studies on the potential benefits for kids of having politeness-enabled assistants in the home would be useful as well.
It’s important for us to continue to ask more of our technology, especially technology that leverages natural user interfaces. Our devices should continue to adapt to our humanity, not the other way around.
But like many challenges, politeness isn’t as simple as a universal “please” and “thank you.” It’s a complex systems design challenge for the right team to explore. We have an opportunity to design a humane new world, where our digital systems reinforce and extend our best habits in constructive ways. Are you up for the challenge?
Oh… and thank you for reading.