Pretty Please: Politeness in Voice User Interfaces

Customers of today’s voice assistants continue to ask for a “politeness mode”. But what are we really asking for?

During my time at Amazon on the Alexa voice design team, we monitored the internal Alexa enthusiasts alias. About once a month, a newcomer to the list would resurrect an evergreen request: “My kids are rude to Alexa. She should only respond if you say please!”

That request, of course, isn’t unique to Amazon employees. In fact, Chaim Gartenberg at The Verge mused about this topic while I was in the midst of writing this piece:

Demanding that my phone turn on and off the lights started feeling weird to say aloud, which got me to wondering: was I being rude to my smartphone?

And we as humans should question the effect our device interactions have on our own humanity, most of all our interactions with digital assistants, which leverage our socially wired brains for better effect.

But what are we really asking for when we seek a formal framework for politeness — like requiring “please” and “thank you” — in our voice user interfaces? If deployed coarsely, this seemingly helpful mechanic could unintentionally backfire. Better to take a moment and deconstruct the request. What’s the best way to model politeness in VUIs?

The Importance of Mirror Neurons

This request for a “politeness mode” is almost universally made in the context of children using voice assistants. But would simply requiring “polite” keywords from children have the desired effect?

Mirror neurons are a relatively recent addition to our understanding of cognitive psychology, but they bear particular importance here. From a Scientific American interview with mirror neuron pioneer Marco Iacoboni:

What makes these cells so interesting is that they are activated both when we perform a certain action — such as smiling or reaching for a cup — and when we observe someone else performing that same action. In other words, they collapse the distinction between seeing and doing. In recent years, Iacoboni has shown that mirror neurons may be an important element of social cognition…

Consider a child interacting with a voice assistant like Alexa. That child is already deprived of the ability to watch their conversational partner (the device) exhibit physical signs of “politeness”, and it’s unlikely mirror neurons will fire just by listening to the device speak.

However, watching a parent interact politely with Alexa would be far more likely to trigger those mirror neurons which ease social cognition and learning. Perhaps it isn’t so much that the device needs to require the child to speak politely. Setting a politeness mode universally on a device would force parents to model this social contract for their children, providing a new sort of in-home finishing school.

But this requires extra effort of parents, and I doubt this is actually what most parents requesting this feature have in mind. Requiring “please” at every turn would likely lead quickly to frustration and could backfire.

In fact, if the required politeness is treated by adults as frustrating over time, we might accidentally teach kids that politeness is frustrating: a social tax. In order to develop a humane system that will work for both parents and children, then, we also need to better understand what the social contract of “politeness” really entails in the context of these voice assistants.

Please: More than just a word. (image license: Adobe Stock)

The Magic Word

The manifestations of “politeness” vary wildly from country to country, so a “politeness mode” is already a slippery concept at best. Furthermore, politeness will be very costly to localize, since it may require a completely different system design from region to region. We must never assume that one country’s politeness norms apply in another region.

In the United States, many children are taught that “please” is the magic word, a word that unlocks action on behalf of “polite” adults. Does this mean that a “politeness mode” should require “please” at every turn?

Ironically, most voice assistants are intentionally neither marketed nor optimized to encourage children’s use, largely because of the (needful) legislation discouraging data collection from children. But let’s disregard that fact, for the sake of argument. How could we enforce the use of “please”?

Option A is to respond just as a parent would, perhaps a bit pithily.
“Alexa, set an alarm.”
“I didn’t hear the magic word…”

Imagine getting this response at 11PM when you’re preparing for bed. How would you feel? Or imagine, as a friend of mine does with Alexa, trying to set a time-out timer for your child in the heat of the moment. Would being corrected for lack of politeness when trying to discipline a child really help solve the problem?

Option B is to complete the action, but add on some reinforcement.
“Alexa, set an alarm for 7AM tomorrow.”
“Your alarm is set for 7AM tomorrow. By the way, it makes me happy when you say ‘please’.”

Less immediately annoying, but preachy. Can you honestly say this wouldn’t irritate you, and perhaps elicit a negative response in front of the kids we’re supposed to be teaching?

Option C is to swing towards positive reinforcement. 
“Alexa, please set an alarm for 7AM tomorrow.”
“Your alarm is set for 7AM tomorrow. Thanks for asking so politely!”

Today’s prompts would remain as is, but successful use of the word “please” in appropriate scenarios could result in a more pleasing exchange. Naturally, we’d want to vary the “pleasing” responses so they don’t get TOO repetitive, since we’re trying to encourage more frequent use of “please”.

Option D is to go abstract, and mirror the brusqueness of impolite speech.
“Alexa, set an alarm for tomorrow at 7AM.”
“Fine, your alarm is set.”

I wouldn’t necessarily recommend this approach, since it’s a bit of a user experience regression: when the request lacks politeness, the system is less forthcoming with information. If you want the full confirmation (“Your alarm is set for 7AM tomorrow”), you need to be polite about it.

So what’s the right answer? There’s no silver bullet; it probably depends not only on your assistant’s tone and demeanor, but also on your brand and the context of use. And of course, there are likely many other ways to attack this problem. But all four of these options run up against repetitiveness, especially if applied to all requests.
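
To make the trade-offs between the four options concrete, here is a minimal Python sketch of how they might differ in code. Everything in it is hypothetical: the `Strategy` enum, the `respond` function, and its parameters are names invented for illustration, not part of Alexa or any real voice platform. The prompt wording is taken from the examples above.

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical labels for the four options discussed above."""
    REFUSE = "A"  # withhold the action until "please" is heard
    REMIND = "B"  # complete the action, then add a reminder
    REWARD = "C"  # complete the action, and thank polite speakers
    MIRROR = "D"  # complete the action, but answer curtly if "please" is missing


def respond(strategy: Strategy, was_polite: bool,
            confirmation: str, curt_confirmation: str) -> str:
    """Return the spoken reply for a request.

    `confirmation` is today's full prompt; `curt_confirmation` is the
    shortened variant that only Option D uses.
    """
    if was_polite:
        if strategy is Strategy.REWARD:
            return f"{confirmation} Thanks for asking so politely!"
        return confirmation

    if strategy is Strategy.REFUSE:
        # Note: in this branch the requested action is NOT performed.
        return "I didn't hear the magic word..."
    if strategy is Strategy.REMIND:
        return f"{confirmation} By the way, it makes me happy when you say 'please'."
    if strategy is Strategy.MIRROR:
        return curt_confirmation
    # Option C leaves impolite requests exactly as they are today.
    return confirmation
```

Notice that only Option C leaves the impolite path untouched. That is one way to see why it carries the least risk of frustrating someone at 11PM.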

A Time and a Place

If you examine your conversations over time, the fact is that you’re not likely to say “please” every time you make a request. “Please” is more likely in specific situations. For example:

  • The first request in a conversation
  • Requests made in public
  • Requests that require particular effort on behalf of the recipient
  • Shorter requests — for longer asks, the wording itself may be enough to convey politeness. “May I have a cookie?”

And in other situations, we tend to forgive our peers when they forget the “magic word”. In stressful situations, or when a task is time-sensitive, we don’t expect politeness. If your oven’s on fire, “Please put the fire out!” is perhaps an undue level of formality for the moment.

This bears interesting implications for assistants reaching into work environments or high cognitive load environments like cars (or even kitchens). To prevent undue frustration, our “politeness mode” should take into consideration our state and environment — after all, that’s the human thing to do.
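
As a rough illustration, the situational cues above could be folded into a single check that decides whether a “politeness mode” should expect “please” at all. This is only a sketch under loud assumptions: the `RequestContext` fields and the `politeness_expected` function are hypothetical names, and a real assistant would have to infer these signals somehow, which is a hard problem in its own right.

```python
from dataclasses import dataclass


@dataclass
class RequestContext:
    """Hypothetical signals about the moment a request is made."""
    first_in_conversation: bool = False
    in_public: bool = False
    high_effort: bool = False          # the request asks a lot of the recipient
    urgent: bool = False               # stressful or time-sensitive
    high_cognitive_load: bool = False  # e.g. driving or cooking


def politeness_expected(ctx: RequestContext) -> bool:
    """Return True only when a human listener would plausibly expect "please"."""
    # Urgency and cognitive load override everything else:
    # nobody expects "please" when the oven is on fire.
    if ctx.urgent or ctx.high_cognitive_load:
        return False
    return ctx.first_in_conversation or ctx.in_public or ctx.high_effort
```

For example, `politeness_expected(RequestContext(first_in_conversation=True, high_cognitive_load=True))` returns `False`: the driver is let off the hook even on the first request of a conversation.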

She’s stressed, her eyes and mind are on the road — does this driver really benefit from a “politeness mode” in this context? If not, when would we once again look for that social behavior? (image license: Adobe Stock)

Further, requiring “please” means being flexible enough to accept it anywhere it appears in a sentence. If I politely ask “Could you please play Madonna?” it’d be rude to reply “I didn’t hear the magic word” just because “please” appeared in the middle of an utterance. To avoid unduly taxing customers, our voice interfaces need to be prepared for arbitrary placement of these keywords.
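
A minimal sketch of position-independent detection, assuming the assistant already has a text transcript of the utterance (the `contains_please` helper is a hypothetical name, and real systems would more likely handle this in their language-understanding models than with a regular expression):

```python
import re

# Match "please" as a whole word, anywhere in the utterance, ignoring case.
_PLEASE = re.compile(r"\bplease\b", re.IGNORECASE)


def contains_please(utterance: str) -> bool:
    """Return True if the word "please" appears anywhere in the transcript."""
    return _PLEASE.search(utterance) is not None
```

Here `contains_please("Could you please play Madonna?")` is `True`, while a title like “Pleased to Meet You” does not count, because the word boundary rules out “pleased”. A naive check like this has an obvious limit, though: asking for a song called “Please Please Me” would register as polite, so the keyword alone can’t distinguish courtesy from content.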

Giving Thanks

But politeness isn’t all just a single word. There are other ways to express oneself politely (using titles and honorifics, for example). And, of course, there is the concept of thanks: when a job is well done or appreciated, we thank our conversational partner for their efforts.

Today’s voice interfaces tend to be wake word activated, which makes thanks somewhat awkward. To thank Alexa, one must formally preface it with the wake word: “Alexa, thank you.” Just blurting out thanks won’t activate the microphone, and the device will never “hear” it.

The alternative, having your smart speaker listen after every interaction for “thank you”, is a privacy risk. I know Alexa devices, for one, have a strict approach to privacy and will not turn the mic on after a request is done without hearing the wake word. In the end, I think the loss of the more natural “Thank you” is worth the greater privacy overall.

That awkwardness aside, what incentive do we have to give thanks? Why do we do it? In our person-to-person social contract, the concept of “thanks” is used to provide our conversational partner with a sense of being appreciated: happiness, satisfaction, or even pride. Over time, we build trust in a relationship by showing we see and appreciate the effort expended. And in return, sometimes we are rewarded by seeing our conversational partner’s mood improve.

Thank you: a universal concept. (image license: Adobe Stock)

So what is the mood equivalent for a voice assistant? Could receiving frequent “thanks” make an assistant happier? More familiar with us? Perhaps frequent “thanks” lessens the importance of “please”, since the net respect shown is the same? In the human social contract, people who feel thanked and appreciated often go above and beyond. Could that manifest in our devices?

Gamification, or lack thereof

One idea I’ve intentionally stayed away from is gamifying these polite interactions. Of course, we COULD track a customer’s “politeness” level, awarding points for please/thank you and removing them for frequent omissions. But when we apply that level of abstraction, we dissociate the politeness from the human benefit. Instead of teaching kids (and/or adults) how politeness improves our interactions, we’re simply providing a framework for interacting with digital devices, one which can certainly be manipulated.

In the end, any system that uses politeness as a gate for core functionality may backfire. Reinforcing good behavior also means matching the motivation you’d like to see in the real world. As The Verge put it:

I’m polite to my smart assistants because I want to be polite to people too, and reinforcing rude habits seems like a bad idea.

On the other hand, gamification is often about providing timely, satisfying feedback to reinforce specific behaviors. If we can find a way to make that feedback non-quantitative — that is, if we find a way to make our VUIs respond positively to polite behavior without gamification crutches like points or badges — I do believe we’d stand a better chance of having a genuine and lasting positive impact on social behaviors.

Towards more humane voice experiences

Clearly, we exit this discussion with more questions than answers. I’d love to see more exploration in this space: perhaps academic research on the lasting effects of such “politeness” constructs on longitudinal (i.e., long-term) interaction with digital assistants, or well-constructed studies on the potential benefits to kids of having politeness-enabled assistants in the home.

It’s important for us to continue to ask more of our technology, especially the technology that leverages natural user interfaces. Our devices should continue to adapt to our humanity, and not the other way around.

But like many challenges, politeness isn’t as simple as a universal “please” and “thank you.” It’s a complex systems design challenge for the right team to explore a humane new world, where our digital systems reinforce and extend our best habits in constructive ways. Are you up for the challenge?

Oh… and thank you for reading.

Cheryl Platz has worked on a variety of voice user interfaces including the Echo Look and Echo Show, Amazon’s Alexa platform, Windows Automotive, and Cortana. She is currently Design Lead for the C+E Admin Experience team at Microsoft. As founder of design education company Ideaplatz, Cheryl is also touring worldwide with her acclaimed natural user interface talks and workshops. Her next speaking appearance will be at Interaction ’18 in Lyon this February.