3 examples of context-aware voice design

Brielle Nickoloff
Published in Women in Voice
4 min read · Jan 24, 2021
Photo by Jonas Leupe on Unsplash

This morning around 8:45, I told Google Assistant [word-for-word], “Set an alarm for 9”. It responded,

“Alarm for 9 A.M. — Set.”

Fifteen minutes later, the jingle played and the time showed up on the screen. I said, “hey Google, stop.”

It stopped. But then it replied,

“By the way, you can also just say ‘stop’ without having to start with ‘hey’, followed by ‘Google’.”

Because I’m a voice nerd, I wanted to try to get Google Assistant to say this tooltip again.

So I set another alarm and followed the same method, cutting it off again by saying “hey Google, stop.”

But this time, even though I still said the extraneous words, the jingle just stopped playing. No reminder about me not needing to say “hey Google” first.

Because this device had previously been set up to recognize my family members’ voices and faces, I wanted to try the experiment with my partner. I asked him to follow the same technique — ask Google to set an alarm, then cut off the alarm jingle with “hey Google, stop.”

Bingo! As expected, it dropped the same hint: it stopped the jingle and then told him, “By the way, you can also just say…”

This entire interaction reveals a few decisions that voice designers from Google made about how they wanted this experience to feel for users.

#1 Even though I didn’t specify AM or PM, Google Assistant assumed that I meant 9 AM

This feature might be hard-coded (something like IF time of day = morning THEN assume the alarm time is also in the morning). Alternatively, or in addition, it might rely on usage statistics to make this assumption (e.g. in the past, 99% of users who set an alarm for 9 at 8:45 am, without specifying AM or PM, meant 9 am, not 9 pm).
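
To make the hard-coded option concrete, here is a minimal Python sketch of one possible rule: pick the next future occurrence of the spoken hour. This is an assumption made for illustration, not Google's actual logic, and the function name `resolve_alarm_time` is hypothetical.

```python
from datetime import datetime, timedelta


def resolve_alarm_time(spoken_hour: int, now: datetime) -> datetime:
    """Pick the next future occurrence of an hour spoken without AM/PM.

    Assumed rule (hypothetical): the user means whichever matching
    clock time comes soonest after the current moment.
    """
    if not 1 <= spoken_hour <= 12:
        raise ValueError("expected a 12-hour clock value from 1 to 12")

    base_hour = spoken_hour % 12  # 12 maps to 0, so "12" can mean midnight
    candidates = []
    for day_offset in (0, 1):  # today and tomorrow
        day = (now + timedelta(days=day_offset)).replace(
            minute=0, second=0, microsecond=0
        )
        for hour in (base_hour, base_hour + 12):  # AM and PM readings
            candidates.append(day.replace(hour=hour))

    # Keep only times still in the future and take the earliest one.
    return min(t for t in candidates if t > now)
```

Under this rule, "set an alarm for 9" at 8:45 in the morning resolves to 9:00 that same morning, while the same request at 9:30 in the morning resolves to 9:00 that evening.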

#2 Since Google Assistant made this assumption, it *implicitly* asked me to confirm it

Notice how Google Assistant didn’t explicitly confirm by asking, “Did you want that alarm to be set for 9am or 9pm?”

Google Assistant implicitly confirmed that I meant 9am, when it added the “am” detail into the response back to me.

By implicitly confirming this, Google Assistant saved me time (since I now don’t need to answer another whole question) but also gave me a chance to correct it in case I had actually meant 9 *pm*.

#3 Google Assistant adjusted its answer using data about who was talking to it

The most obvious context-aware part of this interaction was that Google Assistant didn’t bother me a second time with the tooltip. It recognized my voice, knew that it had already told me, and didn’t repeat itself. But it also knew that it hadn’t yet told my partner about this tip, so the tooltip still triggered when he stopped the alarm.

The designers made a bet that any user who doesn’t end up using the shortened command (either because they forgot to use it, or because they didn’t want to use it) probably doesn’t want to hear the tooltip again, at least when so little time has elapsed since the user first heard it.
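
One way to picture this per-speaker behavior is a small memory keyed by who is talking. The Python sketch below is a guess at the general shape, not the real implementation: `TipMemory`, `respond_to_stop`, the speaker IDs, and the shortened tip wording are all hypothetical.

```python
class TipMemory:
    """Remember which tips each recognized speaker has already heard."""

    def __init__(self):
        self._heard = set()  # stores (speaker_id, tip_id) pairs

    def should_offer(self, speaker_id, tip_id):
        # Assumption: an unrecognized speaker (None) cannot be tracked,
        # so this sketch offers them the tip every time.
        if speaker_id is None:
            return True
        return (speaker_id, tip_id) not in self._heard

    def mark_heard(self, speaker_id, tip_id):
        if speaker_id is not None:
            self._heard.add((speaker_id, tip_id))


def respond_to_stop(memory, speaker_id, used_wake_word):
    """Return the follow-up tip, or an empty string if none is due."""
    tip_id = "stop_without_wake_word"  # hypothetical identifier
    if used_wake_word and memory.should_offer(speaker_id, tip_id):
        memory.mark_heard(speaker_id, tip_id)
        return "By the way, you can also just say 'stop'."
    return ""
```

With this structure, my first “hey Google, stop” would produce the tip, my second would produce nothing, and my partner's first would produce the tip again, which matches what I observed.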

Now, the question is, does this tooltip reset after a set period of time? If I say “Hey Google, stop” in a week, will Google Assistant remind me of the shortened command then? Or, maybe it resets after 24 hours?

If it did, that would introduce a fourth layer of context that the designers built into this feature: how much time should elapse before the tooltip can be offered again to any given user.

Or, maybe this tooltip never resets. Maybe it’s a one-shot deal!
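
Both possibilities fit into one small policy function. In this sketch, the `cooldown` parameter and the name `tip_is_due` are my own assumptions: a `cooldown` of `None` models the one-shot case, and any `timedelta` models a reset period.

```python
from datetime import datetime, timedelta
from typing import Optional


def tip_is_due(
    last_shown: Optional[datetime],
    now: datetime,
    cooldown: Optional[timedelta],
) -> bool:
    """Decide whether a tooltip may be offered to this user again."""
    if last_shown is None:
        return True  # this user has never heard the tip
    if cooldown is None:
        return False  # one-shot policy: never repeat the tip
    return now - last_shown >= cooldown  # repeat once the reset period has passed
```

Running the experiment at different intervals (a day later, a week later) would hint at which policy, and which `cooldown` value, is actually in use.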

Who’s up to experiment and find out? ;)

Since you’re still reading, you might be interested in even more context considerations for this scenario!

If you were a voice designer building out a “set a timer” or “set an alarm” experience…

  1. What times of the day would you deem “safe” to make an assumption about AM/PM? For example, if a user at 9:45 pm said, “set a timer for 10,” would you want Google Assistant to assume they meant 10 pm, or 10 am the next day?
  2. Do you think this assumption at 9:45 pm should change if the user said “set an alarm” instead of “set a timer”? (Would a user be more likely to mean 10 pm if they said ‘timer’ and more likely to mean 10 am if they said ‘alarm’?)
  3. During times of the day when it’s less certain whether a user means AM or PM, would you add in an explicit confirmation? (“Great. Setting a timer. Did you want that for 10 pm or for 10 am?”) Would you add in even more clarifying words? (“Great. Setting a timer. Did you want that for 10 pm tonight or for 10 am tomorrow morning?”) Would you instead phrase it as a yes/no question, so the user has to use less energy to respond? (“Great. Did you want that timer for 10 pm tonight?”)
  4. If you went the route of an explicit confirmation, how would you order the two times? Would you say “Did you want that for 10 pm or for 10 am?” since 10 pm is nearer to the current time? Or would you say “Did you want that for 10 am or for 10 pm?” since that’s the natural order of the two times?
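
Here is one way a designer might prototype answers to the questions above, as a Python sketch. Every choice in it is an assumption made for illustration: the one-hour `safe_gap` threshold, the nearest-time-first ordering, the day labels, and the function names are all hypothetical.

```python
from datetime import datetime, timedelta


def next_occurrences(spoken_hour, now):
    """Return the next two times matching a 12-hour value, soonest first."""
    base = spoken_hour % 12
    start = now.replace(minute=0, second=0, microsecond=0)
    found = []
    for step in range(1, 49):  # scan the next 48 hours, one hour at a time
        t = start + timedelta(hours=step)
        if t.hour % 12 == base:
            found.append(t)
        if len(found) == 2:
            break
    return found


def label(t, now):
    """Describe a time with the extra clarifying words (tonight, tomorrow)."""
    hour12 = t.hour % 12 or 12
    meridiem = "am" if t.hour < 12 else "pm"
    if t.date() == now.date():
        day = "tonight" if t.hour >= 18 else "today"
    else:
        day = "tomorrow morning" if t.hour < 12 else "tomorrow"
    return f"{hour12} {meridiem} {day}"


def confirmation(spoken_hour, now, safe_gap=timedelta(hours=1)):
    """Confirm implicitly when the soonest reading is close, else ask."""
    soonest, later = next_occurrences(spoken_hour, now)
    if soonest - now <= safe_gap:
        # Implicit confirmation: state the assumption and move on.
        return f"Alarm for {label(soonest, now)}, set."
    # Explicit confirmation: offer both readings, nearest time first.
    return (
        f"Did you want that for {label(soonest, now)} "
        f"or for {label(later, now)}?"
    )
```

At 9:45 pm, `confirmation(10, now)` confirms implicitly for 10 pm tonight; at 2:00 pm, it asks about 10 pm tonight versus 10 am tomorrow morning. Changing `safe_gap` or swapping the order of the two options is a quick way to try out the alternatives.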

All of these are things Google’s VUI designers may have thought through when designing this experience. For any Googlers out there, are there any we’re missing? :)

Published in Women in Voice

International nonprofit for women and gender minorities in voice tech and conversational AI. Find more about our mission, our team and initiatives at womeninvoice.org

Written by Brielle Nickoloff

Cofounder & Head of Product @Botmock