How can we make “shortcut” features discoverable in VUI?

GU Human Language Technology
Sep 18, 2021


Written by Victoria Jin, Jungyoon Koh, Wai Ching Leung, & Ulie Xu

A young woman hurriedly getting dressed to go to work yells out, “Hey Google, how long ’til I’m late for work??”
Illustration by Victoria Jin

When you design a feature that’s meant to help cut down on unnecessary turn-taking in a voice interaction, how do you make sure that feature is itself discoverable? This was a conundrum we ran into when designing our first Google Assistant Action: DMV Metro, a skill that lets users retrieve information on metro times (for the DMV area) by voice.

A frazzled young professional has her hands full preparing to leave for work, but she wants to know when the next metro train will arrive. What does she need? A Google Assistant Action, of course! This was the vision that inspired our summer project. We wanted to create a short voice experience that could get this woman from point A to point B as quickly as possible (literally!). In our idealized fantasy, the woman could simply ask Google Assistant when the next available trains would arrive.

This simple invention of the imagination actually led us down quite the rabbit hole. Given our focus on quick, clean, hands-free access to information, our project became an inadvertent venture into the question of how to balance immediacy with discoverability in a voice experience. In other words, how do we save time for the user, and how do we tell them about all the cool things we designed our Action to do without taking up too much of their time?

To let users reach the information they want as quickly as possible, we would have to add certain features to our Action, such as remembering frequently used paths. With more features, however, comes the problem of making sure users know those features exist, a problem that comes up often in discussions of voice user interfaces (VUI). This article documents our thought process in finding a balance between these two goals.

Immediacy

Immediacy is quite the double-edged sword. It’s one of the crucial advantages that sets voice-controlled technology apart: we can speak much faster than we can type or tap a screen. Capitalizing on efficiency is a big priority for designers, and so we wanted our frequent users to be able to access information as quickly as possible. Though the information retrieval itself requires mere seconds, we knew that the bot’s interaction with the user would be the more time-consuming part. (It’s a bit like speaking to your local all-knowing librarian who’s a bit hard of hearing: the initial process of conveying what you’re looking for is tricky, but once she understands you’re looking for comic books, it’s smooth sailing.)

The use cases we mapped out supported our focus on immediacy. The mother of three running late and trying to rein in her children would be so relieved knowing she could simply call out to Google Assistant rather than fish out her phone (buried under dirty laundry) to look up arrival times. So too the elderly man with weak eyes would much rather chat with his Google Assistant than struggle to pull up the train schedule on his computer. And of course, the young professional from earlier who rushes to leave for work on time is always grateful for her speedy voice assistant.

Though the user personas are diverse, they all face the same enemy: time. So we deliberately included features that can shorten the experienced user’s journey (imagine having to say Franconia-Springfield every morning!). Instead of always responding with the same (pretty darn long) station name when the bot asks which station they’re leaving from, the user can set their nearest station as “home.” The user can also save frequently used routes. For example, if a user wants to save their work route, the bot will remember which station they’re departing from (e.g. “home”) and which station they want to arrive at. This means that in the future, instead of asking for two station names, the bot would simply recognize the word “work” as containing both pieces of information (the departure station and the destination station).
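To make the shortcut idea concrete, here is a minimal Python sketch of how a “home” station and named routes could be stored and looked up. It is an illustration only, not the code behind DMV Metro: the `UserShortcuts` and `Route` names, the in-memory storage, and the “Metro Center” destination are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Route:
    origin: str
    destination: str


@dataclass
class UserShortcuts:
    """Hypothetical per-user store for the 'home' and saved-route shortcuts."""
    home_station: Optional[str] = None
    routes: Dict[str, Route] = field(default_factory=dict)

    def set_home(self, station: str) -> None:
        self.home_station = station

    def save_route(self, name: str, destination: str,
                   origin: Optional[str] = None) -> None:
        # If no origin is given, fall back to the saved home station.
        origin = origin or self.home_station
        if origin is None:
            raise ValueError("Set a home station or give an origin first.")
        self.routes[name.strip().lower()] = Route(origin, destination)

    def resolve(self, shortcut: str) -> Optional[Route]:
        # One short word stands in for two station names.
        return self.routes.get(shortcut.strip().lower())


shortcuts = UserShortcuts()
shortcuts.set_home("Franconia-Springfield")
shortcuts.save_route("work", "Metro Center")  # example destination
print(shortcuts.resolve("work"))
```

With this in place, a single word like “work” fills both slots the bot would otherwise have to ask for, and an unknown shortcut simply returns `None` so the bot can fall back to asking for station names.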

Discoverability

Like we said, we didn’t want the user’s voice experience to take any longer than necessary, but this required significant sacrifices. For example, designing the bot to rattle off all of its features each time it was called upon would have meant eating up precious seconds of the user’s time, forfeiting immediacy. If we skipped that introduction, however, the user might never realize the Action’s full range of capabilities (or appreciate all the effort we spent building it :)).

In a nutshell, voice-controlled technology is burdened with undesirable mystery: it is difficult for the user to discover all of a bot’s features and functions (aside from reading a manual, and no one does). With phones and laptops, the user is presented with a comprehensive visual display of all options, typically in the form of graphics or icons. However, in a conversational user interface (CUI), including chatbots and voice assistants, the user’s only access to information is what the bot articulates. With the medium of voice in particular, there is no written record of what the bot has said, so the user has to rely on their short-term memory (yikes!). That makes it even harder for users to fully explore all the features of a particular Action.

Striving to keep the exchange short meant sacrificing some turns of talk. Our team wrestled with this relentless give-and-take throughout the design process. The solution that we came up with was to tailor the Action’s dialog to user inputs. For instance, we designed an informative prompt message that the bot would provide to novice users if they hesitated for longer than three seconds: “I can help you set your home station, save frequently used routes, or find information on specific routes. You can also say ‘help’ for more information.” We also created follow-up questions that the bot would ask if a user repeatedly requested the same route. With questions like “I noticed you travel to Dupont Circle quite often, would you like to set it as a frequent route?” our users will hopefully be able to discover these features exactly when they’re needed.
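The two triggers described above can be sketched roughly as follows in Python. This is a sketch under stated assumptions rather than our actual implementation: the three-second hesitation window and the prompt wording come from the design above, while the `DiscoveryHints` class, the `SUGGEST_AFTER` threshold of three requests, and the choice to offer the suggestion only once are hypothetical.

```python
from collections import Counter
from typing import Optional

HESITATION_SECONDS = 3.0  # from the design described above
SUGGEST_AFTER = 3         # assumption: how many repeats count as "often"

HELP_PROMPT = (
    "I can help you set your home station, save frequently used routes, "
    "or find information on specific routes. "
    "You can also say 'help' for more information."
)


class DiscoveryHints:
    """Hypothetical helper that decides when to surface a feature hint."""

    def __init__(self) -> None:
        self.route_counts: Counter = Counter()

    def on_silence(self, seconds_silent: float, is_novice: bool) -> Optional[str]:
        # Only novice users who pause longer than the window get the overview.
        if is_novice and seconds_silent > HESITATION_SECONDS:
            return HELP_PROMPT
        return None

    def on_route_request(self, origin: str, destination: str) -> Optional[str]:
        key = (origin, destination)
        self.route_counts[key] += 1
        # Offer the shortcut once, at the moment the threshold is reached.
        if self.route_counts[key] == SUGGEST_AFTER:
            return (
                f"I noticed you travel to {destination} quite often, "
                "would you like to set it as a frequent route?"
            )
        return None


hints = DiscoveryHints()
for _ in range(SUGGEST_AFTER):
    suggestion = hints.on_route_request("Franconia-Springfield", "Dupont Circle")
print(suggestion)
```

Experienced users who answer promptly never hear the overview, and the route suggestion appears only after the behavior that makes it relevant, which is how the hints stay out of the way of immediacy.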

Initially, the dual problem of immediacy and discoverability was a difficult one to crack, but we came to realize that the real challenge lies in deciding how to balance the two. Once the nature of the problem became clear, we knew we could still design a voice experience that was both efficient and comprehensive. We definitely don’t have it all figured out yet, but we hope this has helped shed light on a perplexing issue. Keep an eye out for future articles detailing the rest of our design process as well as our finished product!
