Voice User Interface Design: New Solutions to Old Problems

In the past few years, voice user experiences have reached critical mass. Cortana. Alexa. Google.

Like many technologies that seem fresh off the presses (virtual reality, anyone?), voice user interfaces have been in the public consciousness for decades and in research circles even longer. Bell Laboratories debuted its “Audrey” system (the first voice-controlled UI) in 1952, predating even Star Trek’s aspirational voice-controlled computer!

Voice recognition systems have been a reality for more than half a century. (Photo: AndroidAuthority)

But speech scientists have long known that the magic of transforming analog signals into digital meaning would demand processing power far beyond the field’s humble early tools. It is only recently, in the era of ubiquitous cloud computing, that consumers have access to enough processing power for their own voices to be heard and interpreted in real time.

A New Frontier

As user experience designers, we were most likely trained in crafting experiences designed for graphical output and physical input. I know that voice interfaces were far from the imagination of the academics of my time — during my senior projects, we were enamored of the Palm Pilot, and of handwriting input that foreshadowed today’s touchscreen UIs.

And yet, just as we adapted the skills we’d learned for the brave new world of input beyond the mouse and keyboard, so it is time for the designers of today to expand our skill sets to include voice input and the resulting output layers.

Touch and pen input, as seen in the Palm Pilot’s Graffiti input language, was once a quirky backwater of design exploration. Voice user interfaces are now emerging from that same phase.

In the last few years, a small but growing number of user experience designers have become full-fledged voice user interface (VUI) designers. Though it may seem a quirky specialty skill, so was mobile design 10 years ago. Voice user interface design will soon become a key strategic skill for a new generation of designers.

Our Oldest Interface

Humans have been developing the art of conversation for thousands of years. It is a skill adults draw upon instinctively, every day, for most of their lives.

Speech is one of the first skills we acquire in childhood — and one of the last we lose in our sunset years, long after our vision and motor skills begin to fade.

The deeply instinctive nature of speech presents specific constraints and new challenges. Our brains are fundamentally wired to interpret the source of speech as human. With few exceptions, we also expect a spoken response when we speak to someone. Thus, a device that speaks to us is tapping into a deep river of psychological adaptations, and subject to a set of assumptions a pixel-based UI will never encounter.

This is also why — at least for the moment — designing for voice user experiences is inherently different from designing for conversational user interfaces, which today are synonymous with text-based chat bots. Our thousands of years of speech-based perception and psychology don’t (yet) interfere with our ability to enjoy written conversations.

Today’s Voice UX: Command and Control

But let’s be super clear: the voice user experiences consumers are learning to use today are usually FAR from conversational. We are still in early days.

Though some players use “voice UI” and “conversational UI” interchangeably, in my observation there are no truly conversational spoken user interfaces yet. It’s more accurate to simply call Alexa, Google Home, and Cortana “natural language” voice control systems. The distinction currently rests in the types of tasks we ask our voice-based assistants to complete. In fact, the key word is “task”: these devices are all specialized for allowing customers to complete TASKS using their voice.

By way of example, the “natural language” way to adjust a thermostat isn’t deeply conversational. You wouldn’t turn to your spouse and say, “Isn’t it a chilly night? I’m feeling a bit cold. Turn the thermostat up, won’t you?” (Unless you’re in an Oscar Wilde play, perhaps.) You’d probably just blurt out “Turn the thermostat up.” Less of a conversation, more of a request.

Furthermore, the way you complete simple tasks is almost always the same, regardless of emotion, mood, or context. Perhaps you might add “please” if you’re having a good day…

That doesn’t mean there isn’t quite a lot of complexity in getting this voice UI right — but as opposed to truly conversational UI, which paints in adjectives and nuance, command-and-control voice UI deals in simplicity and robustness.
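
To make that concrete, here is a minimal sketch of what a command-and-control intent definition might look like. The intent name, slot, and sample phrasings are all hypothetical, for illustration only; they are not the schema of Alexa, Cortana, or any other real platform.

```python
# A hypothetical command-and-control intent definition (not any real
# platform's schema). Many surface phrasings collapse into a single
# task-oriented intent, with slots capturing the variable parts.
SET_THERMOSTAT_INTENT = {
    "name": "SetThermostatIntent",
    "sample_utterances": [
        "turn the thermostat {direction}",
        "turn the thermostat {direction} please",
        "make it warmer",
        "make it cooler",
    ],
    # The slot's small, closed vocabulary is what makes the command
    # robust: there are only two values to recognize.
    "slots": {"direction": ["up", "down"]},
}
```

Note how little of the definition deals in nuance; nearly all of it is about accepting many phrasings for the same simple request.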

At present, voice user interface designers often spend a significant amount of design time focusing on how to help customers along when things go wrong. What happens if someone just says “Set an alarm” without specifying a time? Or if the system hears “AM” instead of “PM”? By understanding how a voice interface can fail, VUI designers can find ways to turn those failures into eventual successes.
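
As a sketch of that failure-handling mindset, here is hypothetical dialog logic for the alarm example above (the function and prompts are my own illustration, not any shipping assistant’s behavior). A missing slot triggers a re-prompt, and an easily misheard slot triggers a confirmation, so the failure becomes a detour rather than a dead end.

```python
# Hypothetical slot-filling logic for "Set an alarm": turn missing or
# easily misheard information into follow-up questions, not failures.
def handle_set_alarm(slots: dict) -> str:
    time = slots.get("time")
    meridiem = slots.get("meridiem")  # "AM" or "PM"

    if time is None:
        # The customer said "Set an alarm" with no time: re-prompt.
        return "Sure, an alarm. For what time?"
    if meridiem is None:
        # AM/PM was never stated: ask a narrow follow-up question.
        return f"Is that {time} in the morning or the evening?"
    # AM and PM are easily confused by recognizers, so confirm first.
    return f"Setting an alarm for {time} {meridiem}. Did I get that right?"

print(handle_set_alarm({}))                                  # re-prompt
print(handle_set_alarm({"time": "7:00"}))                    # disambiguate
print(handle_set_alarm({"time": "7:00", "meridiem": "PM"}))  # confirm
```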

Adapting Your Design Instincts

My time working on VUI for Windows Automotive, Cortana, and Alexa gave me an appreciation for the differences in the design process between visual and voice-based UX, and a passion for sharing that knowledge as it was shared with me by some esteemed coworkers along the way (thank you Lisa Stifelman, Sumedha Kshirsagar, and Stefanie Tomko, amongst others).

As a result of that passion, I was honored to debut my workshop Giving Voice to Your Voice Designs at Interaction 17, a global design conference sponsored by the Interaction Design Association (IxDA).

In my #IxD17 workshop, we started with a primer on key terms and concepts that relate to the speech science component of voice UI: how an analog voice “utterance” is converted into a digital system’s representation of a customer’s “intent”. Usually, this interpretation process spans multiple disparate but connected systems, which is why cloud computing smashed VUI doors wide open.
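
To ground those terms, here is a toy sketch of that handoff. Both stages are stubs with made-up values; a real system would call separate, usually cloud-hosted, recognition and understanding services.

```python
# Toy pipeline from spoken audio to "intent". Each stage is usually a
# separate, often cloud-hosted system; both stages here are stubs.
def recognize_speech(audio: bytes) -> str:
    """Automatic speech recognition (ASR): analog signal -> text.
    This stub pretends the audio contained one fixed utterance."""
    return "set an alarm for seven pm"

def resolve_intent(utterance: str) -> dict:
    """Natural language understanding (NLU): text -> intent + slots."""
    if "alarm" in utterance:
        return {"intent": "SetAlarmIntent",
                "slots": {"time": "7:00", "meridiem": "PM"}}
    return {"intent": "Unknown", "slots": {}}

# The "utterance" is what the customer said; the "intent" is what the
# system decided the customer meant.
utterance = recognize_speech(b"raw audio bytes")
print(resolve_intent(utterance))
```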

We explored common situational constraints and some simple guidelines, setting participants up for success in the final phase of the class, where we walked through an end-to-end design process with design deliverables for a third-party voice skill.

Walking workshop participants through the process of building an interaction flow for a third-party voice feature at #IxD17 — ironically, in a studio in NYC’s School of Visual Arts. Photo credit: Malika Chatlapalli.

My participants really impressed me with thoughtful questions that drove at some of the much deeper challenges facing voice UIs, like contextual awareness and “memory” over time. (A later article will deal with a few of these concepts.) These practitioners are a clear indicator that many of today’s designers can transfer their existing design skills to voice with some simple reframing and a bit of added subject matter expertise.

Voice Input Changes Lives

Even though current voice UIs are a bit more simplistic than the dreamers amongst us would like, we can’t lose sight of the very real benefits that voice experiences, even simple ones, provide when done correctly.

The biggest and most impactful benefit voice user experiences provide is vastly improved accessibility. Looking for inspiration? Go read the reviews of the Amazon Echo. There are so many stories from mobility-impaired customers, vision-impaired customers, and customers with cognitive impairments about how the device has changed their lives at home.

That’s the real quantum leap here. Voice user interfaces don’t solve any NEW problems… yet. But they solve existing problems in novel ways that significantly improve life for many individuals.

Setting alarms, getting answers to informational questions easily found on Wikipedia… yes, we could do these things before on our smartphones and our computers. But we had to turn our attention to a device to do so. And in that moment, we traded a bit of our humanity, temporarily, for that service.

Voice UIs allow us to remain fully human in our interactions. They allow us to remain more connected to the other humans in the room. And these VUIs are life-changing for those who can’t easily adapt themselves for traditional computer use.

So the need for voice user experiences — even today’s crop of control-focused, less conversational UIs — is real, and these experiences change lives. You might not be replacing your existing experience, but even adding voice UI to extend an existing experience can have a major impact on your customers.

Find Your Own Voice

Inspired? I hope so. I challenge every designer to start looking at voice input as an important new way of connecting with customers. Are there unseen opportunities that could transform the way customers use your product? Even better, transform their lives?

And even if you’re a “traditional” designer, don’t be intimidated. Many practitioners started just as you did, in a traditional, visually oriented world. Designers are inherently curious and mentally resilient. You can reframe your thinking with some new knowledge and a few adapted skills.

But there’s so much more to the world of voice user experiences. In my next post, we’ll talk about conversational user interfaces, a hot topic that surfaced repeatedly at Interaction 17. And we’ll talk about how the voice user interfaces and text-based conversational user interfaces of today may soon begin to intersect.

May the voice be with you.

After several years focusing her design efforts on NUI and VUI, Cheryl is currently Design Lead for the Azure Portal + Marketplaces at Microsoft. Find out more about her diverse background and portfolio at her blog or on LinkedIn. You can also follow her on Twitter.