Illustration by Elina Frolova

Prototalking — Five rules for designing well-functioning voice interfaces

There is no doubt that digital voice assistants, which offer speech-driven support, information and entertainment through smartphones and smart speakers, are among the most promising tech wonders of tomorrow.

Published in Designit Matters, Mar 13, 2019


By Stinne Jensen, Anders Tolstrup Rasmussen, and Jens Christiansen

It is estimated that over 1 billion devices now provide access to voice assistants like Google Assistant, Apple’s Siri, Amazon Alexa and Microsoft Cortana. Studies show that almost half of all smart speaker owners use voice assistants daily and that over 76 percent of them increased their usage of voice assistants in the last year. This stuff is going places!

It’s fair to say that the technological excitement is real and that the possibilities of adding voice interfaces to existing services, products and brands seem endless. A quick run-through of voice services today covers everything from healthcare and banking to games and home equipment management, and even everyday aids for people with disabilities. In short, voice assistant (VA) services are to be found wherever hands-free interaction can make your life just a little more frictionless.

Despite the thrill and the many possibilities of the technology, the actual services provided by current voice assistants often lack quality. Dialogues are rigid, the flows are often abrupt or dead-ended, and way too often you’ll find yourself struggling to get (meaningful) answers from the assistant.

Studies show that voice assistants are mostly used as alarm clocks or timers for cooking, for weather reports and for controlling the living room light or speaker volume. And for the daily dose of dad jokes, not to forget. All in all, these are fairly convenient services, but they are also a long way from being a personal assistant capable of adapting to human interaction in a reliable way, which is often how they are marketed. And this is where the trouble starts. If we expect our machines to provide us with human-like responses that flawlessly and intuitively offer us answers to our questions, and they fail at doing just that, we are turned off and opt out. Simply because it is a bad user experience.

In our perception, the main challenges for the technology as of today are the quality of both flow and responses given by the services. It is still far from a frictionless conversation and the answers are not always relevant or even remotely on point. The exchange of words is mostly one-sided and might more accurately be characterized as a monologue by command.

Another challenge is the human end of the “conversation”. Have you ever found yourself trying to get your voice assistant to understand your question or command by simplifying your sentences and instructions so much they make YOU sound like a machine?

So, when a global food business asked us to create a voice assistant service concept, we were challenged to rethink how to create a great and meaningful user experience within the limits of the current technology.

Out of this work grew five distinctive learnings targeting the conversational challenges, the unstable flows and general shortcomings of the technology as of today.

Clear call to action

To accommodate most users’ anticipated “when will this go wrong” assumptions, you need to build in intuitive navigation. We are accustomed to navigating with buttons on our devices and visual interfaces, but voice interfaces are by definition not visual, and navigation is done by voice alone. A way to overcome this is to embed simple and clear calls to action. Every answer given by the digital assistant is followed up by an option the user must respond to, giving the user a sense of control and of being in charge of the flow. And when the flow ends, the service explicitly says so.
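As a rough illustration of this rule, here is a minimal Python sketch. It is not the concept we built; the `Step` class, the `render` function and the cooking wording are hypothetical and only show the pattern: every answer ends with an option, and the last step announces that the flow is over.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One turn of a hypothetical voice flow."""
    answer: str                 # what the assistant tells the user
    next_prompt: Optional[str]  # the option offered next; None on the final step


def render(step: Step) -> str:
    """Return the full spoken response for one step of the flow."""
    if step.next_prompt is None:
        # The flow has ended, so the service explicitly says so.
        return f"{step.answer} That was the last step. Goodbye!"
    # Every answer is followed by a clear call to action.
    return f"{step.answer} {step.next_prompt}"


if __name__ == "__main__":
    print(render(Step("Boil the pasta for ten minutes.",
                      "Say 'next' when you are ready.")))
    print(render(Step("Serve the pasta while it is hot.", None)))
```

The point of keeping the prompt as a separate field is that a step without one cannot slip through silently: it is treated as the end of the flow and announced as such.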

Short sequences

As mentioned before, creating good voice interfaces holds the challenge of imitating a conversation while not being one at all. One of the worst user experiences is when the assistant talks for too long for us to keep up with the information. Having to start over and listen to the same speech again just isn’t a human-centered service experience. So, be direct and let the service offer the conversational information in short sequences to keep the user’s attention, overview, and interest in the forthcoming steps of the user journey. This sounds like a fairly easy task, but like any other design, it requires constant fine-tuning and iterations.
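One way to approach short sequences is to cut a longer text into small spoken chunks at sentence boundaries. The sketch below is an assumption-laden example, not a description of our service: the function name `split_into_sequences` and the word budget `max_words` are made up for illustration, and the right budget would have to be found through testing.

```python
import re
from typing import List


def split_into_sequences(text: str, max_words: int = 20) -> List[str]:
    """Group whole sentences into chunks of at most max_words words.

    A single sentence longer than max_words is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    recipe = "Chop the onion. Fry it in oil. Add the tomatoes and simmer."
    for chunk in split_into_sequences(recipe, max_words=6):
        print(chunk)
```

Each chunk can then be delivered as its own turn, so the user never has to sit through the whole text in one go.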

Basic commands

Currently one of the major challenges of voice interfaces is the level of confusion, frustration and resignation experienced at the user end of the interaction. Misunderstood communication and uninterpreted commands cause response-failure and disrupted flows. Keeping the available commands limited and combining them with clear calls to action will safeguard the flow and secure an improved user experience.
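To make the idea of a limited command set concrete, here is a small Python sketch. The command names and phrases in `COMMANDS` are hypothetical examples, and real services would get their input from a speech recognizer rather than plain strings.

```python
from typing import Optional

# A deliberately small vocabulary: each canonical command lists the
# phrases that are accepted for it. All entries are illustrative.
COMMANDS = {
    "next": {"next", "continue", "go on"},
    "repeat": {"repeat", "again", "say that again"},
    "stop": {"stop", "quit", "exit"},
}


def match_command(utterance: str) -> Optional[str]:
    """Map an utterance to a canonical command, or None if it is unknown."""
    # Normalize case and strip surrounding spaces and punctuation.
    normalized = utterance.lower().strip(" .!?")
    for command, phrases in COMMANDS.items():
        if normalized in phrases:
            return command
    return None


if __name__ == "__main__":
    print(match_command("Next!"))
    print(match_command("Say that again"))
    print(match_command("Order a pizza"))
```

With only a handful of commands, the calls to action can name them directly ("Say 'next' or 'repeat'"), which leaves far less room for misunderstanding.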

Preventing errors

Most voice assistant services handle errors by not responding. Sometimes they’ll offer the polite but annoying: “I don’t understand the question”, but mostly they just don’t respond. We wanted to make it clear to the user when a command was not understood or fell outside the capacity of the intended service, and most importantly let the service lead the user back on track and into the designated flow. The importance of defining the framework and making it clear to the user what to expect from the service (and what not to expect) matters not only for the user experience, as it lowers the risk of disappointment, but it also matters for the design process. Clear constraints make it buildable.
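A possible shape for this kind of error handling, sketched in Python under assumptions of our own: the `respond` function, the `handlers` mapping and the wording of the fallback are all hypothetical. What matters is that an unknown command never results in silence; the service says it could not help and repeats the current call to action.

```python
from typing import Callable, Dict


def respond(utterance: str,
            current_prompt: str,
            handlers: Dict[str, Callable[[], str]]) -> str:
    """Answer a known command, or lead the user back into the flow."""
    key = utterance.lower().strip(" .!?")
    handler = handlers.get(key)
    if handler is not None:
        return handler()
    # Unknown or out-of-scope input: say so clearly, then restate
    # the current call to action so the user is back on track.
    return f"Sorry, I can't help with that here. {current_prompt}"


if __name__ == "__main__":
    prompt = "Say 'next' to continue or 'repeat' to hear the step again."
    handlers = {"next": lambda: "Step two: fry the onion."}
    print(respond("Next", prompt, handlers))
    print(respond("What's the weather like?", prompt, handlers))
```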

Principle of personality

During testing of our voice assistant concept, another significant fact surfaced: the importance of giving the digital assistant a personality, both in the actual voice and, more importantly, in the tone of voice. As the feedback taught us, finding just the right amount of personality is central; it must not be too distinctive, but definitely not too weak. This challenge is definitely a copy job, and although you will never hit the right note with everyone, making sure your assistant doesn’t sound like a stiff recitation will get you far.

There is no doubt that the potential within this technological platform of voice assistants is massive. However, there is a strong need to rethink the importance of the exact usage situation and to confine the design to a relevant service for just that situation, keeping the service human-centered, not tech-centered.

Until the technology improves to accommodate the emerging needs related to voice assistant services, it’s practical to use the strengths of other platforms like regular websites and app configurations when building service concepts for voice assistants.


Designit is a global strategic design firm, part of the leading technology company, Wipro.