10 Best Practices When Designing for Voice

Follow these design practices when creating a custom Alexa skill to avoid leaving users frustrated with their voice experience

Oscar Merry
A Cloud Guru
7 min read · Mar 23, 2017


Most people who have used Alexa will recognize that voice experiences are very different from screen experiences, which is why Voice User Experience design, or VUX design, requires a different set of skills.

At Opearlo we’ve been designing and building Alexa Skills for clients across a number of industries and use cases, and recently put together 10 Best Practices to keep in mind when designing for voice.

1. Manage Users’ Expectations

Movies like Her and Ex Machina mean that people’s expectations of what digital assistants can do vary greatly.

Although voice technology has finally advanced to a level where it’s usable in everyday life, we’re still in the equivalent of the Nokia 3310 days of the mobile phone revolution.

Aligning your users’ expectations with what your voice app can actually deliver is critical to avoiding frustration.

You can do this in the skill description, in the Alexa Cards, and by creating a landing page for your skill showing a video of how to use it, like Capital One have for their Skill. Choosing a list of features for your app with a common theme can also play an important part in the design process.

For example, it doesn’t make sense for a travel voice app to have 90% of its features centered around holiday inspiration, and then 10% around FAQs. Chances are that as soon as a user has one FAQ answered, they’ll start to ask for loads more and be left severely disappointed when the others aren’t supported. Keep it around one theme to start with.

2. There Doesn’t Need to be a Hierarchy!

Screen-based applications have a hierarchical GUI, which users can safely tap through, always starting from the home screen or the menu button.

But part of the delight of voice apps is that you don’t have to do this — they can be designed so that a user can reach any part of the experience on first launch.

This is what differentiates a great voice app from one that sounds like an IVR system, and is why you should spend a lot of time making sure each feature is designed and built in isolation.

3. Consider the Linguistics

The most delightful moments in voice occur when you ask Alexa something in a niche and personalized way, and she still gets it. That is why a great voice app needs to cater for differences in how people phrase the same request.

One person may say:

“I’d like to order a taxi.”

Whilst their friend might ask:

“Please can you book me a ride.”

This isn’t a thing in mobile, as launching an app is always done by tapping on it. Catering for how everyone speaks, or utterance expansion, is an extra complexity in voice design, so use the available tooling and establish a logical process for adding utterances — as a single voice app can have 40,000 of them!
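As a rough illustration of utterance expansion, the sketch below combines a handful of phrase templates with interchangeable verbs and nouns to produce sample utterances for a single intent. The templates and word lists are hypothetical, chosen to cover the two taxi phrasings above, and this is not the tooling we use in production.

```python
from itertools import product

# Hypothetical templates and word lists for a taxi-booking intent.
TEMPLATES = [
    "i'd like to {verb} {ride}",
    "please can you {verb} me {ride}",
    "can you {verb} me {ride}",
    "{verb} {ride}",
]
VERBS = ["order", "book", "get"]
RIDES = ["a taxi", "a ride", "a cab"]


def expand_utterances(templates, verbs, rides):
    """Fill every template with every verb/ride pair and de-duplicate."""
    utterances = {
        template.format(verb=verb, ride=ride)
        for template, verb, ride in product(templates, verbs, rides)
    }
    return sorted(utterances)


utterances = expand_utterances(TEMPLATES, VERBS, RIDES)
print(len(utterances), "sample utterances generated")
```

Each new template or synonym multiplies the total, which is how a single voice app can end up with tens of thousands of utterances, so it pays to review the generated list and prune phrasings nobody would actually say.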

4. Keep Alexa’s Responses Short

One of the problems when designing the interactions between your user and Alexa is that a skill is developed in writing but experienced through voice.

Alexa speaks a lot slower than a human, and what looks like a short response on paper is far longer when read by Alexa. This can leave the user feeling impatient, so try to keep Alexa’s responses short and concise where possible.

5. Don’t Have Too Many Steps in the Conversation

Because Alexa speaks slower than us, and because she isn’t as advanced as the conversational digital assistants depicted in films, you should try to not have too many steps in the conversation.

One way to achieve this is to only have confirmation steps for important actions, such as transferring money or buying something. This will help to make your voice experience smooth and engaging.
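A minimal sketch of that rule, assuming a skill with a few made-up intent names: only the intents listed as high-stakes trigger a confirmation step, and everything else is fulfilled straight away.

```python
# Hypothetical intent names; a real skill would define its own.
CONFIRM_BEFORE_ACTING = {"TransferMoneyIntent", "PurchaseIntent"}


def next_step(intent_name, confirmed=False):
    """Return "confirm" for unconfirmed high-stakes intents, else "fulfil"."""
    if intent_name in CONFIRM_BEFORE_ACTING and not confirmed:
        return "confirm"
    return "fulfil"


print(next_step("TransferMoneyIntent"))        # asks the user to confirm first
print(next_step("TransferMoneyIntent", True))  # user has already said yes
print(next_step("CheckBalanceIntent"))         # low-stakes, no extra step
```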

6. Try Not to Answer a Question With a Question

Even in human-to-human conversations, having your question answered with a question can feel frustrating.

So, in the cases where you want a back-and-forth conversation, still try to have Alexa give a useful piece of information before she asks a counter-question.

For example, if you’re designing a train app, the user might be able to ask:

“How much does it cost to get from London to Leeds?”

Instead of making Alexa respond with the question:

“When did you want to travel?”

You could make some assumptions and have her say:

“A standard ticket from London Kings Cross to Leeds leaving during peak hours tomorrow costs £110, an off-peak train costs £88. Just let me know the date you want to travel.”

7. Spend Time on the Edge Cases and Half Happy Paths

It’s easy for users to learn what they can and can’t do on a mobile app by quickly flicking through all the different screens the first time they use it, but the same can’t be said for voice.

Through Opearlo Analytics we’ve noticed that users will often “stress test” a Skill when they use it for the first time, for example by cycling through all the options or asking questions they half expect not to be supported.

Spending time on the edge cases is important so that you can safely guide experimental or new users back into the core functionality of the Alexa skill, and avoid them getting trapped in an error loop and quitting the app through frustration.
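One way to avoid an error loop is to vary the fallback message with each consecutive miss, so the user gets progressively more guidance instead of hearing the same apology again. The sketch below assumes session state is a plain dictionary; the prompts and the `fallback_count` key are hypothetical.

```python
# Hypothetical prompts for a recipe skill, ordered from gentle to explicit.
FALLBACK_PROMPTS = [
    "Sorry, I didn't catch that. What would you like to do?",
    "I can't help with that yet. You can ask for a recipe or start cooking.",
    "Let's start again. Try saying: find me a recipe.",
]


def fallback_response(session):
    """Pick a prompt based on how many misses have happened in a row."""
    count = session.get("fallback_count", 0)
    prompt = FALLBACK_PROMPTS[min(count, len(FALLBACK_PROMPTS) - 1)]
    session["fallback_count"] = count + 1
    return prompt


def on_recognized(session):
    """Reset the miss counter once a request is understood."""
    session["fallback_count"] = 0


session = {}
print(fallback_response(session))  # first miss: gentle re-ask
print(fallback_response(session))  # second miss: points to supported features
```

Resetting the counter after a recognized request keeps the more explicit prompts for users who are genuinely stuck.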

8. Minimize Choices

We’re not used to having to remember options as they are read out loud, which is why the Alexa certification team recommends presenting a maximum of three choices to the user at once.

A great way to make the choice easier for your user is to number the options, so the user just needs to respond with the number of the option they want, rather than the option itself.
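Numbered options can be sketched as two small helpers: one that builds the spoken prompt and one that maps the reply back to an option. The function names and the example options are hypothetical, and the helper accepts the option name as well as its number in case the user says that instead.

```python
MAX_CHOICES = 3  # the limit recommended by the Alexa certification team

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}


def build_prompt(options):
    """Build a prompt such as 'Say 1 for pasta, 2 for curry, or 3 for salad.'"""
    if not 1 <= len(options) <= MAX_CHOICES:
        raise ValueError(f"expected 1 to {MAX_CHOICES} options")
    parts = [f"{i} for {option}" for i, option in enumerate(options, start=1)]
    if len(parts) == 1:
        return f"Say {parts[0]}."
    return "Say " + ", ".join(parts[:-1]) + f", or {parts[-1]}."


def resolve_choice(reply, options):
    """Map a spoken number (or option name) to an option, or None if unknown."""
    text = reply.strip().lower()
    number = NUMBER_WORDS.get(text)
    if number is None and text.isdigit():
        number = int(text)
    if number is not None:
        return options[number - 1] if 1 <= number <= len(options) else None
    for option in options:
        if option.lower() == text:
            return option
    return None


options = ["pasta", "curry", "salad"]
print(build_prompt(options))
print(resolve_choice("two", options))
```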

9. Minimize Pressure

The maximum amount of time that Alexa will wait for a response is 8 seconds. This isn’t very long, and users can easily feel under pressure to provide a response. To avoid this, always give the user an option that buys them some more time.

For example, in one of our Recipe Skills, our original design had Alexa respond with the recipe title, then say:

“Would you like to hear the ingredients, another recipe, or start cooking?”

But in our user testing workshops we found that users weren’t sure which option they wanted and often didn’t say any of them, leaving the session to finish unsuccessfully.

We experimented with the design, and after further user testing chose to amend the response so instead Alexa responded with:

“Would you like to start cooking, hear another recipe, or hear the details?”

The “hear the details” option gives the user more time to decide what they want to do next, and therefore makes for a better experience.
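A related safety net is the reprompt, which Alexa speaks if the user stays silent after the question. The sketch below builds a response in the JSON shape the Alexa Skills Kit expects for a custom skill, with the session kept open. The helper name and the reprompt wording are hypothetical, so check the field names against the current Alexa documentation before relying on them.

```python
import json


def build_ask_response(question, reprompt):
    """Build a response that asks a question and keeps the session open."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": question},
            # Spoken if the user does not answer the question in time.
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt}
            },
            "shouldEndSession": False,
        },
    }


response = build_ask_response(
    "Would you like to start cooking, hear another recipe, or hear the details?",
    "You can say start cooking, another recipe, or details.",  # hypothetical wording
)
print(json.dumps(response, indent=2))
```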

10. AUDIO, AUDIO, AUDIO

By far the best Skills on Alexa have custom audio content in them as part of Alexa’s response. If your Skill has natural characters, such as those from The Grand Tour or Jamie Oliver, get them to record custom audio for some of the responses.

Even if you don’t have a character that could feature in your Skill, there are plenty of other ways to bring in audio. For example, you could add an audio logo, as The Guardian does, or background music, as Inspire Me does.

Whichever way you do it, incorporating audio into your Skill will make it stand out!

Questions about voice design or analytics? Send an email to team@opearlo.com!

Oscar Merry is the Head of Technology at Opearlo — The Voice Design Agency that helps organizations make their products, services, and content accessible through voice using the Amazon Alexa technology.

Oscar has extensive experience with Amazon Alexa. He has worked with the technology since November 2015, designing and building skills for clients across a number of industries and use cases. He has a diverse background including both Technical Product Management and Engineering.

Oscar runs the London Alexa Devs meetup, which he set up in July 2016 in eager anticipation of the Alexa technology coming to the UK. The meetup has held several events and has grown to a community of more than 200 Alexa developers.

Oscar also developed an Advanced Alexa Skills Kit course for A Cloud Guru.

