10 Best Practices when Designing for Voice

Jess Williams
May 26, 2017


Most people who have used Alexa will recognise that voice experiences are very different from screen-based ones, which is why Voice User Experience (VUX) design requires a different set of skills.

1. Manage Users’ Expectations

Movies like “Her” and “Ex Machina” mean that people’s expectations of what digital assistants can do vary greatly. Although the technology has come on massively over the past ten years, we’re still in the equivalent of the Nokia 3310 days of the mobile phone revolution. Aligning your users’ expectations with what your voice app can deliver is therefore important if you want to avoid those one-star reviews.

You can do this in the skill description, in the Alexa Cards, and by creating a landing page for your skill showing a video of how to use it, like Capital One have for their Skill.

Choosing a list of features with a common theme also plays an important part in the design process. For example, it doesn’t make sense for a travel voice app to have 90% of its features centered around holiday inspiration and 10% around FAQs. Chances are that as soon as a user has one FAQ answered, they’ll start asking for loads more and be left severely disappointed when the others aren’t supported. Keep it simple to start with.

2. There doesn’t need to be a hierarchy!

Screen-based applications have a hierarchical GUI, which users can safely tap through, always starting from the home screen or the menu button.

But part of the delight of voice apps is that you don’t have to do this — they can be designed so that a user can reach any part of the experience on first launch. This is what differentiates a great voice app from one that sounds like an IVR system, which is why we spend a lot of time making sure each feature is designed and built in isolation.
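One way to picture a flat, non-hierarchical design is a single routing table in which every feature is reachable whether or not the session has just started. The sketch below is only an illustration: the intent names, handler functions, and response strings are hypothetical and not taken from any real skill.

```python
# Hypothetical feature handlers, each written so it works in isolation.
def handle_inspiration():
    return "Here is some holiday inspiration."


def handle_flight_status():
    return "Here is your flight status."


# One flat table: no feature sits "behind" a menu or a home screen.
HANDLERS = {
    "HolidayInspirationIntent": handle_inspiration,
    "FlightStatusIntent": handle_flight_status,
}

WELCOME = "Welcome. You can ask for holiday inspiration or a flight status."


def route(intent_name, is_new_session):
    """Dispatch straight to the feature, even on the very first utterance."""
    handler = HANDLERS.get(intent_name)
    if handler is not None:
        return handler()
    # Only fall back to a welcome message when no feature was requested.
    if is_new_session:
        return WELCOME
    return "Sorry, I can't help with that yet."
```

Because the lookup ignores `is_new_session` whenever a known intent is present, a user who opens the skill by asking for a flight status gets the same answer as one who asks mid-conversation.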

3. Consider the Linguistics

The most delightful moments in voice occur when you ask Alexa something in a niche and personalized way, and she still gets it, which is why a great voice app needs to cater for differences in linguistics. One person may say “I’d like to order a taxi,” whilst their friend might ask “please can you book me a ride.” This isn’t a thing in mobile, as launching an app is always done by tapping on it. Catering for how everyone speaks is an extra complexity in voice design, and the work of covering those variations is referred to as utterance expansion. Use the tooling available and establish a logical process for adding utterances, as a single voice app can have 40,000 of them!
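As a rough illustration of why utterance counts grow so quickly, the sketch below combines a few openers, verbs, and nouns into every possible phrasing. The word lists and the function name are hypothetical, and real tooling would also need to filter out combinations that no one would actually say.

```python
from itertools import product


def expand_utterances(openers, verbs, nouns):
    """Return every opener/verb/noun combination as a sample utterance."""
    utterances = []
    for opener, verb, noun in product(openers, verbs, nouns):
        # Skip empty parts so an empty opener does not leave a leading space.
        utterances.append(" ".join(part for part in (opener, verb, noun) if part))
    return utterances


# Hypothetical word lists for a taxi-booking intent.
OPENERS = ["", "please", "i'd like to", "please can you"]
VERBS = ["order", "book", "get"]
NOUNS = ["a taxi", "a ride", "a cab"]

utterances = expand_utterances(OPENERS, VERBS, NOUNS)
print(len(utterances))  # 4 openers x 3 verbs x 3 nouns
```

Three short lists already produce 36 utterances for one intent, so adding a few more slots or synonyms multiplies the total rapidly.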

4. Keep Alexa’s responses short

One of the problems when designing the voice interaction model between a user and Alexa is that it is done in writing but experienced through voice, and there aren’t yet any “Alexa mock-up” tools out there.

One thing you’ll notice is that Alexa speaks a lot slower than a human, which means that what looks like a short response on paper is far longer when read by Alexa, and will leave the user feeling impatient, so try to keep Alexa’s responses short and concise where possible.
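In the absence of a mock-up tool, a crude way to sanity-check a written response is to estimate how long it takes to say. The sketch below is a rough heuristic only: the default of 130 words per minute is a placeholder assumption, not a measured figure for Alexa, so time a real device and adjust it.

```python
def estimate_speech_seconds(text, words_per_minute=130):
    """Estimate speaking time from a word count and an assumed speaking rate."""
    word_count = len(text.split())
    return word_count * 60 / words_per_minute


def is_too_long(text, max_seconds, words_per_minute=130):
    """Flag responses whose estimated duration exceeds a chosen budget."""
    return estimate_speech_seconds(text, words_per_minute) > max_seconds
```

For example, `is_too_long(draft_response, max_seconds=10)` could be run over every response in a script before it goes to user testing, with the budget chosen by the design team.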

5. Don’t have too many steps in the conversation

This goes hand in hand with point 4. Because Alexa speaks slower than us, and because she isn’t yet like the advanced conversational digital assistants depicted in films, try not to have too many steps in the conversation. For example, only have confirmation steps for important actions, such as transferring money or buying something. This will help to make your voice experience smooth and engaging.
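The idea of confirming only important actions can be sketched as a short allow-list. The intent names and step labels below are hypothetical and simply show the confirmation step being added for some intents and skipped for the rest.

```python
# Hypothetical intents that move money or place an order.
CONFIRM_INTENTS = {"TransferMoneyIntent", "PurchaseIntent"}


def plan_steps(intent_name):
    """Return the conversation steps for an intent, confirming only when needed."""
    steps = ["collect_details"]
    if intent_name in CONFIRM_INTENTS:
        # Extra turn only where a mistake would be costly.
        steps.append("confirm")
    steps.append("execute")
    return steps
```

Everything outside the allow-list runs in two steps, which keeps low-risk requests such as checking a balance quick.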

6. Try not to answer a question with a question

Even in human-to-human conversations, having your question answered with a question can feel frustrating. So, in the cases where you want a back and forth conversation, try to still have Alexa give a useful piece of information before she asks a counter question. For example, if you’re designing a train app, the user might be able to ask: “How much does it cost to get from London to Leeds?”

Instead of making Alexa respond with the question “When did you want to travel?”, you could make some assumptions and have her say “A standard ticket from London King’s Cross to Leeds leaving during peak hours tomorrow costs £110, and an off-peak train costs £88. Just let me know the date you want to travel.”

7. Spend time on the edge cases and half happy paths

It’s easy for users to learn what they can and can’t do on a mobile app by quickly flicking through all the different screens the first time they use it, but the same can’t be said for voice.

Through Opearlo Analytics we’ve noticed that users will often “stress test” a skill when they use it for the first time, for example by cycling through all the options or asking questions they half expect not to be supported.

Therefore, spending time on the edge cases is super important, so that you can safely guide experimental or new users back into the core functionality of the skill, and avoid them getting trapped in an error loop and quitting the app through frustration.
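One possible way to avoid an error loop is to escalate the guidance each time a request isn't understood, and to end the session politely rather than repeat the same error forever. This is a minimal sketch under assumed thresholds; the function name, wording, and the three-miss cut-off are all hypothetical.

```python
def fallback_response(consecutive_misses, core_features):
    """Pick a fallback that guides the user back to the core functionality."""
    if consecutive_misses <= 1:
        # First miss: a short, low-friction reprompt.
        speech = "Sorry, I didn't catch that. What would you like to do?"
        return {"speech": speech, "end_session": False}
    if consecutive_misses == 2:
        # Second miss: spell out what is actually supported.
        features = ", ".join(core_features)
        speech = f"I can help with: {features}. Which would you like?"
        return {"speech": speech, "end_session": False}
    # Third miss or more: exit gracefully instead of looping.
    speech = "Sorry, I'm having trouble with that. Goodbye for now."
    return {"speech": speech, "end_session": True}
```

The caller would reset `consecutive_misses` to zero whenever a request is handled successfully.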

8. Minimize choice

We’re not used to having to remember options as they are read out loud, which is why the Alexa certification team recommend presenting a maximum of three choices to the user at once.

Through our user testing workshops, we discovered that numbering the options also really helps here, so the user just needs to recall the number of the option they want, rather than the option itself, which could be a whole sentence. The A Cloud Guru skill demonstrates this well.
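Numbered options capped at three could be sketched as follows. The function names and the phrasing of the menu are hypothetical; the point is only that the user can answer with a number instead of repeating a whole sentence.

```python
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}


def build_menu(options, max_choices=3):
    """Read out at most max_choices options, each prefixed with its number."""
    shown = options[:max_choices]
    parts = [f"{i}, {option}" for i, option in enumerate(shown, start=1)]
    return "Say " + "; ".join(parts) + "."


def resolve_choice(spoken, options, max_choices=3):
    """Map a spoken number ("two" or "2") back to the option it refers to."""
    shown = options[:max_choices]
    token = spoken.strip().lower()
    number = NUMBER_WORDS.get(token)
    if number is None and token.isdigit():
        number = int(token)
    if number is not None and 1 <= number <= len(shown):
        return shown[number - 1]
    return None  # Not a valid number; the caller can fall back to other matching.
```

A fourth option passed to `build_menu` is silently dropped in this sketch; a fuller design might offer it on request with something like "more options".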

9. Minimize Pressure

The maximum amount of time that Alexa will wait before shutting off after speaking is 8 seconds. That isn’t very long, so as a user it’s easy to feel under pressure whilst engaging with a skill.

To avoid this, always give the user an option that buys them more deciding time. For example, in a recipe skill we were building, our original design had Alexa read out the recipe title, then say “Would you like to hear the ingredients, another recipe, or start cooking?” But in our user testing workshops we found that users weren’t sure which option they wanted and often didn’t say any of them, leaving the session to finish unsuccessfully.

We experimented with the design, and after further user testing chose to amend the response so instead she said, “Would you like to start cooking, hear another recipe, or hear the details.” The “hear the details” option meant the user had more time to decide what they wanted to do next, and therefore made for a better experience.
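A reprompt is another way to ease the time pressure, since it gives the user a second chance if they say nothing. The sketch below builds a response body using the `outputSpeech`, `reprompt`, and `shouldEndSession` fields of the Alexa custom skill response format; the helper name and the reprompt wording are assumptions for illustration, so check the current Alexa documentation before relying on the exact shape.

```python
def build_response(speech_text, reprompt_text):
    """Build a response that keeps the session open and reprompts on silence."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # Spoken if the user stays silent after the main prompt.
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt_text}
            },
            "shouldEndSession": False,
        },
    }


response = build_response(
    "Would you like to start cooking, hear another recipe, or hear the details?",
    "You can say start cooking, another recipe, or hear the details.",
)
```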


10. Incorporate Audio

By far the best skills on Alexa have audio content in them, meaning a piece of audio played back through Alexa that isn’t her voice.

The Grand Tour skill features the presenters, and the Jamie Oliver one has a short message from him at the end of each recipe.

Even if you don’t have a character that could feature in your skill, there are plenty of other ways to get in audio. For example, an audio logo (like the find On My Way Skill), or background music like Inspire Me. Whichever way you do it, incorporating audio into your skill will make it stand out.



Jess Williams

Head of Research @cognitionX. Previously CEO of Opearlo, acquired by matchbox.io