Designing for Voice Interfaces

Russell Andlauer
Russell Andlauer's Pixel Playground
10 min read · Apr 29, 2018

My foray into voice experience design brought plenty of challenges, along with opportunities to learn and grow as a designer. The lack of a visual interface posed unique difficulties that forced me to design differently than I normally would. Yet even though I had to think about things in a new way, the voice experience design process parallels the traditional user experience design methodology in a number of ways. Here is my journey into voice design.

Alexa Design Basics

This year I have been trying to incorporate voice interfaces into my daily life as much as I can. I have used the Google Assistant on my phone and Microsoft’s Cortana on my desktop PC. The main voice platform I have focused on, though, is Amazon’s Alexa. I purchased an Echo Dot at the beginning of the year and developed a skill prototype for the device. A skill is to Alexa what an app is to a phone.

The two primary components of an Alexa skill are intents and utterances. An intent is what the user wants Alexa to do. This could be something like looking up movie showtimes at a theater. An utterance is what the user actually says to Alexa. For example, “Alexa, when is the Avengers playing?”
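To make the intent-to-utterance relationship concrete, here is a sketch of how the two pair up in an Alexa interaction model, written as a Python dict shaped like the model's JSON. The intent name and utterance text are hypothetical, not taken from a real skill.

```python
# A hypothetical movie-showtimes intent: one intent (what the user wants)
# mapped to several sample utterances (ways the user might say it).
movie_intent = {
    "name": "MovieShowtimesIntent",
    "samples": [
        "when is the avengers playing",
        "what time is the avengers showing",
        "movie showtimes for the avengers",
    ],
}
```

Many different utterances resolve to the same intent; Alexa's job is to match whatever the user says against these samples.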

Another component of Alexa skills is the slot: a container that can hold any one of a set of values. Let’s say there is a skill for ordering pizza. There could be a slot for the size of the pizza because there are several options. Someone could say, “Order a small pizza,” “Order a medium pizza,” or “Order a large pizza.” Slots let the skill developer create one utterance, “Order a {size} pizza,” instead of creating as many utterances as there are sizes of pizza. When you design an Alexa skill you already have to map a large number of utterances to most intents; slots greatly reduce the quantity of utterances you need.
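The savings can be sketched in a few lines of Python. The pizza skill here is hypothetical; the point is just that one slotted template replaces a whole family of utterances.

```python
sizes = ["small", "medium", "large"]

# Without a slot: one utterance per size value.
without_slot = [f"order a {s} pizza" for s in sizes]

# With a slot: a single template plus a list of slot values.
with_slot = "order a {size} pizza"
slot_values = {"size": sizes}

print(len(without_slot))  # three utterances collapse into one template
```

With more slots (size, toppings, crust), the savings multiply, since you would otherwise need an utterance for every combination.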

The UX of Voice

Even though you typically aren’t designing with a visual interface in mind, voice interface design is still similar to traditional UX design. There are times when the voice interaction model complements a visual one, but I haven’t designed for both yet.

The first step when designing a skill is to write out use cases, which are similar to user stories. These could be things such as “User orders a pizza,” “User picks pizza toppings,” and “User enters payment information.”

Next, you want to start practicing a few use cases out loud, either alone or with a coworker. It’s a low-cost activity that gets your ideas organized. It’s like sketching when designing a web page: one of the first steps you take.

After you have rehearsed some use cases, it’s time to start writing scripts. I prefer to write mine in Google Docs because it’s easy to share and collaborate with others. The scripts themselves are just a back-and-forth between the user and Alexa.

I start by writing a “happy path” for the user to take through the skill. This way I can focus on the flow of the skill before worrying about error handling and flow diagrams. After I’ve created a rough draft of a few scripts, I edit them to follow best practices for voice. I use both Amazon’s and Google’s guidelines, which can be found here:

Amazon: https://developer.amazon.com/designing-for-voice/

Google: https://developers.google.com/actions/design/

Once you have made some changes to your scripts based on best practices, it is a good time to act out the scripts and review them with a colleague. The next step is to prepare to build your skill with Amazon’s Developer Services.

In order to do that, you need to create a spreadsheet listing the skill info, intents, slots, and utterances. Again, it’s easiest to do this in Google Sheets. After that, it’s a good time to create a flow chart covering the multiple paths a user can take through your skill. Next, you could create a medium-fidelity prototype in a web app called Sayspring.

These are some of the deliverables you could hand off to a developer. I was fortunate enough to have some guidance with the development side of things so I could create a high-fidelity prototype by coding with Amazon’s Developer Services. The voice experience design process is an iterative one, just like user experience design. When designing and developing my skill prototype, I cycled through this process several times. I’ll illustrate these steps with the skill I created.

Memory Palaces — The Focus of My Skill

Image Credit: https://en.chessbase.com/post/memory-techniques-memory-palace-from-roman-times-to-today

I decided to make a skill about memory palaces. A memory palace is a mnemonic technique in which you visualize a location you are familiar with, such as your living room, and assign different ideas to specific items in that room. If I had made a full-blown skill, the user could have created their own memory palace and assigned any idea to any item in the room.

Including these features would have gone beyond my coding abilities, so I focused on a prototype that included an explanation of memory palaces and a tutorial to get the user accustomed to how they work. For the tutorial, I had Alexa create a memory palace of a living room with five locations in it.

The user then inputs five different cards from a standard deck of cards such as Ace of Spades, then assigns them to the five locations in the living room. Alexa repeats back the user’s memory palace a few times to practice, then tests the user to see if they can match the cards to their respective locations.

My skill had two main challenges to overcome: brevity and retention. The nature of my skill required Alexa to speak more than is ideal; I had to explain something that isn’t common knowledge in as few words as possible. Amazon recommends that Alexa speak only for about as long as the average person can talk in one breath. My explanation section ran longer than that, so I broke up Alexa’s speech with several one-second pauses between thoughts.
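Those pauses are expressed with SSML, which Alexa accepts for output speech; `<break time="1s"/>` is the real SSML break tag. A small sketch of the approach in Python (the explanation sentences here are placeholders, not my actual script):

```python
def with_pauses(sentences, pause="1s"):
    """Join sentences with SSML breaks so Alexa pauses between thoughts."""
    breaker = f'<break time="{pause}"/> '
    return "<speak>" + breaker.join(sentences) + "</speak>"

explanation = with_pauses([
    "A memory palace is a place you know well.",
    "You attach each idea to an item in that place.",
    "To recall the ideas, walk through the place in your mind.",
])
```

The resulting string is what you would return as the skill's output speech, with a one-second break between each thought.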

The other challenge, common to all voice interfaces, is retention. Without a screen, it is difficult for people to remember information while interacting with a skill. This is why I specifically chose a memory skill: I wanted retention to be a major aspect of my design. As you’ll see, my coding limitations made it harder for the people I tested to remember aspects of the skill.

My Process

For my skill, I only had two use cases: an explanation and a tutorial. I practiced both out loud for a bit to get an idea of the direction I wanted to take. Then I started writing scripts for both use cases. I chose to keep them in one master script document, but it contains nineteen scripts: one for each intent. You can see my scripts here:

My Google Doc Scripts

I made several changes to my script throughout its development. One was to the part where Alexa informs the user of the five locations in the memory palace. Originally, Alexa listed the five locations and then paused briefly afterwards. In my own testing, I found it hard to keep track of all five locations as Alexa read them off, so I changed the script to include a one-second pause between each location.

Another change came from user testing. When Alexa explains the tutorial, it originally said, “We are going to try and remember five cards in order.” Based on a tester’s suggestion, I changed the script to say, “We are going to try and remember five cards of your choice from a standard fifty-two card deck.” The tester hadn’t realized that she had to input the cards she would later remember, and the kind of cards being used needed clarification.

This is the final version of my script. Each script maps to an intent. In reality, I created the intents in a spreadsheet before the script was finalized; as I’ve mentioned, it’s an iterative process. Here is the voice user interface document that includes the intents, slots, and utterances for the skill prototype.

You may have noticed a few peculiar things in the utterances section. For intents with a “yes” or “no” response, the sheet shows “yes one,” “yes two,” and so on. I did this so that, in code, Alexa could distinguish which intent the user wanted when saying yes. I spent a long time trying to let the user say just “yes” or “no,” but I couldn’t get it to work. Since I was developing a high-fidelity prototype rather than a full-fledged skill, I considered this solution acceptable.
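For comparison, the way developers usually solve this (which I couldn’t get working at the time) is to accept Amazon’s built-in AMAZON.YesIntent everywhere and route it based on a conversation state saved in the session attributes. A minimal sketch, with made-up state and step names:

```python
# Hypothetical routing table: which step a plain "yes" advances to,
# given the state previously stored in the session attributes.
NEXT_STEP = {
    "asked_start_tutorial": "begin_tutorial",
    "asked_repeat_palace": "repeat_palace",
    "asked_start_quiz": "begin_quiz",
}

def handle_yes(session_attrs):
    """Route a generic yes-intent using the saved conversation state."""
    state = session_attrs.get("state")
    return NEXT_STEP.get(state, "fallback_reprompt")
```

With this pattern, every prompt that expects a yes/no answer first writes its state into the session, so a single “yes” utterance can mean different things at different points in the skill.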

I had to do something similar for the intents that assigned the user’s cards to different items and the intents that tested the user’s recollection. For example, the “Card One Get Intent” required the user to say “card one,” then the rank and suit, instead of just the rank and suit. This greatly interfered with users’ ability to navigate smoothly through the skill. A developer could surely have made it so the user could say just the rank and suit of their card, but that was beyond my coding knowledge.
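The likely fix would have been slot-filling: define rank and suit slots so the user says just the card, and track which card number is being collected in session state. A hypothetical intent definition, again written as a Python dict shaped like the interaction-model JSON (the intent, slot, and type names are invented):

```python
# Hypothetical "get card" intent using rank and suit slots, so the user
# can say "ace of spades" without any "card one" prefix.
get_card_intent = {
    "name": "GetCardIntent",
    "slots": [
        {"name": "rank", "type": "CARD_RANK"},   # ace, two, ... king
        {"name": "suit", "type": "CARD_SUIT"},   # spades, hearts, ...
    ],
    "samples": [
        "{rank} of {suit}",
        "the {rank} of {suit}",
    ],
}
```

One intent would then handle all five cards, with the session remembering whether the skill is waiting on card one, two, and so on.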

After I had mapped out my intents, utterances, and slots, I created a flow chart with Lucidchart to organize how I wanted the prototype to flow. You can see it here.

Empathizing with Developers

Instead of creating a medium-fidelity prototype in Sayspring, I chose to code the prototype with the Amazon Developer Portal. I won’t go into detail about that process, but there is a lot to it. The experience made me empathize with developers even more. Although I studied computer science for a couple of years before realizing it wasn’t for me, being reminded of the challenges of programming was a valuable experience.

Coding for Alexa has its own unique challenges. It was a bit frustrating but ultimately fun to code something that works on my Echo device. I learned far more about Alexa’s limitations and the design considerations they impose by coding a prototype than I would have from a lower-fidelity one.

User Testing

I was able to test my skill prototype with three people. One had a pretty good understanding of memory palaces; the other two were vaguely familiar with the concept from the TV show Sherlock. As I’ve mentioned, the numbered-utterance workaround I settled on negatively impacted the testing.

To combat this, I made a cheat sheet for each person I tested. I wrote out “yes one” through “yes seven” and had them cross off each one as they said it. I also let them write down their cards, because Alexa only accepts the user’s input for the “get” intents within a short window of time. Saying “Card one, Ace of Spades” took too long to register if the user was still deciding which card to say.

There was also an issue when Alexa later quizzed the user on their cards. Alexa would say, “Just say the living room item followed by the card. Which card matches with the couch?” and the user would have to respond, “Couch, Ace of Spades.” This wasn’t natural for people to say; everyone I tested said just the card rank and suit the first time through.

The feedback I got was that having to say the card number or the object before the card was the biggest frustration with the prototype. If I had been working with a developer who could avoid that workaround, I’m confident the skill would have been much more successful.

I did have one participant say that this was a fun introduction to memory palaces. He hadn’t experienced the Alexa platform before and he enjoyed the novelty of the system. Another person said that the explanations for the steps were good. She thought that they were concise and clear.

Memory Palace Skill Demo

Now that I’ve spent all this time talking about the skill, I’m ready to actually show it to you in action. Below is a screen recording of me going through the skill prototype.

Memory Palace Skill Prototype in Action

Summary

I’ve learned a lot about voice experience design these past few months, and I see a lot of potential for voice interfaces to assist people in their daily lives. Designing for voice has challenged me with a unique set of constraints. Not having a visual interface forces you to think through the path the user takes even more carefully than you normally would.

Designing for voice has also pushed me to design more in terms of systems than I usually do. With the way Alexa uses intents, utterances, and slots, you have to be conscious of how they all come together within the skill. Whether or not I continue to design for voice interfaces, this experience has made me a better user experience designer.

For more articles related to User Experience Design, make sure to check out the blog page for Toptal here: https://www.toptal.com/designers/ux/posts

Russell Andlauer is a Digital Product Designer at DevSimplified and a User Experience Design Freelancer. He received his Bachelor of Science degree in Interaction & Design from Utah Valley University.

If you are interested in partnering with him for your project please reach out to him at https://designwithrussell.com/
