A Daily Skill: A Study on Voice Experience

Lauren Madsen
A Digital Portfolio of Lauren Madsen
11 min read, Nov 8, 2018

There are some apps that I tolerate and use, but very rarely is one created so well that I am eager to use it. When the design is clear, smooth, and unmistakable, that’s the sign of a good user experience. For me, that app is Daily Budget.

Daily Budget is a simple app that helps you budget your money in an accessible way. It breaks all the numbers down into daily amounts and helps you keep track from day to day. Once set up, it only takes a couple of steps to record daily spending on a regular basis. I found myself enjoying the app because I didn’t have to spend time on it. This got me thinking about conversation design and whether the app would be a good candidate for an Alexa Skill adaptation.

The Brainstorm

Though my eagerness to jump into this project was strong, I was reminded of the saying, “Just because you can doesn’t mean you should.” There is no better example of that than conversation design. Plenty of things could be turned into a voice skill, but the catch is whether there are valid reasons to put them on a platform like voice.

For this, I applied the Hook Model from Nir Eyal’s book, Hooked: How to Build Habit-Forming Products, to validate the VUI adaptation of the Daily Budget app:

  1. Trigger: What draws the user to your product or service? What brings the user to the app is the desire to save money. What would bring them to the voice skill is its simplicity and ease of use.
  2. Action: What is the simplest behavior in anticipation of reward? The simplest behavior is recording your spending (or not spending at all) to see your budget go up day after day.
  3. Variable Reward: Is the reward fulfilling yet leaving the user wanting more? I think it’s safe to say that most people like seeing more money in their budget. With the genius of Daily Budget, they get daily amounts and watch their budget increase, even though they are not actually gaining money in their bank account, only potential money available to spend.
  4. Investment: What is the ‘bit of work done’ to increase the likelihood of returning? Any budgeting strategy takes persistent effort over a long period of time. The skill would lessen the effort needed to stay up to date on their budget. It could be as simple as “Hey Alexa, tell Daily Budget that I spent $15 for lunch today.”

The Journey

With the decision validated, I ran to the drawing board. It was time to test some potential scripts to understand the necessary design for the skill. My first interaction with the app drew me in with its simplicity and ease, so those same elements had to come through in the VUI adaptation. Click here to view the initial scripts and variations of prompts through my study and testing.

These scripts also gave me a better understanding of the app as a whole. Laying out a flowchart of some of the most common paths helped me to understand how the different paths might fit together.

These flowcharts capture just a small part of the potential of the skill. But this exercise brought to light how the skill might function and flow with the different paths a user could take.

With these scripts and flowcharts, the first thing I tested was the initial conversation that a user would have with the skill. It would only happen once; however, it would set the stage for the rest of their interactions with the skill. Visit this link to view the prototype that I used during these tests. The insights I gained through the testing helped me to understand more fully the nuances of conversation design. Simple adjustments to the prompts can greatly change how a user interacts with and feels about the experience.

To summarize some of my findings, I will explain three challenges that came my way while testing my script.

End-Focus Principle

At one point in the first conversation, the skill asks for recurring costs like rent or housing to calculate the budget. The user is given the option to add more to the list right away or to add more later on. The first iteration of the prompt caused a lot of confusion in testing.

1st iteration: Great. Would you like to add more recurring costs like subscriptions or insurance? Or skip it for now? You can always go back and edit this information later.

What caused the most confusion was that the question was placed in the middle of the prompt. During testing, many users tried to answer the question immediately, not realizing that there was more to be said.

2nd iteration: Great. You can add more recurring costs like subscriptions or insurance. Would you like to do this now?

This was somewhat better and definitely solved the problem of having the question in the middle of the prompt. But there was still a disconnect: the user did not know whether they really should add more or whether this was their only chance to do so.

3rd iteration: Great. I can add more recurring costs like subscriptions or insurance for you. Or you can always edit this later. Would you like to do this now?

This one brought back the information that they would have the option of editing these things later. Another small change was the use of “I” instead of “You” for the suggestion. The “I” felt more personable to the user, as if the skill were serving or helping, where “you” felt distant. It became more of an offering to the user.

Another example of this challenge came during the last step of the conversation, where the skill asks the user how much they want to save. In a similar way, users were confused about when they could respond:

1st iteration: Ok onto step 3, Savings. How much would you like to save a month? I can do it by percentage or by amount.

2nd iteration: Ok. Onto step 3. For your savings, you can tell me how much you want to save based on a percentage or by a specific amount. How much would you like to save?

3rd iteration: Onto Step 3. Let’s figure out how much you want to save based on a percentage or by a specific amount. How much would you like to save per month?

Just like the first example, the question was best suited for the end, and adding in a more personal feel helped the users feel more at ease with the experience as a whole.

In the end, this challenge came down to the End-Focus principle: the most important information in a clause or sentence is placed at the end. It applies here because the question is often the most important part of a prompt for the user. It is what they feel as the “turn,” the handing over of the mic. In my first iteration, it didn’t feel natural for the virtual assistant to ask a question when it wasn’t going to hand the mic over to the user yet. It asked a question and kept on talking.

This also goes against Grice’s Maxim of Manner: one tries to be as clear, as brief, and as orderly as one can in what one says, and one avoids obscurity and ambiguity. Users had no way of knowing that the assistant had more to tell them, so it was unclear to them that they couldn’t answer the question yet.
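The end-focus check described above can be sketched as a small script that flags prompts where a question appears anywhere other than the final sentence. This is only an illustration: the function names are hypothetical, the sentence splitting is deliberately naive, and nothing here comes from the Alexa tooling or from the original prototype.

```python
import re
from typing import List


def split_sentences(prompt: str) -> List[str]:
    """Naively split a prompt into sentences at '.', '?' or '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", prompt.strip())
    return [part for part in parts if part]


def follows_end_focus(prompt: str) -> bool:
    """Return True only when the prompt has exactly one question and it comes last."""
    sentences = split_sentences(prompt)
    question_indexes = [i for i, s in enumerate(sentences) if s.endswith("?")]
    return question_indexes == [len(sentences) - 1]


# The first and third iterations of the recurring-costs prompt from this study.
first_iteration = (
    "Great. Would you like to add more recurring costs like subscriptions "
    "or insurance? Or skip it for now? You can always go back and edit "
    "this information later."
)
third_iteration = (
    "Great. I can add more recurring costs like subscriptions or insurance "
    "for you. Or you can always edit this later. Would you like to do this now?"
)

print(follows_end_focus(first_iteration))   # question in the middle
print(follows_end_focus(third_iteration))   # question at the end
```

A check like this only catches the placement problem. The second iteration would also pass it, even though testers still lacked the information that they could edit later, so it is no substitute for testing with people.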

The Maxim of Quantity

With other mediums of technology, most products have an interface to interact with. The user is able to see, read, or look at information. But when it comes to voice, the user’s memory becomes the interface. They suddenly have a much more limited canvas on which to paint the picture of the information. This brings me to my second challenge: informational statements. It is easy to give too much information or not enough. There is a balancing act in designing a statement so that the user can paint an accurate picture of the information in their head. You have to keep in mind their cognitive load as well as the end-focus principle to structure the statement in the best way.

My first example is a feature that lets the skill record spending for a future date. It is a way for the user to adjust their current budget with the intent to make a big purchase in the future. This feature is called Big Spendings.

1st iteration: Great. Your Big Spendings will take $2.50 out of your budget each day. I can set a reminder for you for the deadline of this Big Spending.

This first attempt seemed half-hearted and didn’t portray the right information well enough. According to Google’s design guidelines for information statements: “Spoken prompts should lead with an implicit confirmation of the information that was said or implied, followed by the new information.” This brought me to my second attempt.

2nd iteration: Great. With 60 days left for your Christmas Presents, that will require $2.50 per day out of your daily budget. I will check back with you on your deadline to see how much you actually spent. Would you like me to set a reminder for you?

This iteration included the user’s title of “Christmas presents” and it also gave relevant information on how this transaction will be handled in the future. It still could be better though.

3rd iteration: Great. For your Christmas Presents, you have 60 days for a budget of $150. This will take out $2.50 out of your daily budget per day. Check back with me on your deadline, December 1st, to tell me how much you actually spent so I can adjust your budget accordingly. Can I set a reminder for you?

This final iteration brought in implicit confirmations of all the previous information and put it into the context of what the skill could offer. The feedback on this version was the best. It was the clearest and most understandable without giving more information than was needed.

This challenge came down to Grice’s Maxim of Quantity: where one tries to be as informative as one possibly can, and gives as much information as is needed, and no more. With informational statements there is a balancing act with giving enough information so the user has all the necessary context while not overwhelming their cognitive load.
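The per-day figure in the Christmas Presents prompts is simple arithmetic: a $150 budget spread over 60 days is $2.50 per day. Here is a minimal sketch of that calculation and of a prompt that leads with the confirmation and ends with the question. The function names and the exact wording are assumptions for illustration, modeled on the third iteration rather than taken from the skill itself.

```python
from decimal import Decimal, ROUND_HALF_UP


def daily_set_aside(total: Decimal, days_left: int) -> Decimal:
    """Amount to take out of the daily budget so the total is covered by the deadline."""
    if days_left <= 0:
        raise ValueError("days_left must be positive")
    per_day = total / days_left
    # Round to whole cents.
    return per_day.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def big_spending_prompt(title: str, total: Decimal, days_left: int, deadline: str) -> str:
    """Build a prompt: implicit confirmation first, new information next, question last."""
    per_day = daily_set_aside(total, days_left)
    return (
        f"Great. For your {title}, you have {days_left} days for a budget of ${total}. "
        f"This will take ${per_day} out of your daily budget per day. "
        f"Check back with me on your deadline, {deadline}, to tell me how much "
        f"you actually spent so I can adjust your budget accordingly. "
        f"Can I set a reminder for you?"
    )


print(daily_set_aside(Decimal("150"), 60))
print(big_spending_prompt("Christmas Presents", Decimal("150"), 60, "December 1st"))
```

Using Decimal instead of float keeps the currency amounts exact, which matters when the number is going to be read aloud to the user.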

Consistency

The last challenge was adapting the skill for a device with a screen, like the Amazon Echo Show. This brought in new challenges with a new environment to design for. Since the Daily Budget app already has a solid design in place, I used that as the inspiration and foundation for the screen adaptation. Google’s guidelines on scaling your design say, “For conversations in the car or on a smart display, the screen may not always be available to the user. Therefore, the spoken prompts have to carry most of the conversation and convey the core message. The screen can be used for supplementary visual information as well as suggestions to continue or pivot the conversation.” These design insights gave me direction on how to make some applicable screens for the voice skill. They needed to supplement the conversation as well as provide pivot points in it. The following images are just some preliminary tests of how the screens on the Echo Show might look in relation to their counterparts in the app.

Opening/Loading screen for the Echo Show vs the app icon for Daily Budget
Main screen for the Echo Show vs the main screens for the app

These images show the typical home screen for a user. Upon opening the app, they first see their budget along with the projected budget for the next two days. The color is a quick indicator of whether they are currently over or under budget. Mimicking that same style on the Echo Show screens was a challenge, because the placement of information had to be adjusted to match it.

The app includes two buttons at the bottom of the screen to either add or subtract money. The most commonly used button is the expense button, which indicates that you spent money on something like groceries. In the app this icon is on the bottom right, which suggests to me that it is placed where it is easiest for the user to reach (assuming they are right-handed). But that is not the case for an Echo Show screen, because users will never be holding the device in their hand. As a result, I adjusted these functions to follow the model of presenting the most common or predicted choice first. I placed the expense function on the left, followed by the income function.

Another challenge was making the information readable. The app has quite a large contrast in the text of the budget compared to the projected budgets underneath the graph. I closed the gap in that styling to hopefully make the information more readable to someone a few feet away and looking at the screen.

Experimenting further, I made some designs of how the screen might change with each turn of the conversation. I asked myself the questions: What would the user expect to see when hearing certain prompts? What visuals could help enhance what was said?

These questions brought a new challenge: introducing images that don’t exist in the app. The app is simple in what it asks of the user, but without the visuals, the same steps take a little more effort through voice. Creating the screens for each turn required matching the app’s iconography and style as much as I could. The most important thing was to create something that felt the same as the app. These mockups will have to be tested to be validated, but the exercise in consistency brought new understanding of that design principle.

Consistency can often be seen as the most common design principle. It is one of the simplest and most basic parts of visual design. But voice brings more depth. I was required to think not only about consistent visual design but also about a consistent experience with the app. Above all, consistency matters most when it comes to trust. With the ever-changing world of technology, it is not surprising that there are still people out there who are hesitant to use a virtual assistant (or even a computer) to do their work for them. So bringing in consistency in both visual design and overall experience is the first step in building their trust.

Conclusion

Designing this skill was a great process. Understanding the validity of the potential product is crucial to the design. If there is no need or desire for the product then it might be hard to bring that product to life.

Conversation design includes so many design principles that require precision and a detail-oriented perspective. Taking away the visuals and working with the user’s mind and memory presents a challenge for your design strategy. But principles like the end-focus principle, Grice’s Maxims, and consistency can greatly improve the flow and ease of your skill.

Scaling your design can also help set you apart and bring your design full circle to provide a well-rounded experience. While there are many types of screens and use cases for the screen designs, it is even more important to understand how your skill might be used.

Though still a work-in-progress, this exercise improved my skills at UX design and gave me a better appreciation for the beauty of conversation design. I look forward to seeing the potential and capabilities of VX and conversation design.

UX Designer for voice interfaces. Let’s solve design problems not by falling in love with a solution but falling in love with a problem.