The Alexa Recalls Innovation Pilot: Designing for Voice in Government

Lisa Cormier
Jun 19, 2019


The Challenge of a Voice MUP

With the rising popularity of voice assistants, the Government of Canada is exploring their potential as an innovative channel for accessing government services. My team at Transport Canada was tasked with building an Alexa Skill that lets citizens look up vehicle recall information. The goal is to (1) release an MUP (a “minimal usable product”), and (2) document insights and research findings to inform future voice assistant projects in the government.

The team consists of five people: a developer, two user experience designers, a project lead and SCRUM master, and a product owner from the vehicle recall department. As a designer on this project, here are some of my observations and tips for anyone working on voice interface projects.

Talking to Humans

  • Voice Assistants: a Challenging Opportunity

Designing for conversations is a specialized skill — no pun intended — that designers need to develop. A research study, even a small one, is necessary to gain an understanding of the medium that we’re designing for. Our team interviewed people to understand how they use voice assistants, what they like about them, what pain points they experience with them, and what they wish could be improved with the technology. This made it easier for us to understand user expectations, what pain points we need to watch out for when designing our script, and what we could reasonably expect in terms of “viability” of our Skill once it goes live.

Here are some things for voice interface designers to consider…

People we interviewed seemed to use voice assistants for what they describe as “simple tasks” or “basic” functions — like checking the weather, setting alarms, etc. Only a few of the users that our team spoke to currently use voice assistants for tasks that require several steps to achieve a goal. Sometimes users share their Echo or Google Home device with others — for example, family members. While some told us that they feel awkward using a voice assistant in public, many like using voice assistants when their hands are busy — while cooking or driving.

Findability issues make accessing Skills challenging, as you need to enable a Skill before you can use it — and it’s rare to stumble upon a Skill through a simple Google search. Also, in Canada, we still have to use invocation names to access Skills, which forces the user to rely on recall rather than recognition to access the application. Nielsen Norman Group published an article called “The Paradox of Intelligent Assistants: Poor Usability, High Adoption,” which points out that while usability metrics for voice assistants tend to be poor, voice assistants still seem to gain popularity. One possible explanation is that people often restrict their use of voice assistants to familiar tasks.

Some users have to repeat themselves a few times to get their voice assistants to understand their commands. When testing our Skill, we found that while voice assistants are usually well trained on understanding common nouns, the situation might be more difficult for some proper nouns — especially in a language that differs from the “origin” of the proper noun. For example, in our project, Alexa had some difficulties understanding vehicle names like “Hyundai” that were said in French. French pronunciations of those names vary greatly based on the person, their region, etc., and Alexa was sometimes unable to recognize them based on users’ utterances.
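One mitigation worth exploring — sketched below as plain Python for illustration, not pulled from our actual Skill — is to map the variant spellings the speech recognizer returns onto canonical slot values, much like the slot-value synonyms Alexa’s interaction model supports. The variant entries here are invented examples:

```python
# Hypothetical mapping of recognized utterance variants to canonical
# vehicle makes. In a real Skill this would live in the interaction
# model as slot-value synonyms; here it is a plain lookup for clarity.
MAKE_SYNONYMS = {
    "hyundai": "Hyundai",
    "hyundaï": "Hyundai",        # accented spelling, invented example
    "volkswagen": "Volkswagen",
    "volks wagen": "Volkswagen", # split recognition, invented example
}

def normalize_make(utterance):
    """Map a raw utterance to a canonical make, or None if unrecognized."""
    return MAKE_SYNONYMS.get(utterance.strip().lower())
```

The lookup is deliberately forgiving about whitespace and casing, since the recognizer’s output for the same spoken word can vary from user to user.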

  • Do People Find Recalls or do Recalls Find Them?

Most people we talked to receive their vehicle recall information in a letter from their manufacturer. While most users don’t typically look up vehicle recall information actively, some might do so based on what they hear in the news or on social media. Many expressed interest in a feature that would notify them of new recalls for their vehicle. Adding this feature in the next iteration could add more value to the application than vehicle lookup alone.

  • Voice Assistants for Accessibility: Inclusive or Exclusive?

Voice assistants present opportunities to help people with some accessibility challenges — mobility, visual impairment, difficulty reading. However, voice assistants can be challenging for people with speech difficulties, memory issues, or with little access to technology. Because of this, our team hypothesizes that our Alexa Skill could help accessibility as an additional channel for a service — but should not replace an existing one. In the next research phase, our team will do some primary research on accessibility for our Skill.

Talking Like Humans

VUI stands for “voice user interface”. When we’re taught to write in school, we’re told to write formally: avoid contractions, write full sentences, avoid “filler” words, etc. When writing a script for voice interactions, we need to un-learn some of that.

One of the objectives of VUI scripts is to make the conversation as natural as possible and for robots like voice assistants to sound “human-like”. It’s clear that people don’t speak the way they write. But what exactly makes a script seem like something a human would say?

  1. When we speak, we tend to use contractions: the shortening of certain words or groups of words. (ex: “I have found” vs. “I’ve found”)
  2. Spoken language tends to be a lot more economical than written language. When writing scripts, we try to avoid wordiness and use plain language. (ex: “recall potentially affecting your vehicle” vs. “recall may affect your vehicle”)
  3. Adding discourse markers to the script also makes it seem more natural. Discourse markers are words like “alright” and “hmm” that seem not to add much to the sentence and are generally avoided in written text, but they actually play an important role in spoken conversations. They show that we acknowledge what the other person is saying to us. In VUI design, they provide instant feedback to the user that says — I’m listening. (ex: Hmm… I’ve found a few different models of 2018 Volkswagen Golfs.)
  4. A fragmented sentence is an incomplete sentence — it’s missing a grammatical element. When writing formally, such sentences are avoided. In conversational design, using them can be a way to make the script more natural. Humans are great at deriving meaning from context and mentally filling in the blanks, which is why full sentences are not always necessary — and possibly redundant. (ex: And the model of your vehicle?)
  5. Humans also have the ability to adapt their responses based on what they hear. The VUI script also needs to adapt and prepare appropriate follow-ups to users’ utterances. For example, if Alexa asks for the make of a vehicle first, and the user gives both the make and the model, the Skill should process the information already provided and ask the next logical question. Again, we try to be economical with words and avoid repeating information.

Alexa: What is the make of your vehicle?

User: Honda Civic

Alexa: OK. And what is the year of your vehicle?
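A minimal sketch of that behaviour (hypothetical Python, not our Skill’s actual code — the slot names and prompt strings are assumptions): the dialog manager asks only for the first slot the user hasn’t already filled.

```python
# Slots the Skill needs, in the order it asks for them.
SLOT_ORDER = ["make", "model", "year"]

# Prompt for each slot that is still missing.
PROMPTS = {
    "make": "What is the make of your vehicle?",
    "model": "And the model of your vehicle?",
    "year": "OK. And what is the year of your vehicle?",
}

def next_prompt(filled_slots):
    """Return the prompt for the first missing slot, or None when all are filled."""
    for slot in SLOT_ORDER:
        if slot not in filled_slots:
            return PROMPTS[slot]
    return None
```

So if the user answers the make question with “Honda Civic” and both slots get filled, the next prompt skips straight to the year instead of redundantly asking for the model.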

6. To sound “human” and not give away the bot identity too quickly, planning for errors and unexpected utterances is important. Progressive reprompts help the user provide the information that the Skill is looking for. They reorient the conversation by restating the same question with more information, or by stating the same information in a different way. If the user has trouble understanding the question, each reprompt gives them another chance to answer.

Alexa: What is the make of your vehicle?

User: uhh…

Alexa: What brand makes your vehicle?
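Progressive reprompts can be sketched as a list of increasingly helpful wordings per slot (hypothetical Python — the reprompt strings beyond the two above are invented examples):

```python
# Reworded versions of the same question, one per failed attempt.
# After the last variant, the Skill keeps repeating the most helpful one.
REPROMPTS = {
    "make": [
        "What is the make of your vehicle?",
        "What brand makes your vehicle?",
        "For example, you could say Honda or Toyota. What is the make of your vehicle?",
    ],
}

def prompt_for(slot, attempt):
    """Pick the reprompt for this attempt, capped at the last variant."""
    variants = REPROMPTS[slot]
    return variants[min(attempt, len(variants) - 1)]
```

Capping at the last variant means a user who is stuck keeps hearing the version with a concrete example, rather than cycling back to the wording that already failed.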

7. But we don’t want to fool users into thinking that the Skill is ACTUALLY human. When designing scripts for chatbots or voice assistants, we need to clearly communicate to the user what the bot is capable of doing. People can expect a lot from “AI” and can get frustrated if their expectations are not managed. When a user makes an unexpected request, outside of the knowledge domain of the Skill, it is important to provide the next steps that the user can follow to complete the task. For example, if there is no information on a vehicle that the user is searching for, we provide them with the contact information for customer support. This can also be a good opportunity to introduce some humor and personality to the Skill by communicating that the Skill is “still in training”.

Alexa: I’m sorry, I don’t have any information on your 2018 Volkswagen Golf R at the moment. I’m still a voice assistant in training. Please contact my human friends at Transport Canada Recalls customer support at 1-800-333-0510 for more help.
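The logic behind that fallback could be sketched like this (illustrative Python, not our Skill’s actual backend — the recall lookup table is invented for the sketch, and only the phone number comes from the example above):

```python
# Stand-in for the Transport Canada recalls database lookup.
# The table contents here are invented for illustration.
RECALLS = {
    ("Volkswagen", "Golf", "2019"): ["hypothetical recall description"],
}

SUPPORT_LINE = "1-800-333-0510"

def recall_response(year, make, model):
    """Answer with recall info, or fall back to a clear human next step."""
    recalls = RECALLS.get((make, model, year))
    if not recalls:
        # Out of the Skill's knowledge domain: give the user a next step,
        # with a touch of "still in training" personality.
        return (f"I'm sorry, I don't have any information on your {year} {make} "
                f"{model} at the moment. I'm still a voice assistant in training. "
                f"Please contact my human friends at Transport Canada Recalls "
                f"customer support at {SUPPORT_LINE} for more help.")
    return f"I found {len(recalls)} recall(s) for your {year} {make} {model}."
```

The key design point is that the “no data” branch never dead-ends: it always hands the user a concrete way to finish their task.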

8. When designing for a graphical user interface, we use elements like typefaces, font weights, and colors to express information hierarchy, meaning and brand identity. When designing voice interfaces, speech details like pauses, emphasis, and phonetics can make a significant difference in the perceived meaning of a sentence. The markup language used to modify those components is called SSML — Speech Synthesis Markup Language.

For example, emphasizing certain words can give a sentence a completely different meaning. When you modify the emphasis level of a word, you change the rate and volume of speech.

Alexa: What is the make of your vehicle?

User: Honda Civic

Alexa: Great! And what is the year of your vehicle?

If we put too much emphasis on a certain word, the tone can come across as inappropriate for the context. For example, one testing participant commented that Alexa said “great!” very enthusiastically, then went on to inform him that she had no information on his vehicle. The over-emphasis on the discourse marker made the interaction seem misleading. Also, we don’t want the user to think that Alexa is making a value judgment about their car…
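As a sketch, the SSML for a toned-down acknowledgement might be assembled like this. The `<speak>` and `<emphasis>` elements (with levels “strong”, “moderate”, and “reduced”) are real SSML supported by Alexa; the helper functions themselves are invented for illustration:

```python
# Minimal SSML string builders (hypothetical helpers, not Amazon's SDK).
def emphasize(text, level="moderate"):
    """Wrap text in an SSML emphasis tag; level changes rate and volume."""
    return f'<emphasis level="{level}">{text}</emphasis>'

def speak(*parts):
    """Join response fragments inside the top-level <speak> element."""
    return "<speak>" + " ".join(parts) + "</speak>"

# A "reduced" level tones down an acknowledgement that testers
# found overly enthusiastic.
response = speak(emphasize("Great!", level="reduced"),
                 "And what is the year of your vehicle?")
```

Keeping the SSML generation in small helpers like these makes it easy to tune emphasis levels after each round of user testing without rewriting the script.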

9. When data is not voice-ready, long, wordy chunks of text lead to poor listenability. While we were able to control most of the content of the script, the description of the vehicle recalls was pulled directly from the Transport Canada recalls database.

This means that when the user receives vehicle recall information, they hear the text from the database word for word. Most content in the database is meant for reading, not for listening. If there is more than one recall for the user’s vehicle, the text becomes even longer. Because our team could not change the text in the database, we had to come up with workarounds to improve the user experience as much as we could:

Solution 1: We added options to receive the information by text or email, so the user is not expected to remember details from a long read-out.

Solution 2: We added discoverable commands like “skip” to give users more control when listening to the long vehicle recall text.
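Solution 2 could be sketched as follows (illustrative Python; our Skill’s actual chunking may differ): split the long recall description into sentence-based chunks and serve them one at a time, so a command like “skip” can jump ahead instead of forcing the user to sit through one long read-out. The sentence splitting here is deliberately naive (it splits on “. ”).

```python
def chunk_text(text, max_chars=200):
    """Greedily pack sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars becomes its own chunk.
    """
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    chunks, current = [], ""
    for s in sentences:
        # +2 accounts for the ". " joiner between sentences.
        if current and len(current) + len(s) + 2 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current}. {s}" if current else s
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then becomes one Alexa response, with “skip” advancing an index into the list — giving the listener control that a single long read-out doesn’t.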

The Measure of an MUP

Terms like Minimal Usable Product are often used during projects, but what do they mean?

In user experience, “usability” is traditionally measured by the effectiveness, efficiency, and satisfaction of a system. However, we decided that perceived efficiency mattered more to us than quantitative efficiency metrics, such as time spent on a task. For the purposes of our project, the team decided to focus on the following metrics:

  • Effectiveness: the user is able to successfully retrieve vehicle recall information and/or receives clear instructions for a logical next step
  • Satisfaction: the user is satisfied with their experience

In other words, we have an “MUP” if the following conditions are met when we test the product: (1) the user either receives their vehicle recall information or, if the Skill doesn’t have information on a particular vehicle, the user receives clear instructions on where to get the information, and (2) the user expresses satisfaction with our product. We measure those conditions qualitatively by observing the interactions of the user — their expressions, body language, etc. — and by asking users to describe their experience with our product in post-test interviews.

Testing the RITE Way

To build a conversational script, we started testing with users early, using a low-fidelity Wizard of Oz prototype. First, we would test a “happy path” — an ideal scenario where the user receives their vehicle recall information with no issues. Then we started prompting users with scenarios where they would make “mistakes” in order to test our “unhappy paths”. This gives us information on content: which words are more intuitive for users? What are they trying to do with our Skill? Should we include a new path in our script based on search attempts?

In the following phase, we started testing with a functional Skill, which gives us even more information. We tend to work in RITE (Rapid Iterative Testing and Evaluation) cycles: instead of building the whole script and then testing it, we build part of the script, the developer implements it, we test it with a small number of users, and then we iterate based on the findings.

Lessons Learned

If you’re getting ready to design a Skill, a voice user interface, or any other voice design pilot project…

  • Be prepared to learn to design for voice. Conversational and voice user interface design has a different set of challenges from graphical user interface design. It’s important for designers and researchers on the team to get training, consult resources on conversational UX and voice user interface design, and gain general knowledge of linguistics and sociolinguistics.
  • Get aligned through user research. Other than informing the direction of a problem statement or a choice of technology to prototype a solution, user research does something even more important — it helps the team build a common mental model and coordinate better. If there is no common understanding of the user or the purpose of the product, every member of the team is going to be designing and building for a different user. Even at a later stage of the project, if the team has uncertainties about the user, it’s not a bad idea to go back and do some research to align the team and improve the design moving forward. Try to include all team members in user research — including PMs and developers. This generates buy-in for UX and gives all team members a sense of ownership of the experience.
  • Define clear project objectives. Understanding what questions we are trying to answer by building our Skill can have a profound impact on the smallest of design decisions. Allowing time for a project team to have a sufficient discovery phase leads to a better understanding of the choice of technology and intended use cases. While voice assistants and voice interfaces are growing in popularity, Skills with their current limitations — such as reliance on invocations — are not always the best solution to improve services or the best channel to deliver voice interactions. For example, the UK’s Government Digital Service recommends making government website content more understandable to search engines, so voice assistants can provide information to users without necessarily relying on Skills. That being said, the capabilities of Skills are constantly evolving, and it will be interesting to see how they could be leveraged in future projects — including for accessibility purposes, which is the next step for our team in this project.

Sources

A Crash Course in Voice UX. Udemy.

Hurrell, M. Hey GOV.UK, what are you doing about voice? Government Digital Service (GOV.UK).

Lambdin, C. (2019, February 11). Some Thoughts on Design Research, Agile, and Traps.

Lindberg, O. (2018, November 19). How to Use Voice UI Prototyping for Delightful User Experiences.

Prototyping for Voice: Methodology. BBC R&D.

Speech Synthesis Markup Language (SSML) Reference. Amazon Alexa Developer Documentation.

Sullivan, B. (2016, August 29). Five Ways to Perform Rapid Usability Testing Faster.

The Paradox of Intelligent Assistants: Poor Usability, High Adoption. Nielsen Norman Group.

Voice assistants: Accessible for some, not everyone. (2018, February 1).

Want to learn more about our project?

Follow us on…

Twitter: @LisaUXStories, @AinsleyBernard2, @zachworkaccount

GitHub: Alexa Transport Canada
