Day 61 — Future of Design series 2/7: “Voice UI”

Roger Tsai & Design
Published in Daily Agile UX
7 min read, Apr 30, 2019
Original Photo by Vlad Tchompalov on Unsplash

How popular is voice UI (user interface) in our daily lives? According to Amazon, it sold tens of millions of Echo devices in 2018, which means you can call on Alexa for help in a lot of households. In today’s article, I’m going to share what I know about why voice UI is the future of design, why we need to embrace it, and what some of the best practices are. Here’s the breakdown of this article:

  • Why voice user interface (VUI) is the future
  • How do we adopt VUI design in our process
  • Known best practices
“Amazon Echo is officially mainstream”. Article and image by Rakuten Intelligence

Why VUI is the future

There are several reasons VUI is the future:

The Rise of Voice-Based Agentive Technology

Thanks to the maturity of Natural Language Processing (NLP), machine learning (ML), and audio technology, we’re seeing more and more adoption of voice-based agentive technology. These agentive technologies transform the way we used to do things, for example playing a song, texting someone, or checking the weather or a shipping status. This kind of hands-free, multitasking experience barely existed before voice-based agentive technology came along.

User value

Whether it’s Siri on your iPhone telling you today’s schedule, or Alexa on your Amazon Echo helping you buy more laundry detergent without opening the Amazon site or app on your desktop, laptop, or phone, more and more tasks can be delegated to these agentive technologies. They make people’s lives easier in certain respects. For these tasks, a traditional GUI (graphical user interface) such as a website or app can hardly match the convenience of a VUI-based experience.

From human-manual process to agentive tech. Image source: Future Today

Business ROI

Beyond the convenience it offers users, there is also a huge amount of business value. For example, it broadens the sales channel with a brand-new platform (e.g. Amazon Echo, Google Home), so users can now buy things quickly at home. It also boosts process efficiency by eliminating some steps and barriers in the user journey (e.g. the shopping cart, confirming shipping details).

Moreover, the dialog in the inquiry and purchasing experience helps companies understand each individual user’s personality better; this is arguably the biggest value of VUI, especially in an age of user data collection and personalization. Other business benefits include limiting the cost of building and maintaining a GUI, creating an entry barrier for non-tech firms, and increasing brand value by being seen as innovative.

How to adopt VUI design

Planning

Before jumping into the VUI solutioning phase, there are many important questions we’d like to answer so that we know we’re headed in the right direction. For example, when is a good time to consider using VUI, weighing the investment against project readiness? Which business cases are most suitable given existing VUI capabilities? Does it add value for users, or does it create a more annoying experience instead of enhancing existing solutions?

Competency

In order to create a well-designed voice UI, we first have to understand the required competencies:

  • Linguist
  • User Experience Researcher/Designer
  • Natural Language Processing Engineer
  • Machine Learning Engineer
  • Sound Engineer
Image source: Agentive Tech Twitter

Branding

The nature of voice communication brings in a unique element of personality. Not only the recorded voice, but also the wording, usage, and other factors can convey a certain personality to users. It’s important for VUI creators to determine both what kind of personality they want to convey and how strongly it should come through. This will shape how users perceive the brand, so it’s not an easy task, and creators need to think carefully about the potential impact.

Challenges

There are several inherent challenges due to the nature of voice and dialog-based communication, and to the current maturity of artificial intelligence. Just to name a few:

  • VUI is not good at handling complex instructions or complicated human speech habits. For example, Alexa may not understand us when we say “Get me two dozen... no, three, maybe five actually... some socks... oh, never mind.”
  • Because speech unfolds over time, VUI has a limited capacity, compared to GUI, to present lots of information at once. For example, it’s not hard to show 30 options on a website (think of the product catalog on the Disney website), but it’s quite challenging to read all those options aloud and have users effectively memorize all 30 and pick one (think of the times a restaurant waiter went on and on through a long “today’s specials” list).
  • Without enough context, it’s hard for a VUI system to detect users’ true intentions. For example, the sound “four-tee-cup” could mean a number of different things: forty cup, four tea cup, four tee cup, etc.
  • Another challenge is answering appropriately when the VUI system doesn’t quite understand the question, or simply doesn’t have the answer. For example, when users start asking for a prediction of the next election result.
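To make the “four-tee-cup” problem concrete, here is a minimal sketch of using conversation context to pick an interpretation, and asking a clarifying question when context is missing or ambiguous. It assumes the speech recognizer already returns several candidate transcriptions, each tagged with a topic; the `Interpretation` type, the domain labels, and the `respond` function are all hypothetical and not part of any real VUI platform’s API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Interpretation:
    text: str    # what the system thinks the user said
    domain: str  # hypothetical topic label used to match against context


# Hypothetical candidate transcriptions for the sound "four-tee-cup".
CANDIDATES: List[Interpretation] = [
    Interpretation("forty cups", "party_supplies"),
    Interpretation("four tea cups", "kitchen"),
    Interpretation("four tee cups", "golf"),
]


def respond(candidates: Iterable[Interpretation],
            context_domains: Set[str],
            max_options: int = 2) -> str:
    """Act on the one interpretation that fits the context, or ask the user."""
    candidates = list(candidates)
    matches = [c for c in candidates if c.domain in context_domains]
    if len(matches) == 1:
        # Exactly one reading fits what the conversation is about: act on it.
        return f"Adding {matches[0].text} to your cart."
    # Zero or several readings fit: confirm instead of guessing, and keep
    # the question short by naming only a couple of candidates.
    pool = matches or candidates
    options = " or ".join(c.text for c in pool[:max_options])
    return f"Sorry, did you mean {options}?"
```

With kitchen context, `respond(CANDIDATES, {"kitchen"})` returns "Adding four tea cups to your cart.", while with no context it returns "Sorry, did you mean forty cups or four tea cups?". The design choice here is that the system never silently guesses when context does not settle the question.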

Best Practices

Unlike GUI, voice UI is fairly new, and we’re still exploring better ways to craft a satisfying experience, with the hope that one day a virtual voice agent could be as smart as, or even smarter than, a real human agent. That said, there is some accumulated knowledge about how best to design a voice user experience:

Service Types

Voice UI is especially strong at taking simple directions and providing the corresponding information, for example giving a weather report, checking a calendar, or telling a stock price. Another category VUI is good at is executing simple orders, for example buying paper towels, playing music, or telling a joke (Alexa has told over 100 million jokes). In general, VUI systems work best when they provide high value for low user effort.

Image source: Agentive Tech Twitter

Content

As mentioned before, one of the biggest challenges is designing the appropriate dialog. When responding to users, given the limits of human short-term memory, it’s more effective to provide only one piece of information at a time, so that they don’t forget or get confused.

When it comes to executing commands from users, it’s fairly hard to predict what users want at the moment they speak into the device. So far, the design principle is: don’t assume users know exactly what to do. They probably have a rough idea of the “job to be done”, but might not be sure about every step in the process. Therefore, instead of relying on users to state a clear demand, provide options at each step for them to choose from.

In regard to designing the answers, try not to overwhelm users with too many options. The general best practice is to ask or answer with fewer than three options. When helping users with information, giving examples usually works better than giving instructions, so that users can understand more quickly what the system requires.
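The content guidelines above (one step at a time, fewer than three options per turn, and examples rather than instructions) can be sketched as a small prompt builder. This is a hypothetical helper, not any platform’s API; it assumes options are known up front and that the rest of the dialog handles the user saying “more” by asking for the next page.

```python
from typing import List, Optional

MAX_OPTIONS = 2  # "fewer than three" options per spoken turn


def build_prompt(question: str, options: List[str], page: int = 0,
                 example: Optional[str] = None) -> str:
    """Build one spoken turn offering at most MAX_OPTIONS choices.

    Remaining options are held back for the next page, so the listener
    never has to remember a long list.
    """
    start = page * MAX_OPTIONS
    shown = options[start:start + MAX_OPTIONS]
    if not shown:
        return "That's all the options I have."
    parts = [f"{question} {' or '.join(shown)}?"]
    if start + MAX_OPTIONS < len(options):
        # Offer a way to hear more instead of reading everything out.
        parts.append('Or say "more" to hear other choices.')
    if example:
        # Give an example utterance rather than instructions on how to ask.
        parts.append(f'For example, you can say "{example}".')
    return " ".join(parts)
```

For instance, `build_prompt("Would you like", ["small", "medium", "large"])` returns `Would you like small or medium? Or say "more" to hear other choices.`, and calling it again with `page=1` returns `Would you like large?`. Paging keeps each turn short at the cost of an extra turn for users who want the third option.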

Tone

When it comes to designing the personality of a VUI service, it really depends on the branding strategy of the service, product, or brand. Some bots and VUIs have a vivid personality that you can easily detect through the dialog. For example, my friend who designed Poncho, “a Brooklyn cat that tells you about the weather”, intentionally created a lively character for the chatbot. Because of that, Poncho the “cat” actually has a group of fans around the globe. However, we’ve also heard that some people are not so into witty chat in a dialog. Therefore, personality in VUI can be a double-edged sword.

Poncho, the Brooklyn cat (chatbot) brings you weather with charisma. Image source: Cantina

In general, if it’s a utility-driven service and the task is expected to be done at a fast pace, it’s better to keep the dialog brief and concise, neutral in personality, and not chatty. I once heard an Amazon engineer describe another guideline for designing VUI: design it “for the ear, not for the eye.” Ever since, I’ve paid attention to how radio hosts speak, and I feel there’s a lot we can learn from them.

Learning

With proper support from AI, NLP, and machine learning, a VUI service should evolve over time. However, a more effective approach is not to rely on the technologies alone, but to also have real humans understand the challenges users are facing, with help from linguists and user experience researchers and designers.
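One way to put real humans in that learning loop is to collect the phrases a VUI could not handle and bring the most frequent ones to linguists and UX researchers for review. The sketch below assumes the assistant calls a hypothetical `record` method whenever it falls back to “Sorry, I didn’t get that”; `FallbackLog` is an illustrative name, not part of any real platform.

```python
from collections import Counter
from typing import List, Tuple


class FallbackLog:
    """Counts utterances the assistant could not handle, for human review.

    The log only groups and counts phrases; deciding what users meant and
    how to fix the dialog is left to people.
    """

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, utterance: str) -> None:
        # Light normalization so "Play Jazz" and "play jazz " count as one.
        self._counts[" ".join(utterance.lower().split())] += 1

    def top(self, n: int = 5) -> List[Tuple[str, int]]:
        """Most frequent unhandled phrases, most common first."""
        return self._counts.most_common(n)
```

A weekly review session might start from `log.top(10)`: the technology surfaces patterns at scale, and people decide which of them reveal a real gap in the experience.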

Conclusion

  1. Voice UI is not only a strong trend, but also a valuable service for both users and businesses;
  2. Given that this field is still new, we’re seeing lots of challenges on the way to its full potential;
  3. As we accumulate general best practices, we should not only use technology to grow the service’s capacity, but also get real humans’ help to strengthen its capabilities and performance.

Do you have experience working on VUIs or chatbots? I’m eager to learn from you.

ABC. Always be clappin’.


The opinions expressed in this article are those of the author. They do not represent current or previous client or employer views.
