Intro: Voice/Conversational User Experience and Why

Qian Yu
Feb 5, 2018


From Interactive Voice Response (IVR) systems that tell you your account balance, to Siri replying to messages on your phone, to the smart speakers in your kitchen setting timers, products with voice user interfaces are pushing our laziness further than we realize. My favorite is telling Alexa to stop the alarm and play my Spotify playlist every morning. ☀️

I consider myself a Voice/Conversational Experience Designer rather than a “Voice/Conversational UI” Designer. To me, the term “interface” usually means something that people can easily perceive: it lets people intuitively start building knowledge about what the product is and how to interact with it. But when you talk with a machine, its capabilities are often not apparent enough to help you form that knowledge or take any action. You may start trying things out, get surprised, then get encouraged to try more. The fun of unpredictability and the initiative taken by the user aren’t portrayed well by the word “interface”.

The pattern of the process may vary depending on the product, but the point is, beyond the direct interaction, there is an ongoing education and learning process between the user and the product.

Even before purchasing anything, a user has often already researched the product’s capabilities and how to use it, as there is plenty of material online. For example, on YouTube, there are about 1,100,000 results for “Amazon Echo”.

The user gains further knowledge and forms a deeper understanding of the product at every touchpoint: opening the box, plugging in the device, seeing and feeling the product, and hearing its voice. They will probably discover much more than they expected. Their knowledge grows rapidly at first, then slows down as the product settles into their daily routine without any friction.

As designers, it is our job to identify potential patterns so that we can modify the interaction logic and utilize the UIs and voice prompts to make the overall experience natural and appealing. As for details about how to design a Voice/Conversational Experience, there are more posts to come. 😉

Before developing a product, you should first understand

why your product should offer a Conversational/Voice Experience.

👍 1. It allows users to do two things at once.

First of all, who doesn’t like multitasking? If a product could be used while the user is doing something else, then a voice interface is a good potential add-on. Of course designers want their product to be used and valued, but that doesn’t mean people won’t be pulled away by things of higher priority. The ability to stay focused on one thing while conveniently doing something else is very appealing to lots of people.

👍 2. It makes everything “one command” away.

Do you struggle to find files on your laptop, or to locate specific information on a website? Tired of scrolling up and down, and scanning back and forth? Similar to how “Spotlight Search” works on a MacBook, voice interaction has the potential to bring everything to the surface. You can run a very specific search with just one voice command. And don’t forget the physical world: instead of walking to the kitchen to turn off the light, you can tell Alexa to do it in just a few words.

👍 3. It builds a personal connection.

It is in animals’ nature to use vocal communication for many purposes, including social interaction, sharing information, and even mating rituals. When we hear a voice, it is not just words and tone coming through; a lot of unconscious activity is also going on in our minds. Voice interaction is emotional and more persuasive than Graphical User Interfaces (GUIs). As we build up understanding and knowledge of the entity, often without noticing, we come to feel more connected to it.

Now it may make sense to make voice a part of your product,

but here are some concerns you don’t want to forget.

👎 1. It doesn’t handle large amounts of information very well.

Voice interaction alone (without a GUI) is not well suited to large amounts of information. People can only follow speech up to a certain speed, and it takes time to process the meaning. Plus, our short-term memory is not always reliable. Voice interaction is linear and proceeds turn by turn. If the dialogue is too long, people may forget earlier parts of the conversation and lose the details.

👎 2. It lacks privacy.

Voice interaction can handle a large number of scenarios, but having to say everything out loud can be a problem. People may be comfortable setting their alarms out loud at home, but probably not reading their credit card number aloud, digit by digit, in public. Some users are also highly concerned about security and privacy, and may refuse to use a voice interface at all to avoid having their conversations sent to another company.

👎 3. It’s challenging in noisy environments.

For a conversation to be enjoyable, the participants must first understand what each other is saying. If the audio of the command also contains other voices or noise, the machine probably won’t understand it correctly. That would be a huge flaw in the experience and would lower the user’s interest in your product.

Qian Yu

UX Lead for Webex Assistant, a multi-modal conversational AI at Cisco Collaboration. 👨🏻‍💻 www.thousandworks.com