Lessons learned designing voice interfaces with visual feedback

Ospoke
Jul 28, 2017 · 2 min read

Voice interfaces are hitting the mainstream and tech companies are tripping over themselves to make sure they don’t miss the boat. Back in the ’90s, when early voice recognition tech first reached consumers, there was a myopic focus on keeping the user experience the same regardless of the medium: a web form worked like a paper form, and a voice form worked like a web form. Remember the frustrating interactive voice response systems that had to be navigated to reach a real human in a contact centre?

But the new wave of speech recognition and natural language tech has brought with it a realisation that great user experience means fitting the interface to suit the user (not vice versa…). Despite the differing hardware, what do Siri on your iPhone and Alexa on your Echo have in common? Fun Easter eggs? Well yes… But more interestingly they pass off a lot of interaction to visual feedback.

According to Amazon, “…the light ring is how Amazon Echo visually communicates its status to you…”. It is a subtle but vital part of the experience. A directional glow tells you that it is listening, much as a colleague might turn their head to acknowledge that, yes, they are listening (having been interrupted for the 100th time that day).

The importance of visual feedback to audio communication is underscored by Amazon’s release of the Echo Show. It might be useful for playing videos, but for the user it also shores up the areas where voice interfaces are at their weakest: detailing lists and serving complex information.

There are plenty of guides out there about designing interfaces for voice, but here is what our experience of voice-enabling enterprise apps has taught us about developing for voice input with visual feedback:

Show what the speech recognizer thought the user said

In a perfect world voice recognition would always get the correct answer. Although recognition is reaching incredibly high accuracy, it still gets things wrong. If the user is shown what the recogniser heard, they will work around errors by rephrasing, without getting trapped in the “I’m sorry, I didn’t get that” loop.
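One way to sketch this idea: surface the recogniser’s best guess alongside its confidence, and invite a rephrase when confidence is low rather than guessing silently. This is illustrative only; the interface, function names, and the 0.85 threshold are our own assumptions, not part of any real speech SDK.

```typescript
// Illustrative sketch: always show what the recogniser thought the user
// said, and surface doubt explicitly instead of looping on "didn't get that".

interface Hypothesis {
  transcript: string; // what the recogniser thought the user said
  confidence: number; // 0..1, as most speech APIs report it (assumed scale)
}

// Decide what feedback to display next to the result of acting on the input.
function feedbackFor(h: Hypothesis): string {
  if (h.confidence >= 0.85) {
    // Confident: still echo the transcript quietly so errors stay visible.
    return `Heard: “${h.transcript}”`;
  }
  // Uncertain: show the guess and invite a rephrase rather than acting on it.
  return `Did you say “${h.transcript}”? Say “no” to rephrase.`;
}
```

The point of echoing the transcript even on a confident result is that the user, not the recogniser, is the final judge of whether the input was right.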

Try to keep everything on the screen at once

Scrolling is a pain: unless you want to force the user to memorise whatever just fell off the top of the screen, stick to static screens. Collapsible sections and multiple pages are your two ways around this.
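A minimal sketch of the collapsible-section approach: keep every section except the active one collapsed to its title line, so the whole form always fits on one static screen. The shape of `Section` and the one-line-per-collapsed-section cost are assumptions for illustration.

```typescript
// Sketch: collapse all but the active section so nothing scrolls off-screen.

interface Section {
  title: string; // one line when collapsed
  lines: number; // lines occupied when expanded
}

// Total lines the screen must show: collapsed sections cost one line each.
function visibleLines(sections: Section[], activeIndex: number): number {
  return sections.reduce(
    (total, s, i) => total + (i === activeIndex ? s.lines : 1),
    0
  );
}
```

With this model a layout check is just `visibleLines(...) <= screenRows`, which tells you before rendering whether the static-screen constraint holds.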

Try not to have too many things on the screen

It’s just confusing… I mean do you really want me to talk and read an essay at the same time?

Make the change in state obvious

If you’re filling in text, change the background colour. Highlight what you want the user to focus on.
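As a toy illustration of that state change: derive the field’s styling from whether it currently has the user’s (voice) focus. The colour values here are placeholders, not a recommended palette.

```typescript
// Sketch: make the active field visually obvious by restyling it.
// Colours are illustrative placeholders only.

function fieldStyle(isActive: boolean): { background: string; outline: string } {
  return isActive
    ? { background: "#fff7cc", outline: "2px solid #e0a800" } // highlighted
    : { background: "#ffffff", outline: "none" };             // resting state
}
```

Deriving style from state, rather than toggling it imperatively, means the highlight can never drift out of sync with where dictated text will actually land.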

Make things easy to target

No one wants to read out a whole sentence to target something, so if you allow users to target elements by numbers or tags, make those numbers and tags visible and easy to pronounce.
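A sketch of that targeting scheme: number the visible elements, display the numbers, and map the recogniser’s spelled-out numbers (“two”) back to the same tags. The word list and function names are illustrative assumptions.

```typescript
// Sketch: short, pronounceable numeric tags for voice-targetable elements.

// Tag each visible element "1", "2", … in display order.
function assignTags(labels: string[]): Map<string, string> {
  const tags = new Map<string, string>();
  labels.forEach((label, i) => tags.set(String(i + 1), label));
  return tags;
}

// Recognisers often return numbers as words, so normalise before lookup.
function resolveSpoken(word: string, tags: Map<string, string>): string | undefined {
  const words: Record<string, string> = {
    one: "1", two: "2", three: "3", four: "4", five: "5",
  };
  return tags.get(words[word.toLowerCase()] ?? word);
}
```

Single digits work well as tags because they are short, unambiguous to pronounce, and rarely collide with the vocabulary of the task itself.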

If you have any more ideas we are always keen to hear them, so do drop us an email.

Written by Ospoke

We voice enable your workforce, so they can focus on what they do best. http://ospoke.com
