Voice Is Visual
Words By Peter Barber — Conversational Designer, VERSA Agency
Working as a VUI (voice user interface) designer, I’ve lived and breathed voice. I’ve seen the power that voice has, but I’ve also seen the barriers that hinder its growth.
This is an attempt to clear up a number of misconceptions about voice currently held by brands and other VUI designers.
Voice Is Visual
The most powerful functional element of voice as an interaction method is the speed with which users can give complicated queries and commands to a system. I can’t overstate this: being able to input complicated, contextual data through voice opens up avenues of experience that until recently weren’t possible. That said, voice has an equally great failing that will see voice-first design fall by the wayside.
Voice is slow.
The slow responses in voice technology are one of its biggest failings. Voice isn’t that engaging or enjoyable to use by itself, and it often turns into an arduous game of vocal tag. As much as I would like to transfer money or manage my Super purely through voice commands, users do not feel comfortable when they can’t get instant validation of their actions. There’s a reason why Amazon keeps innovating with screen-based, voice-enabled devices.
One of Nielsen’s key usability heuristics is visibility of system status, something that is inherently lacking in voice-only user interfaces. Visual validation is the conduit needed between user and system to create a truly human, voice-led experience.
Ever wonder why there’s no voice-only Navman? Navman comes with a screen as standard because visuals and voice come together to create one of the best digital experiences on offer.
Voice is slow, but visuals are quick. If users are asking their smart assistant what’s coming up in their day, how do you expect them to remember more than three pieces of information? Don’t get me wrong: when I want specific, quick information, voice functions perfectly. Personally, I love asking my Google Assistant questions because it’s so much faster than typing my query in. The latency of spoken responses, usually around 1–3 seconds, is one of voice’s biggest failings. Currently, searching Google by voice is the single best use case for voice that’s ever been created, and here’s why:
- The user’s voice input can be accurately captured and validated on the screen in real time.
- The requested information is immediately presented to the user, both on the screen of their device and through a voice response.
When we use our voices to input data, we should receive a response that’s equally fast. With anything less than instantaneous, you damage one of the core strengths of voice. Don’t get me wrong: voice-first design is great, and ideally all interactions should be designed this way. Users should be able to interact with a system frictionlessly and navigate it with ease. Currently, however, VUI experiences that don’t have a screen can deliver an ultimately uninspiring user experience.
So is voice going to be the next interface of the future? At SXSW this year we heard that it was, but why exactly is this true?
Because, my friends, conversational design is shaping digital experiences everywhere.
What is Conversational Design?
Conversational design is the design thinking behind truly frictionless user interfaces. It’s the interface of the future, one that enables all users, no matter what point in life they are at, to interact with a unique system and achieve their desired outcome on the first try, with just their voice. There is a key difference between making your digital solution navigable via voice and designing your voice experience.
Voice Is Personal
Interfacing with a system that understands your intent is insane. Contextual awareness done well in conversational design is going to revolutionise the interface method and see it become standardised across all digital touchpoints.
We’ve all heard the term ‘frictionless interface’, and I guarantee we’ve all designed with one in mind. The ideal interface is one that anyone can use, whatever their previous experience, to achieve their desired outcome in one go. There is nothing more rewarding than using something new and nailing it.
This is where voice shines.
Voice, however, does not always shine. Here is an example of conversational design that doesn’t serve user needs.
A bad voice experience is glaringly obvious. Let’s take the AFL skill, for example. The AFL is an honoured Australian cultural phenomenon. It should be pretty easy to craft a skill that works within the bounds of conversation (given that’s what half of us talk about anyway).
Now, I am by no means an avid footy supporter (I’m currently coming third last in my agency’s footy tips), but I can identify a number of key failings within the AFL skill. If you’re designing a conversational skill, it should be as natural as talking to a friend.
Conversational design lies at the heart of all frictionless voice experiences. How do you create truly frictionless voice experiences? I’m glad you asked! That sounds like a good topic for next time.
Key takeaways:
- Voice is fast, visual is quick. Together they’re lightning.
- Beautiful voice design leads to a beautiful experience.