The Learning Curve of Voice-Enabled Technology

Danie Ferrusi
Nov 27, 2018 · 4 min read

When I was first introduced to Siri, I didn’t trust her search capabilities. Last Christmas, when I received a Google Home, I didn’t trust that it would play music from the right station. I didn’t know what either voice assistant could do or what prompts it would or would not take; I only knew when it was listening to me, and I felt an instant rush of anxiety to come up with a question or prompt. It wasn’t until I asked the device how I could use it that I started to feel the benefits of the product.

Complex voice-controlled devices are likely to become a common facet of everyday device interaction, but with that shift we lose the instinctive know-how that traditional visuals provide.

With the popularity of voice-first assistants like Google Home, Amazon Alexa, Apple’s Siri, and Microsoft’s Cortana (among others), voice-enabled technology is taking information architecture down some uncharted paths. However, this type of IA isn’t completely unheard of. Long before today’s smart speakers and voice assistants, we used audio-recognition devices: sound activation through OnStar in cars, speech-to-text on computers and phones, even sound-activated lights like The Clapper (more widely known as the “clap on, clap off” lights, which became popular in the early 2000s). The information architecture is old, but the complexities surrounding newer device capabilities are certainly increasing. There are stark differences between the information architecture of a light that switches on and off via sound and that of an intelligent conversational device that can record your shopping list, tell you the weather, call a friend, search the web, and execute more elaborate tasks. The biggest mystery is how we get users from the instinctive know-how and hierarchy that visuals provide to the ease and comfort of voice-only prompts.

Conversational cues may be the natural answer to building confidence in voice-first design. Dialogue comes naturally, so it can stand in for the familiarity we rely on in a point-and-click environment. In a visual environment, a lot of information can be displayed at one time, letting the user’s discretion and the help of aesthetic design move them forward. In a voice-only environment, only a small amount of information is vocalized, and users must prompt the device to move them through each section until they reach their goal. Voice is more aligned with input, whereas visuals are more aligned with output. Designing voice experiences must prioritize simplicity and adapt current IA models to fit that modality.

Usability and findability sit at the very foundation of a reliable voice experience and should be built with mindful intention when designing voice-first IA. The less helpful a device is (i.e., its inability to answer questions, fulfill user demands, or express what an end user can do), the less likely a user is to rely on the product. When confronting usability, the information architecture needs to reflect a user who may have no experience navigating a voice-only environment. When confronting findability, the information architecture should allow users to move through prompts as a means of orienting themselves. How will a user know where they are or how something is to be accomplished if they have no visual cues? In a nonvisual format, context must ultimately come from user prompts and the user’s ability to recognize where they are, where they can go, and their desired destination, with the device aiding them in that journey.

Dialogue flows need to play an integral part in successful voice experiences. The best voice-only experience will be natural and conversational, because conversation is an intrinsic ability; ultimately, this is how we build confidence and comfort with users. Although the information architecture may seem different (i.e., harnessing auditory cues and automatic speech recognition), it can still be adapted from already successful models.
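To make the idea of a dialogue flow a little more concrete, here is a minimal sketch in Python of how prompts, transitions, and a built-in “what can I say?” option might be modeled. The coffee-ordering states, keywords, and prompts are entirely hypothetical illustrations, not any particular assistant’s API.

# A minimal, hypothetical dialogue flow: each state has a spoken prompt and
# keyword-based transitions to other states. Nothing here is vendor-specific.
DIALOGUE_FLOW = {
    "start": {
        "prompt": "Would you like to order a drink or hear the menu?",
        "transitions": {"order": "choose_drink", "menu": "read_menu"},
    },
    "read_menu": {
        "prompt": "We have coffee and tea. Would you like to order?",
        "transitions": {"order": "choose_drink", "no": "goodbye"},
    },
    "choose_drink": {
        "prompt": "Coffee or tea?",
        "transitions": {"coffee": "confirm", "tea": "confirm"},
    },
    "confirm": {
        "prompt": "Great, your drink is on its way. Anything else?",
        "transitions": {"yes": "start", "no": "goodbye"},
    },
    "goodbye": {"prompt": "Thanks for stopping by!", "transitions": {}},
}

def respond(state: str, utterance: str) -> tuple[str, str]:
    """Return the next state and the prompt to speak, given the user's utterance."""
    node = DIALOGUE_FLOW[state]
    text = utterance.lower()
    # Findability: at any point the user can ask what they can say here.
    if "help" in text or "what can" in text:
        options = ", ".join(node["transitions"]) or "nothing else"
        return state, f"You can say: {options}."
    # Match the utterance against the transitions available from this state.
    for keyword, next_state in node["transitions"].items():
        if keyword in text:
            return next_state, DIALOGUE_FLOW[next_state]["prompt"]
    # Usability: an unrecognized utterance re-orients the user instead of dead-ending.
    return state, f"Sorry, I didn't catch that. {node['prompt']}"

if __name__ == "__main__":
    state = "start"
    print(DIALOGUE_FLOW[state]["prompt"])
    state, prompt = respond(state, "what can I say?")
    print(prompt)  # lists the options available from the current state
    state, prompt = respond(state, "order a drink")
    print(prompt)  # moves to choose_drink and asks "Coffee or tea?"

Notice that every state can answer a help request and re-prompt on an unrecognized utterance; that is findability and usability expressed directly in the structure of the flow rather than in any visual layout.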

Users want to plug and play immediately. Call it the age of instant gratification, but having to learn what a device can and can’t do, without that learning being an innate part of our natural behavior, is asking a lot of a time-conscious consumer. I use my Google Home every day, but only for smaller prompts like playing music, updating lists, and getting the weather; I rely on my screen-first devices for more complex tasks. Device trust has only ever been built over time, matching the learning curve a consumer needs to gain confidence in using a product. Moving from screen-first to voice-first isn’t always organic, but if a device has the information architecture to turn (what could be) cumbersome experiences into simple ones, then voice-first will multiply its value in the future.
