Without saying another word, I can safely assume you read the title of this piece knowing full well that I am looking for ice cream and need a little help from my Apple friend, Siri.
This is because you understand why it is being said and to whom. But rewind just a few years and I’m sure you would instead be asking, “Who is Siri? And why does she know about ice cream?”
That goes to show the recent explosion of the voice user interface (VUI), driven by household products like Amazon’s Alexa, Apple’s Siri, and the Google Assistant. Speaking to an object is no longer a sign of insanity, but rather of functionality.
Many uphold VUI as the future of design, citing its hands-free convenience and the limits being challenged every day. But with the growth of VUI comes the inevitable challenge of reimagining the conventions of Information Architecture (IA). The heuristics of IA take into account many variables, from usability to learnability to credibility. How does one create clear, retainable mental models for users to navigate content and accomplish tasks apart from a graphical user interface (GUI)? What is the most effective way to handle error management? These kinds of questions only scratch the surface of the VUI design process. And as designers continue to tackle these problems, the growing visibility of VUI in the mainstream market brings into question one particular principle of effective IA: accessibility.
To paint a better picture, I’ll start with a story about my grandparents.
Growing up, I spent my days after school at my grandma and grandpa’s home until my parents swung by after work. That meant a lot of extra time spent with my grandparents, two Hmong refugees who fled to the United States after the Vietnam War. On arrival in the country, they did their best to assimilate and make ends meet, from odd jobs to running their own Chinese restaurant in the middle of a small, 95% white rural town. I remember how hard it was for them both to get around and get things done — rarely due to intelligence, often due to a language barrier. I’d see them struggle to express themselves and be understood in public spaces, which affected how quickly and accurately their needs would be met.
This is what comes to mind when I think about the expansion of VUI (and especially strictly voice-only products). How would such an interface cater to the needs of my grandparents, two individuals who experienced linguistic challenges with other human beings, let alone a machine?
This includes not only immigrants who must tackle language barriers, but also those who speak with regional accents, or anything apart from the General American (GA) dialect for that matter. Some products may justify limited voice recognition because their target audience consists mainly of GA speakers, but if the day comes when VUI products need to broaden their audience, what would it look like to more deeply consider the needs of diverse dialects? GUIs offer more buffer for those with varied linguistic expression, but VUI, in my own experience, even as a speaker of the GA dialect, is far more unforgiving, with a much higher potential for error (think long hours on the phone with mechanical voices, binary options, and endless loops of repetition).
VUIs such as Alexa and Siri have done a satisfactory job of providing comprehensive voice recognition for users. But as implementation becomes more widespread, VUI system design must consider more deeply the complexity of human communication and extend its human-centric design to a more expansive audience — one that even my grandma and grandpa can be a part of.