The State of Voice UI, SxSW 2016 Edition

Andrea F Hill
disruption at readytalk
Mar 15, 2016

One of the hottest UI topics at this year’s SxSW Interactive festival was voice control. Although smartphone users had already started interacting this way with Siri, Google Now or Cortana, the release of Amazon Echo’s Alexa into the world dramatically raised the volume of interest in voice interfaces. (See what I did there?)

The promise of voice interaction is that it feels ‘natural’ and allows us to multi-task, since we don’t need to use a tactile input device (or, usually, gaze at a feedback device). With all the excitement about this means of interaction, will we get to a point where keyboards are outdated and we can interact with a computer as easily as with another human being?

We can look at technical, situational and social factors to consider the impact and potential proliferation of voice interfaces in the future.

Input, Recognition and Interpretation

From a technical standpoint we’re certainly not there… yet. In the session “HAL to Her: Humanizing Tech Via the Power of Voice,” the speakers touched on some of the current challenges related to speech recognition and processing. Some of these are challenges that the “Big Five” players are working on, related to acoustics and noise cancellation. (The “Big Five” are Google with Google Now, Apple with Siri, Microsoft with Cortana, Amazon with Alexa, and Nuance.) The general sentiment from the panelists (who hailed from Google, Microsoft and the startup Orion Labs) was that it’s unlikely a startup will break ground and develop its own speech recognition to compete against these players. Rather, startups will leverage these platforms in their products.

Even as the technology for capturing and recognizing audio input improves, the tasks of interpreting that input and then actually *doing something* with it are still very much in their infancy.

This is where the fun comes in, and the conversations spiral into the concepts of artificial intelligence and machine learning.

It was fascinating to see this happen time and again over the voice sessions: You start talking about voice and you end up discussing personalization and virtual assistants. There is something about this type of interaction that encourages us to form more of a relationship with the device with which we’re engaging. And Lesley Carmichael (Principal Product Manager at Microsoft) noted that people feel even more strongly about the relationship once some sort of voice response is also included in the experience.

But I suppose this is leading us into the social factors that may support or quell the rise of the voice interface, so I’ll stop here for now.

The Impact of Place

In 2016, we see the rise of voice interfaces in three places: on our phones, in our cars and in the privacy of our own homes. To understand if we will see the transition to voice interfaces elsewhere in our daily lives, we must understand the reasons that gave rise to the new interaction style in these situations.

Hello, Siri

In the session “Testing your (artificial) intelligence”, speakers from SRI and wit.ai discussed why the voice interface was a great fit for the smartphone market when it was introduced to iPhone users back in 2011. Using voice for control was a much less painful experience than trying to type on a small screen that was likely not optimized for mobile. It’s also great while doing other activities that require your hands, such as driving or other practical tasks.

Etiquette and Privacy

Voice interfaces involve a trade-off in terms of privacy, however. Although just days ago Capital One announced that you can manage your finances via voice control with Alexa, that’s probably something you’re only going to feel secure doing at home. Even looking around the conference room, many of us were jotting down notes on our phones or laptop keyboards. I couldn’t imagine someone trying to dictate their own notes while in the room!

Consumer vs Practical Applications

Interfaces are fascinating when considered from the consumer vs industrial adoption standpoint. Google Glass is considered a commercial failure to a large extent because there was a strong negative social response to wearing the hardware. However, the benefits of a head-mounted display are much clearer in an industrial setting, and the social pressures did not hinder the adoption of such systems. Voice dictation systems have long been used in the medical and legal professions as the benefits are obvious. Note that in both these cases, the interface was associated with a specific system with a known language/grammar.

Interfacing with What? and Why?

The other facet of whether or not voice is an appropriate interface is the type of information being communicated. As Tan Le opined in the session “Beyond the Screen: Interface Pioneers on Future UI,” you would never want to give or receive driving directions purely via speech. Voice is a fantastic medium for communicating certain information, but it is not always sufficient. We must not blindly assume that a newer interface is always going to be better.

Another common theme was that the age of the app (with its graphical interface) is over. We want information and don’t want to have to waste energy finding the right app to get to the information we want. The voice interface is commonly seen as a way to cut through the clutter of visual interfaces (navigation, information hierarchy) to directly access the data that is of interest to us.

So (why) am I excited about voice interfaces?

While the above may read as skepticism of voice interfaces, that’s because I haven’t yet dug into where I see the real opportunity. That opportunity lies in leveraging natural language to interact with a virtual assistant that can ‘do our work for us’ (the definition of a robot given by the speakers during the session “One robot doesn’t fit all”).

One common concern voiced by multiple speakers is that we are collectively suffering from too much cognitive load, and any new interfaces or systems must help alleviate, not add to, this load. Or as (I believe it was) Marcus Weller put it more succinctly during “Beyond the Screen: Interface Pioneers on Future UI,” we should design systems that:

“Augment the things you love and automate the things you hate.”

Voice interfaces right now are primarily just interfaces with services. We issue specific commands that return responses.

But machine learning and neural networks offer a lot of hope that we can develop systems that can do more than pattern recognition and rules-matching. A personalized virtual assistant can learn from you and become optimized for your needs. In the hyperbolically named “Will AI Augment or Destroy Humanity?”, Dag Kittlaus (co-founder of Siri) shared his vision for his new project. Viv is ‘The Global Brain, that radically simplifies the world by providing an intelligent interface to everything.’

Whereas Alexa has no memory and performs a discrete transaction every time you query her, Viv will optimize herself personally for you as the user, and also contribute her learnings back to the network. This is how the system can scale exponentially fast.
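For the more code-minded, here is a toy sketch in Python of that contrast between a discrete transaction and an assistant that remembers you. To be clear, this is not how Alexa or Viv actually work. The command table, the function and class names, and the "fall back to your most frequent request" rule are all made up for illustration, and the sketch leaves out the part where learnings are shared back to a network.

```python
from collections import Counter, defaultdict

# Hypothetical command table (keyword -> canned response), for illustration only.
COMMANDS = {
    "weather": "It is sunny.",
    "news": "Here are the headlines.",
    "music": "Playing your station.",
}

FALLBACK = "Sorry, I didn't catch that."


def stateless_assistant(utterance: str) -> str:
    """Treat every request as a discrete transaction; nothing is remembered."""
    text = utterance.lower()
    for keyword, response in COMMANDS.items():
        if keyword in text:
            return response
    return FALLBACK


class PersonalizedAssistant:
    """Keep a per-user tally of past requests and use it to resolve vague ones."""

    def __init__(self) -> None:
        # Maps each user to a count of the keywords they have asked for.
        self.history = defaultdict(Counter)

    def handle(self, user: str, utterance: str) -> str:
        text = utterance.lower()
        for keyword, response in COMMANDS.items():
            if keyword in text:
                self.history[user][keyword] += 1
                return response
        # Vague request: fall back to whatever this user asks for most often.
        if self.history[user]:
            favorite, _count = self.history[user].most_common(1)[0]
            return COMMANDS[favorite]
        return FALLBACK
```

Ask the stateless version for “the usual, please” and it will never have an answer, no matter how often you have talked to it. The personalized version gives the same non-answer the first time, but once it has seen you ask for music a couple of times, the same vague request starts your station.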

Or so we will see. Viv is not yet available to the general public.

We probably won’t each have a personal Rosie in the next 2–3 years, but we will each end up with personalized voice-activated assistants. Speech recognition is improving. Machine learning and neural networks are helping computers pick up and apply concepts much faster than we thought possible. Voice interfaces won’t completely supplant other means of human-computer interaction, but they will serve as the foundation for our interaction with assistants that can automate menial or trivial tasks and improve challenging or rewarding tasks. While these assistants will be personal (and personalized), there will be a broader network that they connect to, such that we can more rapidly benefit from shared knowledge and experience.

You ‘heard’ it here first, folks.

Andrea F Hill
disruption at readytalk

Director with the BC Public Service Digital Investment Office, former web dev & product person. 🔎 Lifelong learner. Unapologetic introvert