Voice is the New UI (and it Matters for Immersive Tech)

Liv Erickson
The Matrix is my Office
Jun 15, 2017

While AI at first glance seems largely separate from virtual and augmented reality, the two are converging quickly. During Day 1 of VivaTech today, artificial intelligence was center stage, featured heavily in the keynote and throughout panels and demo stations alike.

There is no doubt that voice as a platform is evolving. While it has yet to fully take hold in immersive experiences, I suspect we’ll see this area grow significantly in the coming years as we develop a better understanding of, and clearer expectations for, voice interfaces across verticals in the tech industry. Today’s VivaTech panel ‘Voice is the New UI’ featured Yann Lechelle of Snips and Alex Lebrun of Facebook’s AI Research group, who discussed the state of voice interactivity and the future of verbal interfaces in devices and software platforms in a conversation moderated by Dimitri Carbonnelle of Livosphere.

“Voice is the most natural and the quickest way for human beings to transmit complex information” — Dimitri Carbonnelle

While voice-controlled digital assistants on mobile were made popular by Siri (and by science fiction embodiments in movies like Her), the explosive growth of verbal interactivity with devices in the home was driven by products like Amazon’s Echo. Screenless communication has become prominent only relatively recently, but voice input is now a commoditized technology, available in many smartphones, standalone devices, IoT appliances, and yes, AR and VR headsets. The new challenge is understanding that input and using the resulting data in a way users are comfortable with. Privacy and security of voice data are critical, especially given cloud-connected devices (which many smart appliances are) and the biometric nature of one’s spoken communication.

The dynamic of the ‘Voice is the New UI’ panel was an interesting one: Snips is an AI platform for integration into IoT hardware, and Yann spoke frequently about non-cloud devices with highly specific skill sets that respond to verbal commands for particular tasks. Alex, on the other hand, spoke more broadly about Facebook’s work in AI and voice, which relies heavily on cloud services to process large amounts of voice data for generalized intelligence. The two were in agreement that, regardless of where the processing takes place, privacy and user control over voice interactions are critical.

Alexa as an Accomplice?

One particularly important area of privacy and ethical concern surfaced with the question of how information captured by digital assistants may be used in criminal cases. The idea of “privacy by design”, and transparency into when devices are listening to (not to mention storing) verbal data, was a strong part of Yann’s insight into non-visual UI.

“The best UI is no UI, but that’s not always practical”

Another point that surfaced throughout the panel was that voice-controlled interfaces frequently need to be contextual, and that successful assistants and platforms will span multiple devices and support multi-modal input. If you’re looking at hotels to book for an upcoming vacation, you’ll likely need a screen to compare them. But if you make your final decision that evening while making dinner, your assistant should understand you when you tell it your choice and ask it to finalize the booking.
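
To make that scenario concrete, here’s a minimal sketch in Python of one way shared context could work: a shortlist built up on a screen is saved, and a later voice command is resolved against it. Everything here (the ContextStore class, resolve_booking, the hotel names) is hypothetical and for illustration only; it isn’t anything the panelists described.

```python
# Sketch of multi-modal context sharing: a shortlist assembled on a screen
# is stored, and a later spoken choice is resolved against it.
# All names below are illustrative placeholders.

class ContextStore:
    """Shared per-user context that any device or modality can read."""
    def __init__(self):
        self._data = {}

    def put(self, user_id, key, value):
        self._data.setdefault(user_id, {})[key] = value

    def get(self, user_id, key, default=None):
        return self._data.get(user_id, {}).get(key, default)


def resolve_booking(utterance, user_id, store):
    """Map a vague spoken choice onto the shortlist saved earlier."""
    shortlist = store.get(user_id, "hotel_shortlist", [])
    if not shortlist:
        return "I don't have a shortlist for you yet."
    ordinals = {"first": 0, "second": 1, "third": 2}
    for word, index in ordinals.items():
        if word in utterance.lower() and index < len(shortlist):
            return f"Booking {shortlist[index]}."
    for hotel in shortlist:  # fall back to matching by name
        if hotel.lower() in utterance.lower():
            return f"Booking {hotel}."
    return "Which of the hotels on your shortlist did you mean?"


store = ContextStore()
# Earlier, on a screen: the user compares options and saves a shortlist.
store.put("user-123", "hotel_shortlist", ["Hotel Lutetia", "Le Meurice"])
# Later, by voice, while making dinner:
print(resolve_booking("go ahead and book the second one", "user-123", store))
```

The string matching is deliberately crude; the point is that the voice interaction can lean on state the user already built up in another modality, rather than forcing them to start over.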

Looking at the larger picture and the innovation coming around voice-controlled interfaces, it’s unlikely that one company will come out ahead and win over everyone on every platform (though tech giants like Microsoft, Google, Amazon, IBM, and Facebook seem to be trying). The implications of sharing information between tools, platforms, and technologies then become even more crucial, especially with regard to the transparency and security of that information.

Ultimately, it’s still early for voice interfaces, but the promise is there. I wrote briefly earlier in the week about exploring natural language processing for immersive tech, so the panel was a timely addition to that. When we think about multi-modal input, virtual reality in particular feels like a match made in the cloud: we’re separated from our keyboards and visually cut off from the rest of the world, and while camera pass-through and virtual keyboards are valuable tools on today’s devices, our in-world capabilities will evolve as NLP and voice processing continue to become cheaper and more widely available.

One of the main considerations for integrating voice into devices (and VR experiences) is the overhead cost of listening in an ongoing loop to detect keywords and know when to respond. This is being addressed at a feature level on some immersive devices (such as the Cortana integration in Windows 10), which will continue to make it easier for developers and creators to include voice in their applications.
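
To illustrate that tradeoff, here’s a small sketch using the open-source Python SpeechRecognition package (my choice for illustration; it isn’t something mentioned on the panel or tied to Cortana). A cheap offline recognizer watches for a wake word in a loop, and only once it hears one does the app send the next utterance to a heavier cloud recognizer. The wake phrase and handler are placeholders.

```python
# Two-stage listening loop: a lightweight offline recognizer (PocketSphinx)
# spots the wake word locally, and only then is audio sent to a cloud
# recognizer for the full command. Requires the SpeechRecognition and
# pocketsphinx packages; the wake phrase and handler are placeholders.

import speech_recognition as sr

WAKE_WORD = "hello assistant"  # illustrative wake phrase

recognizer = sr.Recognizer()
microphone = sr.Microphone()


def heard_wake_word(audio):
    """Cheap, offline check: did this snippet contain the wake word?"""
    try:
        text = recognizer.recognize_sphinx(audio)
    except sr.UnknownValueError:
        return False
    return WAKE_WORD in text.lower()


def handle_command(text):
    print(f"Command received: {text}")  # hand off to the app or VR scene


with microphone as source:
    recognizer.adjust_for_ambient_noise(source)
    while True:
        snippet = recognizer.listen(source, phrase_time_limit=3)
        if not heard_wake_word(snippet):
            continue  # stay local and cheap until woken
        command_audio = recognizer.listen(source, phrase_time_limit=10)
        try:
            handle_command(recognizer.recognize_google(command_audio))
        except (sr.UnknownValueError, sr.RequestError):
            print("Sorry, I didn't catch that.")
```

In practice the always-listening stage would ideally run on dedicated low-power hardware or as an OS-level service (which is what something like the Cortana integration provides), but the idea is the same: keep keyword spotting cheap and local, and pay for heavier processing only after the wake word.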

Do you think verbal input will grow in immersive experiences? Do you know of VR experiences that already implement voice control? Are you comfortable talking to your devices? Let me know in the comments; I’d love to hear your thoughts!
