The case against voice interactions

Ilya Belikin
2 min read · Mar 11, 2018

When humans talk, we use a lot more than voice to get a message across: facial expressions, posture, gestures, drawings, and references to a shared context in the past or present.

All of these non-verbal cues support speech and make conversations efficient. Once there is a way to establish context, a short sound can carry a lot of information. In our culture, we enjoy those moments when a short sentence with a gesture suddenly communicates a more in-depth story than a page of text. As humans, we enjoy the company of people who get us, and we cherish them.

We will love computers that can do the same.

Pure voice interactions

A computer with a voice-only interface is severely limited and will probably never be a good personal assistant. A voice converted to text, stripped of emotion and lacking context, is not a good communication tool.

Interacting this way leads to mental overload: the human has no choice but to be verbose and to keep the context in mind on the machine's behalf.

What is worse, it makes users feel that a voice interface is slow, or that a computer is not very capable. Neither is true. The “pure” voice interface is just not the way humans and computers, or anyone really, should interact.

Voice interactions in a context

A rich context is already available to a computer. It knows what the user is working on (mine knows I am working on this article) and exactly where the user's focus is (the cursor was on the word “focus” when I was typing it).

So why not use that rich context to respond helpfully to a short voice command?

Why not give the user the simple ability to say “Define” while the cursor is on a word to check its definition? Or “Suggest synonym”? Or “Capitalize”?

Or “Reply” when I am looking at a chat notification on my phone or watch?

Why are we keeping the rich context out of the conversations with our computers?
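
To make the idea concrete, here is a minimal sketch in Python of how a short spoken command could be resolved against the cursor position. Everything in it is an assumption made for illustration: the `EditorContext` class, the `handle_command` function, and the toy dictionary and thesaurus are hypothetical stand-ins, not any real assistant's API, and the speech-to-text step is assumed to have already produced the command string.

```python
from dataclasses import dataclass


@dataclass
class EditorContext:
    """Hypothetical snapshot of what the computer already knows."""
    text: str    # the document being edited
    cursor: int  # character offset of the cursor


def word_span(ctx: EditorContext) -> tuple[int, int]:
    """Return (start, end) offsets of the word under the cursor."""
    start = ctx.cursor
    while start > 0 and ctx.text[start - 1].isalnum():
        start -= 1
    end = ctx.cursor
    while end < len(ctx.text) and ctx.text[end].isalnum():
        end += 1
    return start, end


# Toy stand-ins for a real dictionary and thesaurus (illustrative only).
DEFINITIONS = {"focus": "the center of interest or activity"}
SYNONYMS = {"focus": ["attention", "center"]}


def handle_command(command: str, ctx: EditorContext) -> str:
    """Resolve a short command using the word the cursor is on."""
    start, end = word_span(ctx)
    word = ctx.text[start:end]
    key = command.strip().lower()
    if key == "define":
        return DEFINITIONS.get(word.lower(), "no definition found")
    if key == "suggest synonym":
        return ", ".join(SYNONYMS.get(word.lower(), [])) or "no synonyms found"
    if key == "capitalize":
        # Return the full text with only the focused word capitalized.
        return ctx.text[:start] + word.capitalize() + ctx.text[end:]
    return "unknown command"


ctx = EditorContext(text="where the user focus is", cursor=17)
print(handle_command("Define", ctx))
print(handle_command("Capitalize", ctx))
```

The user never names the word. The one-word command is enough because the cursor position supplies the rest of the sentence.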

Convergence

In my other two short articles, Just say it and The case for the voice interactions, I argue that the technology is ready and that Apple and Microsoft are both perfectly positioned to make our interactions with computers significantly more natural and personal.

Maybe the re-invention of contextual voice interactions is the next logical step in the evolution of human-computer interaction.

Ilya Belikin

Founder of Posit network of design practice for good. Hong Kong.