The case against voice interactions
When humans talk, we use a lot more than just voice to get a message across: facial expressions, posture, gestures, drawings, references to a shared context in the past or present.
All of these non-verbal channels support speech and make conversations efficient. Given a way to establish context, a short sound can carry a lot of information. We enjoy those moments when a short sentence with a gesture suddenly communicates a deeper story than a page of text. As humans, we enjoy people who get us, and we cherish them.
We will love computers that can do the same.
Pure voice interactions
A computer with a voice-only interface is severely limited and will probably never be a good personal assistant. A voice converted to text, stripped of any emotion and lacking context, is not a good communication tool.
Interacting this way leads to mental overload: the human has no choice but to be verbose and to keep the context in mind for the machine.
What is worse, it makes users feel that a voice interface is slow, or that a computer is not very capable. Neither is true. The “pure” voice interface is just not the way humans and computers, or anyone really, should interact.
Voice interactions in context
Rich context is already available to a computer. It knows what the user is working on (mine knows I am working on this article) and exactly where the user's focus is (the cursor was at the word “focus” when I was typing it).
So why not use that rich context to respond helpfully to a short voice command?
Why not give the user the simple ability to say “Define” while the cursor is on a word to check its definition? Or “Suggest synonym”? Or “Capitalize”?
Or “Reply” when I am looking at a chat notification on my phone or watch?
Why are we keeping the rich context out of the conversations with our computers?
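To make the idea concrete, here is a minimal sketch of how a one-word command could be resolved against the current context. It is only an illustration under my own assumptions: the Context class, the handler functions and the HANDLERS table are hypothetical names, not an existing API, and the handlers return a description of the action instead of performing it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Context:
    """Hypothetical snapshot of what the user is focused on right now."""
    app: str                # assumed values: "editor" or "chat"
    focused_text: str = ""  # word under the cursor, or the notification text


# Each handler describes the action it would take (illustration only).
def define(ctx: Context) -> str:
    return f"look up definition of '{ctx.focused_text}'"


def suggest_synonym(ctx: Context) -> str:
    return f"look up synonyms for '{ctx.focused_text}'"


def capitalize(ctx: Context) -> str:
    word = ctx.focused_text
    return f"replace '{word}' with '{word.capitalize()}'"


def reply(ctx: Context) -> str:
    return f"open reply to: {ctx.focused_text}"


# The same short utterance is only meaningful together with the context,
# so handlers are keyed by (app, command).
HANDLERS: Dict[Tuple[str, str], Callable[[Context], str]] = {
    ("editor", "define"): define,
    ("editor", "suggest synonym"): suggest_synonym,
    ("editor", "capitalize"): capitalize,
    ("chat", "reply"): reply,
}


def handle_command(utterance: str, ctx: Context) -> str:
    """Resolve a short spoken command using the current context."""
    key = (ctx.app, utterance.strip().lower())
    handler = HANDLERS.get(key)
    if handler is None:
        return f"'{utterance}' is not available in {ctx.app}"
    return handler(ctx)
```

With the cursor on the word “focus” in an editor, handle_command("Define", Context("editor", "focus")) resolves to a definition lookup for that word; the user never has to say which word they mean, because the context already carries it.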
Convergence
In my other two short articles, “Just say it” and “The case for the voice interactions”, I argue that the technology is ready and that Apple and Microsoft are both perfectly positioned to make our interactions with computers significantly more natural and personal.
Maybe the re-invention of contextual voice interactions is the next logical step in the evolution of human-computer interaction.