Human-Computer Interaction: Why Voice Changes Everything

Some of the computers I’ve owned in my lifetime.

Small screens

I am a child of the '90s. I still remember (barely) playing Oregon Trail as a kid, connecting to AOL over dial-up, and getting my first truly personal computer when I went off to college. I was late to get a smartphone and resisted texting for as long as possible. But over time my devices have gotten smaller, and the ways I interact with them have changed.

Photo by @thefatjewish

Those who lived through the early 2000s will appreciate the graphic on the left. If you look at the devices above, you'll notice the screen starts tiny, gets larger, then shrinks again over time. We went from physical keyboards to touch interfaces, and everything changed along the way. On mobile devices and wearables, we got creative about how to input commands, though doing so can still be burdensome. Virtual keyboards have gotten pretty smart, with predictive text and swipe gestures. Unfortunately, this approach only scales down so far. Try swipe-typing on a watch or Google Glass and you'll be out of luck. Small screens and wearables are one reason I am bullish about voice powering the next generation of devices.

Swiping on a virtual keyboard rather than “typing”

Voice for programming

That said, voice isn't only about replacing text input on small screens. It also offers a new way for non-programmers to express complex programs. For example, the product I am working on, Josh.ai, uses voice to power home control and automation. Consider the complexity of the following rule:

At sunrise [to be calculated each day] gently wake me up to classical music [as determined by a service like Pandora], slowly open the blinds, brew a pot of coffee, open the dog door, set the temperature to 75 degrees, and after 10 minutes turn on CNN in the kitchen.

Voice is natural and intuitive (if the system is smart enough to understand it). A single spoken sentence can carry detailed, multi-step instructions and turn them into a coordinated set of actions for the user. As we ask our devices to perform increasingly complex tasks that bridge the digital and physical worlds, voice can be a highly effective programming tool.
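To make that complexity concrete, here is a minimal Python sketch of the kind of structured rule the sunrise sentence would have to be translated into behind the scenes. The `Rule` and `Action` classes, the device names, and the command strings are all hypothetical illustrations, not Josh.ai's actual data model; the point is simply how much structure (a computed trigger, six device actions, parameters, and a delay) is packed into one spoken sentence.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Dict, List, Tuple


# Hypothetical data model: one device command, optionally delayed after the trigger.
@dataclass
class Action:
    device: str
    command: str
    params: Dict[str, object] = field(default_factory=dict)
    delay: timedelta = timedelta(0)


# Hypothetical rule: a named trigger plus the actions it fires.
@dataclass
class Rule:
    trigger: str
    actions: List[Action]


# The spoken sunrise rule, written out by hand (all names are illustrative).
morning_rule = Rule(
    trigger="sunrise",  # assumed to be recomputed each day from the home's location
    actions=[
        Action("bedroom_speaker", "play",
               {"genre": "classical", "source": "pandora", "volume_ramp": "gentle"}),
        Action("bedroom_blinds", "open", {"speed": "slow"}),
        Action("coffee_maker", "brew"),
        Action("dog_door", "open"),
        Action("thermostat", "set_temperature", {"fahrenheit": 75}),
        Action("kitchen_tv", "tune", {"channel": "CNN"}, delay=timedelta(minutes=10)),
    ],
)


def schedule(rule: Rule) -> List[Tuple[timedelta, str, str]]:
    """Return (offset from trigger, device, command) tuples ordered by offset.

    sorted() is stable, so actions sharing an offset keep their spoken order.
    """
    return sorted(
        ((a.delay, a.device, a.command) for a in rule.actions),
        key=lambda item: item[0],
    )


if __name__ == "__main__":
    for offset, device, command in schedule(morning_rule):
        minutes = int(offset.total_seconds() // 60)
        print(f"+{minutes} min  {device}.{command}")
```

Writing this by hand means knowing every device, parameter, and timing detail up front; the appeal of voice is that the user states the intent once and the system builds this structure for them.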


Simplify the interface

Imagine a single mobile app with a GUI (graphical user interface) capable of the following tasks: set an alarm, send a text, get directions, find movie times, check the weather, turn on the kitchen lights, call grandma, compose a tweet, and so on.

That would be terrible. You would need a button, page, or text box for each kind of request. Voice, on the other hand, makes it possible to do all of this without a single tap. This simplification of the UI (user interface) is one reason voice is exciting, particularly as we move toward screenless wearables. Consider a smart home with microphones and speakers: you'll have full control by voice while you cook, lie in bed, work, and more. I believe voice has the potential to change how we think about and interact with computers.


Next steps

So where do we go from here? Personally, I think we still need GUIs and will for some time, but I'm excited to see visuals take a back seat as we welcome voice into our most important products. At Josh.ai, we will continue to build desktop and mobile apps that don't require voice, but we'll optimize for spoken input along the way. By putting voice front and center, we believe it can become a powerful next step in human-computer interaction.


This post was written by Alex at Josh.ai. Previously, Alex was a research scientist for NASA, Sandia National Lab, and the Naval Research Lab. Before that, Alex worked at Fisker Automotive and founded At The Pool and Yeti. Alex has an engineering degree from UCLA, lives in Los Angeles, and likes to tweet about Artificial Intelligence and Design.


Josh.ai is an artificial intelligence agent for your home. If you’re interested in getting early access to the beta, enter your email at https://josh.ai.
