The Promises and Pitfalls of Voice as a UI

Sara speaking with Josh

Between cross-platform politics (Apple vs. Amazon vs. Google) and the fact that voice control in consumer electronics is still nascent, the promise of a truly automated home remains, sadly, largely unrealized. However, there are plenty of Easter eggs you can ask your Echo to entertain you with while you wait for Jeff Bezos to decide to play nice with Apple and Google (spoiler: he probably won’t).

I think it’s neat that you can turn on pretty much anything throughout a house just by saying so out loud. I also think that voice should and will be the de facto method of interaction in the household by the end of this decade. In that regard, I have to thank Apple, Google, and Amazon for advancing the state of responsive AI. But there’s still a lot of work to do.

There is currently no solution on either the Amazon Echo or Apple’s HomeKit platform for easily showing you a live feed of your front-door Nest Cam. Not only that, but if you tried to string together a compound command (do this + do that + then do this), you’d be met with a sterile, deflective statement that your request wasn’t understood. At Josh, we’re creating a user experience that remedies this.

Sara demonstrates Josh’s ability to process complex compound commands.
The dream of an empathetic smart home is being hampered by brand politics.
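To make that concrete, here’s a minimal sketch of one way a compound request could be decomposed into individual actions. The clause splitter and the `execute` stub are illustrative assumptions for this post, not Josh’s actual implementation.

```python
# Toy illustration of compound-command handling: split an utterance on
# common conjunctions, then dispatch each clause as its own command.
# The device actions here are hypothetical stand-ins, not Josh's real API.
import re

def split_compound(utterance: str) -> list[str]:
    """Break 'do this, then do that' into individual clauses."""
    # "and then" is listed before "and" so the longer separator wins.
    clauses = re.split(r"\b(?:and then|then|and)\b", utterance, flags=re.IGNORECASE)
    return [c.strip(" ,.") for c in clauses if c.strip(" ,.")]

def execute(clause: str) -> str:
    # Placeholder for real intent resolution and device control.
    return f"executing: {clause}"

request = "Dim the living room lights, then lock the front door and play some jazz"
for step in split_compound(request):
    print(execute(step))
# executing: Dim the living room lights
# executing: lock the front door
# executing: play some jazz
```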

Having spent almost every day of the past year obsessively researching, evaluating, and working on home automation, I’d like to propose three steps toward finishing the (smart) house that science fiction envisioned.

1. Work Together: A Rising Tide Lifts All Boats

Naysayers and doubters of the home automation movement like to point to the patchy interoperability among the most popular products. They claim that adoption rates are low and will stay low because so few products and services work together cohesively. As anything and everything in the home quickly becomes “connected,” expect major brands to realize that the only way to saturate the market is to put aside their pride and “walled garden” philosophy and offer products that enhance the user experience by working with other popular, already existing products.

2. Focus on the Context

If you’re developing an AI for home automation, you can waste a lot of time hard-coding witty responses, or you can double down and hone your AI to be really good at one particular job. In this case, that job is understanding user intent in the context of household actions. More precisely: give the user an easy and enjoyable way to control the devices in their home.
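As a toy illustration of what “good at one job” might look like, the sketch below matches utterances against a small household vocabulary of actions and devices and refuses anything outside it. The vocabulary and structure are assumptions made for this example only.

```python
# Context-focused intent matching: the parser only knows household verbs
# and devices, so out-of-scope requests are detected rather than faked.
ACTIONS = {"turn on": "on", "turn off": "off", "dim": "dim", "brighten": "brighten"}
DEVICES = ["front door camera", "thermostat", "lights", "speakers"]

def parse_intent(utterance: str):
    text = utterance.lower()
    action = next((a for phrase, a in ACTIONS.items() if phrase in text), None)
    device = next((d for d in DEVICES if d in text), None)
    if action and device:
        return {"action": action, "device": device}
    return None  # out of scope: hand off to a graceful fallback

print(parse_intent("Please turn on the lights in the kitchen"))
# {'action': 'on', 'device': 'lights'}
print(parse_intent("What's the meaning of life?"))
# None
```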

3. Transition Users Slowly

Minesweeper was more than just a game to kill time; it was a way to train users to interact with a graphical user interface.

Just as when the first iPhone came out, or when the graphical user interface first arrived on desktop computers, design in digital consumer products has to help users adjust to their new operating environment. In the case of Windows, Solitaire and Minesweeper were actually UX trojan horses designed to accustom users to clicking a mouse and dragging items across the screen, something not possible in the command-line terminals that preceded them. And when the first iPhone came out, skeuomorphism was heavily used (and abused) as a visual metaphor to transition people used to pressing physical buttons to tapping virtual ones.

In the race for “zero UI” and the perfect conversational user interface, we’ve neglected the importance of a robust visual interface.

Here we are now, on the cusp of yet another UX paradigm shift. Except this time, there’s little to no subtlety in how major players in the field are handling the transition from the old model of interaction (touch) to the new (voice). Voice dictation is not the be-all and end-all. Voice is yet another layer of interaction atop sight and feel. Arguably, the next level is thought, but that’s another post for another day.

Just as we all learned to accept tap interfaces and swipe gestures, users need a metaphor to guide them through the transition from GUIs to CUIs (conversational user interfaces). Because speaking your mind is so much easier than tapping buttons that trigger preset actions, it’s easy for users to expect the world of the device, computer, or AI they’re talking to. There’s a false expectation that because a machine is smart enough to understand natural language, it must be smart enough to understand complex human thoughts or carry out compound commands. That isn’t always the case, and one way to keep expectations in check is to design the user experience around familiar calls to action and to construct clear boundaries around what the system can do. This ties back to the previous point about focusing on context: if you inform users of what your AI is intended to do, they’re less likely to ask it abstract questions about the meaning of life and more likely to use it for its intended purpose.

It’s helpful to keep a visual interface that lets users confirm what they’re saying, and that acts as a graceful means of degradation when a command can’t be executed properly.

The Josh iOS Beta app showcasing how the graphical user interface complements voice control.
More importantly, a visual interface should let the user access basic and critical functions as intended.
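Here’s a hypothetical sketch of that graceful degradation: when the matcher can’t resolve a command, the handler restates the system’s scope instead of returning a dead-end error, and the app can render the response as tappable suggestions. None of this is Josh’s real code; the stub and UI behavior are assumptions for illustration.

```python
def parse_intent(utterance: str):
    # Tiny stand-in for the intent matcher sketched earlier.
    text = utterance.lower()
    if "lights" in text:
        return {"device": "lights", "action": "off" if "off" in text else "on"}
    return None  # the request is outside the system's scope

def handle(utterance: str) -> str:
    intent = parse_intent(utterance)
    if intent is None:
        # Degrade gracefully: instead of a flat "I don't understand,"
        # restate the scope and let the app show tappable suggestions.
        return ("I can control things like your lights, thermostat, and "
                "cameras. Try one of the suggestions on screen.")
    return f"Okay, turning the {intent['device']} {intent['action']}."

print(handle("Turn off the lights"))           # Okay, turning the lights off.
print(handle("What is the meaning of life?"))  # graceful, scoped fallback
```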

Voice is the newest and hottest trend in UX design, and it’s quickly becoming a differentiating factor and main selling point for any company that wants to be taken seriously in tech. If there’s anything to be learned from the previous two shifts in human-computer interaction, though, it’s that we need time and familiar UX patterns to help ease users into their new environment.

Looking Forward to the Real Future

As touched upon in Chris Messina’s article “2016 will be the year of conversational commerce,” the conversational model of interaction is quickly becoming the norm for many platforms, where technological jargon is being replaced with natural language.

Natural language processing plays a huge role in how our product, Josh, works.
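For a flavor of what replacing jargon with natural language means in practice, here’s a hypothetical resolver that turns fuzzy phrasing like “make it warmer” into a concrete thermostat change. The phrase table and setpoint are invented for the example, not drawn from Josh.

```python
# The user never says "set HVAC zone 2 setpoint to 75F"; they say
# "make it warmer." Map fuzzy phrases to concrete temperature deltas.
CURRENT_SETPOINT_F = 70  # assumed current thermostat setpoint

PHRASES = {
    "a lot warmer": +5,
    "a lot cooler": -5,
    "warmer": +2,
    "cooler": -2,
}

def resolve_temperature(utterance: str) -> int:
    text = utterance.lower()
    # Check longer phrases first so "a lot warmer" isn't matched as "warmer".
    for phrase in sorted(PHRASES, key=len, reverse=True):
        if phrase in text:
            return CURRENT_SETPOINT_F + PHRASES[phrase]
    return CURRENT_SETPOINT_F  # no temperature change requested

print(resolve_temperature("Hey Josh, make it a lot warmer in here"))  # 75
```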

We’re excited to get Josh into the homes of tomorrow, and we’re very happy to say that by the end of this year, 100 lucky beta homes will have it installed. To see if your home is eligible to be one of those 100, take our survey here.

This post was written by Jason, lead designer at Josh. Previously, Jason led design for the web and mobile apps At The Pool and Yeti. He attended Art Center College of Design in Pasadena, California. Jason loves cold brew coffee, LEGOs, and recreational shooting. You can follow Jason on Instagram at @jas0n_0n_a_bike. Josh is an artificial intelligence agent for your home. If you’re interested in getting early access to the beta, enter your email on our website.

Like Josh on Facebook, follow Josh on Twitter.
