Zero UI, Alexa and Google Home: the next era of interaction, or massive hype?

The year is 2017. From CES to the launch of Samsung Bixby to that viral video of a toddler asking Alexa to play “Digger Digger”, voice user interfaces (VUIs) have been hailed as the future of interaction.

A Place For Voice

Voice works well for some use cases: in-car experiences and home automation are prime examples. It’s also great for questions with absolute answers and for short, information-based transactions.

Image via Rapid API

Take the example in the image above: getting Alexa to tell you the answer would probably take two to three seconds at most. The alternative would be to find your phone, unlock it, open your browser, tap the search bar, type in “What is hello world in German?”, wait for the page to load and scan the results; that would take at least twice as long as simply asking.

For this test, a voice-only UI wins, but are these standalone home assistants the solution to our impatient, busy lives? Not really.

When voice-only UIs become a problem

Compared to GUIs, voice-only UIs place a massive cognitive load on the user; instead of having all the options in front of them, they have to remember the options at every stage of the user flow. Yes, the assistant or system may be able to give the user the options at each stage of the process, but will the user be able to retain the information given?

What’s easier: remembering a shopping list told to you over the phone, or having the list in front of you?

Once a non-binary choice comes into the equation, voice-only UIs fail tragically.

Do you remember all of that?

A bit of theory

One of the most influential theories in cognitive psychology comes from George Miller, who found that seven is the magic number of working-memory capacity. In other words, we can only retain seven (plus or minus two) pieces of information at any given time. So does that mean we can remember five to nine choices? Technically yes, but in reality, no. Those seven “memory slots” are not for the exclusive use of Alexa. Because these new home assistants have no screen, the user will probably be doing something else at the same time. Looking at the cobweb on the ceiling? One slot. Thinking about how hungry you are? Another slot. Loud neighbours? One more. We also have to account for everything running in the back of our minds. Realistically, we don’t have the capacity to retain that many choices, let alone remember the right prompts to continue with a query or action.

Screens + Voice = Win

The irony is that if voice UI wants to become mainstream, it cannot be zero UI (except for simple, linear journeys with absolute answers). It has to be used in conjunction with GUIs to create the optimum experience. We don’t have to look far for good examples: both Apple’s Siri and the Google Assistant on phones are a mix of voice and graphical interfaces. However, these mostly use voice as the input and don’t normally allow for a full voice story.

Adding to the existing G(V)UIs

Apart from using the GUI just to display results, the visuals can also be used to prompt further user input in longer, non-linear stories. A mockup I’ve recently been working on was inspired by the little prompt bubbles seen in Facebook’s chatbots and the Google Assistant.

For this particular project, we looked at a journey with multiple branches. We found that displaying the options to the user reduced the demands on them and allowed them to see where they were in the journey. It also gave them a way to step back to the previous set of options.

The visual part of the GVUI interface through a journey.
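To make the idea concrete, here is a minimal sketch of how such a branching journey could be modelled. It assumes a hypothetical GVUI where every step both speaks its prompt and renders the same options as tappable chips, with a persistent “back” chip; the `DialogNode` structure, the `journey` map and the `render` function are illustrative assumptions, not part of any real assistant SDK.

```typescript
// Minimal sketch of a branching GVUI journey (illustrative only; not a real
// Alexa / Google Assistant API). Each step speaks a prompt AND renders the
// same options as on-screen chips, so the user never has to hold them in memory.

interface DialogNode {
  id: string;
  spokenPrompt: string;                        // what the assistant says aloud
  options: { label: string; next: string }[];  // shown as tappable chips
}

// A tiny journey with multiple branches.
const journey: Record<string, DialogNode> = {
  start: {
    id: "start",
    spokenPrompt: "What would you like to do?",
    options: [
      { label: "Book a table", next: "cuisine" },
      { label: "Order takeaway", next: "cuisine" },
    ],
  },
  cuisine: {
    id: "cuisine",
    spokenPrompt: "Which cuisine are you in the mood for?",
    options: [
      { label: "Italian", next: "time" },
      { label: "Japanese", next: "time" },
      { label: "Mexican", next: "time" },
    ],
  },
  time: {
    id: "time",
    spokenPrompt: "When would you like it?",
    options: [
      { label: "As soon as possible", next: "confirm" },
      { label: "This evening", next: "confirm" },
    ],
  },
  confirm: { id: "confirm", spokenPrompt: "Great, confirming that now.", options: [] },
};

// A history stack lets the screen always offer a "back" chip,
// so users can step back to the previous set of options.
const history: string[] = [];
let current = "start";

function goTo(nodeId: string): void {
  history.push(current);
  current = nodeId;
  render(journey[current]);
}

function goBack(): void {
  const previous = history.pop();
  if (previous) {
    current = previous;
    render(journey[current]);
  }
}

function render(node: DialogNode): void {
  // In a real app these calls would drive text-to-speech and the chip UI.
  console.log(`Assistant says: ${node.spokenPrompt}`);
  node.options.forEach((o) => console.log(`  [ ${o.label} ]`));
  if (history.length > 0) console.log("  [ Back ]");
}

// Example: the user taps (or says) their way through the journey.
render(journey[current]);
goTo("cuisine");
goBack(); // the back chip returns them to the previous step
```

The point of the sketch is simply that the options live in the interface rather than in the user’s working memory, and that the history stack is what makes the “step back” reassurance possible.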

It’s common for first-time users to approach emerging technology with some doubts. We see many people in our lab go completely blank when asked to speak to our assistants. Having the reassurance that they could go back and see their options made them feel more at ease. Furthermore, when given the blank slate of “ask me anything”, users often freeze and don’t know where to start; giving them a starting point and some direction is key.

When given the blank slate of “ask me anything”, users often freeze and don’t know where to start.

Artificial Intelligence cannot really understand us (at the moment)

As much as Sci-Fi and glossy product launches would like us to think that artificial-intelligence-powered interfaces can answer all of our questions, the reality is that we are still quite far from machines completely understanding what humans mean in context. For example, a sustained conversation with Siri still breaks down after a few interactions. In other words, most automated conversational UIs require a huge UX effort to guide the user down the path of possibility.

Artificial Intelligence is not at the point where it can truly understand us.

Bottom Line

Voice is faster and more intuitive when the interaction between the user and the system is linear, short and binary, such as turning on the lights or asking for an exchange rate. However, once the journey branches out, or if the interaction has many steps, a voice-only UI will not suffice, even when machine learning is used to predict user intent.

A Graphical + Voice User Interface (GVUI), with some careful UX crafting, will allow users to go through more complex journeys by prompting them and letting them see where they are in the journey. We need to take a step back from the Zero UI hype and return to our roots of GVUIs, the stomping ground of Siri and the Google Assistant on phones.