What is the UI for “No UI”?

Fred Graver
Nov 12, 2021 · 4 min read


[Image: audio inside a speech bubble, indicating an “EarCon.”]

I’m coming at this with a fair degree of humility. There are many, many people in the Conversational Design and Voice community who have been doing this a lot longer than I. If anything, I’m hoping I add a newcomer’s perspective to our work.

I’m preparing a talk for the MoDev 2021 conference in December, and I think it might be a great platform to ask a few questions about our discipline. These are early notes, and I’m hoping those of you who spend a few minutes reading them can give me some feedback.

Here’s my big question: Is Voice as an interface stuck on an adoption plateau because, as a community, we haven’t settled on a set of UI “standards” that would allow users to feel familiar and comfortable with Voice applications and interfaces?

The idea of “EarCons” is not new. And some smart people on Medium have been chatting about this for a couple of years, including Catherine Yochum (who gathered a wonderful sampling of EarCons here), Aditi Agarwal (who created some great categories / definitions) and Mark C. Webster (who broke down specific use cases).

A sampling of earcons, via Catherine Yochum.

Both the web and mobile got massive adoption when certain conventions and standards were developed. From “You’ve Got Mail” to the envelope icon, from simple navigation arrows to breadcrumb trails, from taps to gestures on mobile… These all helped people feel comfortable and familiar on the internet because 1) The interface said “Don’t worry, I’m just like those other things” and 2) The user didn’t have to re-learn basic functions.

As a community, I think we need to have a recognizable set of conventions that would be the equivalent of the icons you find in Google or Apple’s design systems.

It’s tricky: this is the interface that seemingly has no interface. But creating a universal set of conventions for a Listener Experience would:

- Promote wider adoption of voice.

- Help us integrate fully into the “multimodal” universe.

- Help us advance the use and effectiveness of Artificial Intelligence as it interacts with humans.

We could begin by recognizing that the User in a Voice User Interface is a Listener. They don’t have visual cues. They can easily be distracted. We need a “jump” in our design evolution that’s the equivalent of discovering the power of tapping and gestures when we moved from web into mobile.

Let’s take an abstract example, just for the sake of discussion. We’ll call this site or mobile app “Generic.” When you go to the Generic home page or screen, you find certain conventions at work: hierarchy, location on the page, indicators that you can scroll or swipe, and so on.

If the designers have done their job well, they have accomplished what John Maeda laid out in “The Laws of Simplicity.” The UI says “I know why you’re here and I know what you need to know before you make your next decision.”

Since we don’t have a page or an app, we need to create a new mental model for the listener. Right off the bat, they need to hear something that answers a number of questions:

- Where am I?

- What can I do?

- Who’s guiding me?

- Do you understand why I’m here?

- How do I get around?

- When is this over?

I think that EarCons can solve a lot of that. (I’ve sketched what a shared earcon vocabulary might look like in code after this list.)

- We can create real-world sonic metaphors. Are you in a restaurant? An office? A hospital? A busy city square? An airport?

- Once you’re “there,” we can add and mix sound to create a sense of space, time and distance. The “cash register” is over there. The “shelves” are over here. The ticket agent is right in front of you. The nurse can take you to the doctor.

- We need to use multiple voices. Getting locked into one voice to do everything doesn’t really help. In real life, no one expects one person to answer all of their questions. Let’s think about “delegating” tasks. It will make it easier for the listener to know what they’re doing right now and “who” they’re doing it with. And it’s easier to ask “Bring Linda back!” than “Return to menu.”

- We can reduce the overall cognitive load by reminding the listener that we are remembering what they’ve done and where they’ve gone. It’s a pain to keep saying “I’ll remember that,” so let’s create an EarCon that says “Saved / checked / Got it / I’ll remember that.”

- Navigation and Discovery are a huge hurdle in a Voice-First interface. I think we need to create something that’s the equivalent of the breadcrumb trail or pull-down menu.

- What is the equivalent of a hyperlink, and when will we find a way to allow users to just barge in when they hear something they want to explore?

- We need to define “opening and closing” doors, file cabinets, whatever. Think of the beeps at a checkout counter. Or filing papers in an office. Or ticks / haptics when you fill out a form online. We need a universal set of sounds that says “this is open now / this needs to be responded to” and “got it / moving on / closed.” What are our “Windows” that open and close as we move through software?
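To make that concrete, here’s a minimal sketch, in Python, of what a shared earcon vocabulary might look like: a handful of semantic events mapped to short audio cues and wrapped in SSML, which platforms like Alexa and Google Assistant can already play via their audio tags. The event names and clip URLs are hypothetical placeholders, not a proposal for the actual standard.

```python
# A rough sketch of a shared earcon vocabulary: semantic events a listener
# should recognize everywhere, each mapped to a short audio cue and wrapped
# in SSML. All event names and URLs are hypothetical placeholders.

from enum import Enum


class Earcon(Enum):
    """Listener-facing events that deserve a consistent, recognizable sound."""
    OPENED = "https://example.com/earcons/opened.mp3"       # "this is open now"
    NEEDS_REPLY = "https://example.com/earcons/prompt.mp3"  # "this needs a response"
    SAVED = "https://example.com/earcons/saved.mp3"         # "got it / I'll remember that"
    CLOSED = "https://example.com/earcons/closed.mp3"       # "moving on / closed"


def with_earcon(earcon: Earcon, prompt: str) -> str:
    """Prefix a spoken prompt with its earcon, returned as one SSML string."""
    return f'<speak><audio src="{earcon.value}"/>{prompt}</speak>'


if __name__ == "__main__":
    # The listener hears the "saved" cue instead of yet another spoken
    # "I'll remember that."
    print(with_earcon(Earcon.SAVED, "Your table for two is booked for Friday."))
```

The code itself isn’t the point; the point is that the mapping from event to sound would be shared across apps and platforms, the same way the envelope icon means “mail” everywhere you see it.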

Those are my early thoughts. I’ll keep working on this and share my thinking as I get closer to finalizing my presentation.
