8 things you didn’t know about Voice User Interface

Dec 27, 2019

Google Home on a kitchen rack beside jars with pasta

With voice assistants getting more popular each year, we have started to treat them as regular everyday companions. According to eMarketer.com, an estimated 111.8 million people in the US alone will use a voice-activated device at least once a month, and the number is growing. For many of us, talking to an assistant, or, to put it more technically, interacting with a Voice User Interface (VUI), has become a natural way to handle various tasks.

Apart from the occasional, slightly awkward misunderstanding, when talking to a voice assistant we get the impression that we’re talking to an intelligent being who can hear us, understand our needs and fulfill our requests. I would even go so far as to say that companions such as Alexa or Siri can become like a friend or family member to some. This may be partly due to our natural tendency to anthropomorphize entities around us, but it could also be explained with John Searle’s Chinese Room argument, which is cleverly referred to in Cathy Pearl’s book on VUI.

It’s our job as designers to make interactions with voice assistants seem natural, smooth and human-like. We already achieve quite impressive results here: just take a look at some stats about user satisfaction. But that doesn’t mean that the assistant behind the VUI actually is intelligent or does any thinking at all. In fact, it usually doesn’t even have to. That is just one example of why, in order to design a good VUI and build a successful voice control product, you should be aware of a few important things:

1. Most likely, the voice assistant doesn’t actually “understand” what you’ve just said

Well, at least not entirely. Speech recognition often relies on so-called N-best lists. When you say a voice command to your assistant, it transcribes your words into phrases to interpret them, just as if you were dictating something. But it doesn’t just “write down” one version of what was recorded; it produces several. These versions are then ordered by likelihood, and each is assigned a confidence score. The one with the highest score wins, and you may feel understood even if the behind-the-scenes transcript confused some words. So, as you can see, it’s more or less probability theory in action.
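To make the idea more tangible, here is a minimal Python sketch of an N-best list. The `Hypothesis` class, the phrases and the scores are all made up for illustration; a real speech recognizer produces its own list and its own score format.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str          # one possible transcription of what was said
    confidence: float  # assumed scale: 0.0 to 1.0, higher means more likely


def best_hypothesis(n_best):
    """Return the hypothesis with the highest confidence score."""
    return max(n_best, key=lambda h: h.confidence)


# Invented N-best list for the spoken phrase "play some jazz"
n_best = [
    Hypothesis("lay some jars", 0.04),
    Hypothesis("play some jazz", 0.82),
    Hypothesis("play sum jazz", 0.11),
]

# Order the candidates by likelihood, most likely first
ranked = sorted(n_best, key=lambda h: h.confidence, reverse=True)
winner = best_hypothesis(n_best)
```

Here `winner` is the "play some jazz" hypothesis, even though two other transcriptions were also on the table.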

2. The voice assistant may do what you ask even though it didn’t understand you

Depending on the confidence score of the best matched option, the voice assistant can act in different ways:

Respond right away and fulfill your request;
Repeat what you said and ask for clarification;
Admit that it didn’t understand you at all (meaning it was not able to match the phrase to anything in the database it operates on) and ask you to repeat what you’ve just said.

So, as you can see, the way you handle the N-best lists may have a great impact on how the assistant will behave.
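The three behaviours listed above could be sketched roughly like this. The two threshold values and the wording of the replies are assumptions picked for the example; in a real product you would tune them for your recognizer and your users.

```python
# Assumed thresholds, chosen only for illustration
HIGH_CONFIDENCE = 0.80
LOW_CONFIDENCE = 0.45


def choose_action(text, confidence):
    """Pick a reply strategy based on the confidence of the best match."""
    if confidence >= HIGH_CONFIDENCE:
        # Confident enough: just do what was asked
        return f"OK: {text}"
    if confidence >= LOW_CONFIDENCE:
        # Unsure: repeat what was heard and ask for clarification
        return f"Did you say '{text}'?"
    # No usable match: ask the user to repeat
    return "Sorry, I didn't catch that. Could you say it again?"


confident = choose_action("turn on the lights", 0.91)
unsure = choose_action("turn on the lights", 0.60)
lost = choose_action("turn on the lights", 0.20)
```

Moving either threshold changes how often the assistant acts immediately, double-checks, or gives up, which is why this small piece of logic shapes so much of the experience.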

3. You may feel heard and empathized with by your voice assistant even though it has a limited set of responses available

If you have ever designed a VUI, you know that predicting every possible phrase a user might say to the voice assistant is virtually impossible. Attempting to do so requires an immense amount of work and is thus very time-consuming. There is a way to address this problem.

You can do it by designing not only for individual words and phrases, but for categories that cluster the words likely to come up at a given point in the interaction. This way, your assistant needs only a limited set of responses, one per category, and each response seems appropriate and logical no matter which exact words were used.

Let’s say the assistant asks you how your holiday was. It just needs to know how to react to an array of responses categorized as “good” or “bad”. No matter if you put it as “It went great!” or “Fabulous!” or even “Quite ok”, the response will be the same, as it all falls into the same category: “good”. The user won’t notice and the assistant will seem skilled at small talk.
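The holiday example could look something like the sketch below. The keyword sets and the replies are invented for illustration, and the matching is deliberately naive: it does not handle negation ("not good") or anything else a production system would need.

```python
import re

# Hypothetical word clusters for each category
CATEGORY_KEYWORDS = {
    "good": {"great", "fabulous", "ok", "okay", "good", "lovely"},
    "bad": {"bad", "awful", "terrible", "boring"},
}

# One reply per category, plus a fallback when nothing matches
RESPONSES = {
    "good": "Glad to hear that!",
    "bad": "Sorry to hear that.",
    None: "Tell me more about it.",
}


def categorize(utterance):
    """Return the first category whose keywords appear in the utterance."""
    words = set(re.findall(r"[a-z]+", utterance.lower()))
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return None


def respond(utterance):
    """Answer with the single reply assigned to the matched category."""
    return RESPONSES[categorize(utterance)]
```

With these assumed keyword sets, `respond("It went great!")`, `respond("Fabulous!")` and `respond("Quite ok")` all return the same reply, because all three land in the "good" category.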

4. To make the VUI “understand” you better, you should not be too specific

When you talk to a voice assistant and the worst-case scenario happens (it says it didn’t understand you), your first instinct might be to repeat the key word of your statement, perhaps a little louder. At least that’s what you’d normally do in a regular conversation with a human being. The thing is, a VUI doesn’t really need you to speak louder, unless of course you were too far away or there was some interference, but that isn’t the case we’re focusing on here. The truth is that a VUI often handles short phrases quite poorly, because there is less data to compare with what’s in the database. So it’s better to say “Yes, I do” instead of just “Yes”, to give the VUI more to chew on. With this in mind, it might sometimes be a good idea to design the dialogue so that it invites longer responses from the user.

5. The conversation with a voice assistant may seem strange to you, because it’s not actually a conversation

Not all VUIs are conversational. How come we are having conversations with them, then? When we want our voice assistant to help us with a particular task (e.g. ordering a pizza), it is usually enough for the VUI to follow a set path of questions to get the answers necessary to perform that task. It’s not the same as a true conversation, in which your interlocutor remembers what you said earlier and can refer to it without trouble, even if, for instance, you use pronouns to avoid repeating the subject all the time. You may not believe it, but this is quite challenging for a VUI. That’s why some of your conversations may seem a little strange. But even so, the voice assistant can still help you with simple everyday tasks.
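A "set path of questions" for the pizza example might be sketched as follows. The slot names and prompts are hypothetical, and the scripted answers stand in for what a speech recognizer would return.

```python
# Hypothetical fixed path: each slot has exactly one question
REQUIRED_SLOTS = [
    ("size", "What size would you like?"),
    ("topping", "Which topping would you like?"),
    ("address", "Where should we deliver it?"),
]


def next_prompt(filled):
    """Return (slot, question) for the first unanswered slot, or None."""
    for slot, question in REQUIRED_SLOTS:
        if slot not in filled:
            return slot, question
    return None


def run_dialog(answers):
    """Walk the fixed path, using one scripted answer per question."""
    filled = {}
    for answer in answers:
        step = next_prompt(filled)
        if step is None:
            break  # every slot is already filled
        slot, _question = step
        filled[slot] = answer
    return filled


order = run_dialog(["large", "mushrooms", "12 Main Street"])
```

Notice that nothing here remembers context beyond the slots themselves. If the user answers the topping question with "the same as last time", this sketch has no idea what that refers to, which is exactly the gap between a task flow and a real conversation.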

6. You’re more likely to give in to a voice assistant with an avatar

Even though an avatar accompanying the voice is not essential for a fruitful interaction with a VUI, some apps do use a visual representation to enhance the experience. When done right, an avatar combined with voice has the power to inspire more engagement from the user, because it conveys the VUI’s personality in a stronger manner. Users are more likely to connect and empathize with the voice assistant because they tend to anthropomorphize it. So if you’re looking to influence your users, change their views or convince them to adopt some beneficial changes in behaviour, you might want to give it a go and combine the voice assistant with an avatar. Use it to do good, not evil, though.

7. Testing a VUI design can be much like a theatre rehearsal…

Testing a VUI is full of challenges. It’s actually much more difficult to prototype voice in a way that doesn’t seem fake without doing some programming at the same time. The problem is that saying something to a mocked-up VUI can’t trigger a reaction in a prototype the way clicking a button does in a graphical interface prototype.

So to test a VUI well, you need to apply practices other than those used for standard interface testing. One of the simplest, cheapest and least time-consuming ways is to play out a dialogue designed for the VUI with a colleague, where one of you plays the role of the voice assistant and the other plays the user.

When you write out all the possible paths a conversation can take, it’s worth not only reading them to yourself but also speaking them out loud with another person to check what they sound like, just as if you were rehearsing before a theatre play. By doing that, you can see which parts of the dialogue seem natural and which don’t. You can also verify whether the wording used by your colleague is included in the set of possible responses.

8. …and also like the Wizard of Oz

When your VUI design is a little more advanced, you might want to test it with users by presenting them with what looks like a working prototype. You then act like the Wizard of Oz from the children’s story: someone who can do some magic even though he’s not a real wizard but just a regular guy. Instead of waiting for the VUI to be fully implemented, you place a helper behind the scenes who will, just like the Wizard, press the right button to play a recording of the voice assistant’s response that fits what the user has just said. This way you get the chance to see the user’s reaction to the product you’re designing in an almost real context.
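If you want to give your wizard something slightly more organized than a folder of audio files, a tiny helper like the one below could do. The key bindings and file names are hypothetical, and actually playing the audio is left out, since that depends on the tools you have at hand.

```python
# Hypothetical key bindings: which recording each button should trigger
CANNED_RESPONSES = {
    "1": "greeting.wav",
    "2": "ask_size.wav",
    "3": "confirm_order.wav",
    "0": "did_not_understand.wav",
}

# What the user said and which clip the wizard answered with
session_log = []


def recording_for(key):
    """Return the clip for a key, falling back to the 'not understood' clip."""
    return CANNED_RESPONSES.get(key, CANNED_RESPONSES["0"])


def wizard_press(key, user_said):
    """Record the exchange and return the clip that should be played."""
    clip = recording_for(key)
    session_log.append((user_said, clip))
    return clip


first = wizard_press("1", "Hi there")
second = wizard_press("9", "Can I get a calzone?")  # unmapped key
```

Keeping a log of what users said next to the clip that was played makes it easier to spot, after the session, which requests your planned responses did not cover.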

Designing a Voice User Interface — takeaways

As you can see, the VUI often makes a poker face… or rather a poker voice. And this is just the tip of the iceberg of surprises you can stumble upon when designing VUIs. It’s obviously good to know the intricacies associated with VUIs and to use them best for the benefit of a great user experience.

Remember it’s always a good idea to:

Keep in mind how the technology behind the VUI works;
Aim for a natural, human-like outcome;
Come up with an efficient way to test your VUI design before releasing it.

Do you have any stories about VUI design? Or are you thinking of building one? Let me know!

This post has been inspired by the book “Designing Voice User Interfaces” by Cathy Pearl.