Voice Technology needs Voice Psychology

“Ok, Google”, “Alexa”, “Hey, Siri”

Voice Interactions between people and technology are becoming more and more commonplace in this era of ‘smart’ devices. Voice Assistants can turn things on and off for you, wake you up, remind you that you have an appointment later today, read you a book, and even recap today’s news. They are capable of many more simple, one-off tasks as well.

While this is definitely a step forward from Interactive Voice Response Systems and a step closer to having our own Protocol Droid, voice-driven interfaces of today are often clunky and frustrating experiences for us.

The technology behind Voice will only continue to improve, but its user experience has some catching up to do. If it doesn’t catch up, how can we ever expect to have actual conversations, and maybe even build relationships, with our technology?

Her (2013) — A Love Story Between a Man and his A.I. Assistant

Luckily, the act of speaking, the whole premise behind Voice Technologies, is something we as a species have plenty of expertise in, and we can fall back on that expertise when designing Voice Experiences. Each and every one of us is a Voice Expert. We have evolved to automatically pick up on social cues in speech, and this is why the Voice Experiences of today always seem to be a bit off. We can just tell when something doesn’t sound right.

So as designers who create Voice User Interfaces (VUIs), we need to understand that VUIs are intrinsically social. Our users will respond socially to our VUIs. It’s only natural. It’s their biological response to speech, regardless of whether the Voice is coming from a person or a machine. Knowing that, designers can tap into insights from the social sciences to create a pleasant and effective way to interact with technology that is driven by Voice.

The psychology of speech interfaces is very similar to the psychology of human speech

The first thing that probably comes to mind when designing Voice Experiences is figuring out what the Voice’s Personality will be.

How should the Voice sound?

Should it be funny, commanding, acquiescent? The list of personality traits to choose from goes on and on.

But let’s not get ahead of ourselves.

Just like with any other design problem, we first need to consider the people our VUIs will be talking to before deciding how the Voice should sound. It’s their personalities that will ultimately decide what the personality of our VUI will be. This is where we can turn to the social sciences for a little help.

Similarity-Attraction

The first insight that we’ll be using from the field of social sciences is the principle of Similarity-Attraction.

Similarity-Attraction is the principle that people are attracted to others who are similar to themselves. Since people infer personality from a Voice, we can create a Voice whose personality reflects that of our users.

For example, a VUI with an extroverted voice will appeal more to those who are extroverted themselves, and the same goes for introverted voices and introverts. Now, a personality consists of many other attributes, but determining whether the VUI should lean more extroverted or introverted is a good jumping-off point.

(Note that when designing for people with a wide range of personalities or unknown personalities, it is safe to lean towards an extroverted voice because extroverted voices tend to be more expressive. In a biological sense, being more expressive means being more predictable. Being predictable lessens the cognitive strain on the human brain and makes a person more comfortable with the speaker.)

So what makes a Voice extroverted or introverted?

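A Voice is essentially made up of 4 vocal markers: volume, pitch, pitch range, and speech rate. Simply tweaking these 4 vocal markers can make a Voice more extroverted or more introverted. Extroverted voices tend to be louder, higher-pitched, wider in pitch range, and faster, while introverted voices tend to be softer, lower-pitched, narrower in pitch range, and slower.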
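
One way to make these four markers concrete is the `<prosody>` element of SSML (Speech Synthesis Markup Language), which exposes volume, pitch, range, and rate as attributes. The following is only a rough sketch in Python: the `VoiceProfile` class, the two presets, and the `to_ssml` helper are hypothetical names made up for illustration, and the preset labels are assumptions rather than measured values. Not every speech platform supports every attribute (the `range` attribute in particular), so check your platform’s SSML documentation before relying on it.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class VoiceProfile:
    """The four vocal markers, expressed as SSML prosody labels."""
    volume: str       # e.g. "soft", "medium", "loud"
    pitch: str        # e.g. "low", "medium", "high"
    pitch_range: str  # e.g. "low", "medium", "high"
    rate: str         # e.g. "slow", "medium", "fast"


# Illustrative presets (assumptions, not measured values).
EXTROVERTED = VoiceProfile(volume="loud", pitch="high", pitch_range="high", rate="fast")
INTROVERTED = VoiceProfile(volume="soft", pitch="low", pitch_range="low", rate="slow")


def to_ssml(text: str, profile: VoiceProfile) -> str:
    """Wrap text in an SSML <prosody> element built from the profile."""
    return (
        "<speak>"
        f'<prosody volume="{profile.volume}" pitch="{profile.pitch}" '
        f'range="{profile.pitch_range}" rate="{profile.rate}">'
        f"{escape(text)}"  # escape &, <, > so the markup stays well-formed
        "</prosody>"
        "</speak>"
    )


if __name__ == "__main__":
    print(to_ssml("Good morning! You have one appointment today.", EXTROVERTED))
    print(to_ssml("Good morning. You have one appointment today.", INTROVERTED))
```

Keeping the four markers together in one profile makes it easy to swap the whole vocal personality at once instead of tuning each attribute separately for every response.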

Orienting a Voice to be more similar to our users can be used to guide their feelings and behaviors towards our VUI. Designers can leverage this to increase likability, trust, efficiency, learning, and even buying behavior.

Consistency-Attraction

We’ve talked about how the Voice should sound; now let’s talk about what the Voice should say. For this, we’ll lean on the lessons of the sociological principle of Consistency-Attraction.

Sounds and words are intimately linked. Inconsistency between what’s said and how it’s said leads to negative affect, dislike of the VUI, and confusion about it.

In an ideal world, designers would get to design both the personality/feel of the VUI and the content it speaks, so that the two adhere to the principle of Consistency-Attraction. In the real world, though, the textual content sometimes has to follow certain guidelines, use specific verbiage, or be consistent with the brand. That textual content can be inconsistent with the VUI’s vocal personality, which we have designed to be similar to the people who will be conversing with it.

To reconcile this, we will have to look again at the people who will be speaking to our VUI. Certain personality types (extrovert or introvert) have a strong preference for either a Voice that is similar to them or a Voice that is consistent in both vocal tone and word choice.

Extroverts, the more social of the two personality types, tend to be more concerned with the relationship between themselves and others. Hence, we can speculate that extroverts would prefer a VUI that is similar to them.

Introverts, who are more inner-oriented, desire the comfort and predictability of consistency. For them, our VUI’s vocal characteristics can be tailored to match the provided textual content.

Other personality differences in our users will also guide the tradeoff between consistency and similarity, but this is another good jumping-off point when designing our VUIs.
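
The tradeoff described in this section can be written down as a small decision rule. The sketch below is only one possible reading of it: the function name `choose_voice_style` and the string labels are hypothetical, and falling back to an extroverted voice for unknown audiences is an assumption borrowed from the note in the Similarity-Attraction section.

```python
from typing import Optional

EXTROVERT = "extrovert"
INTROVERT = "introvert"


def choose_voice_style(user_personality: Optional[str], content_style: str) -> str:
    """Return the vocal style ("extrovert" or "introvert") the VUI should use.

    user_personality: "extrovert", "introvert", or None when unknown.
    content_style: the personality that the scripted text conveys.
    """
    if content_style not in (EXTROVERT, INTROVERT):
        raise ValueError(f"unknown content style: {content_style!r}")
    if user_personality is None:
        # Unknown or mixed audience: lean extroverted (assumption from the earlier note).
        return EXTROVERT
    if user_personality == EXTROVERT:
        # Extroverts favor similarity, so the Voice mirrors the user.
        return EXTROVERT
    if user_personality == INTROVERT:
        # Introverts favor consistency, so the Voice matches the wording.
        return content_style
    raise ValueError(f"unknown user personality: {user_personality!r}")


if __name__ == "__main__":
    # An introverted user hearing upbeat brand copy gets a voice that matches the copy.
    print(choose_voice_style(INTROVERT, EXTROVERT))
    # An extroverted user hearing reserved copy still gets a voice similar to them.
    print(choose_voice_style(EXTROVERT, INTROVERT))
```

A real product would weigh more than one personality dimension, so treat this as a starting rule to test with users rather than a finished policy.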


The subtle nuances of speech that we take for granted are something that shouldn’t be overlooked when designing Voice Interfaces. It’s these fine details that really bring our VUIs to life. As designers who are looking to craft an experience that is both pleasant and effective, we should consider the