Exploring Cultural Awareness When Designing Voice UX

Are voice assistants culturally sensitive?

Erica Pang
55 Minutes
12 min read · Mar 1, 2021


“Please pass the chips!”

When you read that sentence, did you picture “chips” as a crunchy oval-shaped snack? Or did you instead think of a salty side dish sitting alongside some batter-fried fish? If you’re from Singapore or the United States, it was most likely the former, with brands like Lay’s and Jack ‘n Jill coming to mind. If you’re from the United Kingdom, you probably thought the latter and might now be scoffing and thinking to yourself, “Lay’s are crisps, not chips!” This is just one example of the power of culture and its ability to shape the way we think.

Photos by Sean McClintock and Meelan Bawjee on Unsplash

As a psychology major who was born in Singapore but currently lives in the United States, I find cultural psychology fascinating because it’s eye-opening to learn about all the things that make a culture unique. Steven J. Heine, author of Cultural Psychology, defines culture as “any kind of idea, belief, technology, habit, or practice that is acquired through learning from others…existing within some kind of shared context”. So, for a voice assistant to be culturally sensitive would mean that it’s attuned to the specific characteristics with which a certain group of individuals identify. Understanding the cultural context of an individual is key to understanding how to best address their needs. That’s why, when voice is the sole means of communication, understanding the nuances of a culture becomes even more important, especially when the goal is to increase usability, usefulness, and desirability through user experience (UX) design. So sit back, grab a bowl of your favorite potato chips (or crisps), and we’ll explore what elements to consider when designing culturally sensitive voice assistants.

How voice is being used across cultures

First, let’s look at some examples of products that currently use voice to optimize their users’ experience.

India: Vodafone-Idea Phone Line with Google Assistant

“Introducing Vodafone-Idea Phone Line | Story of Irshad and Hussain.” YouTube, Google India. Screenshot by author.

At its Google for India event in 2019, Google launched the Vodafone-Idea Phone Line. Phone Line gives users in India who have little to no internet connection the ability to converse with Google Assistant by simply dialing one phone number. Common requests include the news, information to help with homework, and sports scores (especially when big cricket matches are being played). And because Hindi is the second most-used Google Assistant language, users can choose to converse in either English or Hindi.

How this product is culturally sensitive: With millions of Indians either lacking an internet connection or having only a 2G signal, Phone Line provides help to a large portion of the population by making a wide array of information accessible from any location. In addition, Phone Line can be accessed in either English or Hindi. Allowing users to speak in their language of choice makes the product easy and convenient to use.

China: Tmall Genie

“TMall Genie.” Vimeo, Sean Wang. Screenshot by author.

Tmall Genie is a smart speaker developed by Alibaba Group that uses the intelligent personal assistant AliGenie to provide a hands-free way for users to shop online. The voice assistant allows users to purchase items simply by saying a command such as, “Tmall Genie, I want to buy liquid detergent”. Given the ease with which items can be ordered, smart speakers are being used now more than ever for online shopping in China. Alibaba reported that more than 1 million orders were placed through Tmall Genie during its 11.11 Global Shopping Festival. By taking advantage of consumers’ desire for a fast, voice-controlled method of shopping, Alibaba quickly made Tmall Genie the most-purchased smart speaker brand throughout most of 2019.

How this product is culturally sensitive: Tmall Genie builds on a culture that has been shown to prefer audio messaging over text messaging because it’s quicker and simpler, benefits that also come with using a voice assistant.

Japan: Community Keijiban

“Start with One: Community Keijiban.” YouTube, Experiments with Google. Screenshot by author.

Residents of a Japanese housing complex with a prominent elderly population came together to develop Community Keijiban (‘Community Board’), a voice-powered notice board that helps members, especially those who live alone, connect with others by notifying them of upcoming social activities. Built on Google Home, Community Keijiban allows individuals, even those without strong technology skills, to easily access the activity schedule and set reminders for future events.

How this product is culturally sensitive: Community Keijiban is designed for ease of use and practicality, which caters to Japan’s large and growing elderly population, a number of whom live alone. Also, the Japanese-speaking Google Home voice is pitched higher than the voices for other languages, resembling what is traditionally considered a feminine Japanese speaking style.

Why cultural sensitivity matters

The mutual constitution of culture and selves. This framework, developed by esteemed social psychologists Hazel Rose Markus and Shinobu Kitayama, states that “The sociocultural context shapes the self,” and “In turn, people’s thoughts, feelings, and actions (i.e., the self) reinforce, and sometimes change, the sociocultural forms that shape their lives.” In other words, a cyclical relationship exists between us and our culture. It’s important to keep this framework in mind because the ideas and behaviors that we learn from others in our culture are the same ideas and behaviors that we will impose upon and expect from interactions with our voice assistants.

Relationships are built on emotional connection and trust. Although it’s easy to think of how these qualities manifest among human relationships, you might not be thinking about how this applies to a relationship between humans and machines. But since interactive voice technology is designed to mimic human interactions, doesn’t it also make sense for a machine to be able to understand your values? Recognize your feelings? Gain your trust?

Think of your closest friends. Chances are, you all have several things in common. We often use common ground to emotionally connect with others, share goals and motivations, and build trust between one another. One of the things that provides us with the means to do so is culture. However, because culture is highly localized, approaching its application to these devices is not a ‘one size fits all’ process. Rather, taking a more tailored approach is what will make your voice assistant the most effective for its users by making them feel understood.

Here are five aspects to consider:

Cultures vary in their . . .

1. Language

Photo by freestocks on Unsplash

Obvious, right? Well, while it may seem simple that there are differences in language, we want to think beyond whether a country speaks English, French, Chinese, etc. We also have to pay attention to two elements of language: slang and intonation.

Slang is specific to a certain group of people and it’s commonly used in speech. Since voice assistants aim to provide convenience by being speech-controlled, developers should be aware that users may use slang in their commands. And, of course, we can’t forget about “chips.” Even in countries that speak the same language, be aware that one word can have multiple meanings.
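To make this concrete, here is a minimal sketch of how a design team might normalize locale-specific vocabulary before an utterance reaches intent matching. Everything in it, including the locale keys, the synonym table, and the function name, is illustrative rather than part of any real assistant SDK.

```python
# Illustrative sketch: map locale-specific words (including slang) to canonical
# terms before intent matching. Not tied to any real voice assistant platform.

LOCALE_SYNONYMS = {
    "en-US": {"chips": "potato_chips", "fries": "french_fries"},
    "en-GB": {"chips": "french_fries", "crisps": "potato_chips"},
    "en-SG": {"chips": "potato_chips"},
}

def normalize_utterance(utterance: str, locale: str) -> str:
    """Replace each word with its canonical term for the user's locale."""
    synonyms = LOCALE_SYNONYMS.get(locale, {})
    words = [synonyms.get(word, word) for word in utterance.lower().split()]
    return " ".join(words)

print(normalize_utterance("Add chips to my shopping list", "en-GB"))
# -> "add french_fries to my shopping list"
print(normalize_utterance("Add chips to my shopping list", "en-US"))
# -> "add potato_chips to my shopping list"
```

The same request resolves to a different product depending on where the user is from, which is exactly the “chips” ambiguity we started with.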

Intonation is an element of language that isn’t noticeable when it’s correct, but when it’s wrong, it can prove to be a memorable mistake. Because each language has its own sentence melody, hearing your Amazon Alexa emphasize the wrong word in a sentence or use the wrong tone can leave you with a questionable experience. Intonation is also important for conveying emotions such as anger and sarcasm through voice.
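Many voice platforms, including Alexa and Google Assistant, accept Speech Synthesis Markup Language (SSML), which gives designers some control over emphasis and pitch rather than leaving the sentence melody entirely to the synthesizer. The snippet below is only a rough sketch of that idea; exact tag support varies by platform.

```python
# Rough sketch: wrap a response in SSML so the synthesized voice stresses the
# intended word instead of relying on the default sentence melody.
# Support for tags such as <emphasis> and <prosody> varies by platform.

def build_ssml(before: str, stressed: str, after: str) -> str:
    return (
        "<speak>"
        f'{before} <emphasis level="strong">{stressed}</emphasis> {after}'
        "</speak>"
    )

# "I didn't say she took it" carries a different meaning depending on
# which word is stressed.
print(build_ssml("I didn't say", "she", "took it"))
print(build_ssml("I didn't", "say", "she took it"))
```

Getting that stress pattern right for each language is exactly the kind of detail that native speakers and user research should drive.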

2. Perception of emotion

Photo by Jason Rosewell on Unsplash

We often think of emotion as being expressed through facial expressions and body language, but a 2017 study out of Yale University reports that voice-only communication increases empathic accuracy. So, when it comes to voice assistants, their ability to express and detect emotions plays a large role in user experience.

However, the extent to which emotion is perceived through voice depends on one’s culture. In a study testing the differences in the multisensory perception of emotion between Japanese and Dutch participants, researchers found that when provided with auditory and visual cues, the Japanese participants gave more weight to vocal cues when assessing emotion. In contrast, the Dutch gave more weight to facial cues. This data suggests that individuals from an East Asian background may rely more heavily on voice to detect emotion than individuals from a Western background.

Given these findings, coupled with those regarding language differences, if a voice assistant isn’t made to understand and use culture-specific intonations and slang, then it’ll be less likely to make empathic connections with its user.

3. Perception of gender

Apple’s Siri, Microsoft’s Cortana, and Amazon’s Alexa each use a female voice as their default. Why is this so? Two studies, one from Stanford University and one from Indiana University, highlight that society uses stereotypes to guide its preference toward a female voice assistant. Perceived as more caring, helpful, and cordial, female voices are said to provide a “pleasing” experience that is good for business. Contrasting reports from Great Britain show that users there were content with a male voice because of the country’s history of male servants. In addition, BMW received complaints from German customers saying that they didn’t want a woman on their GPS system telling them what to do. Suffice it to say, there is no consensus on what gender these voice technologies should have, and perhaps companies will take this opportunity to go with a third option: a genderless voice.

More recent findings suggest that gender doesn’t matter as long as the assistant is providing its user with the necessary support and empathy. In 2019, Q: the First Genderless Voice was created to end gender bias in AI (artificial intelligence) assistants. Some concern remains about whether companies and audiences will be receptive to a genderless voice, but the San Francisco-based magazine Wired states that Q is working toward confronting the culturally programmed ways in which we see gender today. This is especially important since voice assistants risk reinforcing gender biases, which is something to be cognizant of.

4. Self-concept

While there isn’t one established definition of “self-concept”, it’s typically described as the beliefs and attitudes one holds about oneself. The most common way to distinguish self-concepts across cultures is whether an individual has an independent or an interdependent view of the self. Let’s break this down by first looking at this figure.

Markus and Kitayama (1991) and Heine (2008)

To have an independent self-concept means that you view yourself as unique and distinct from others, including family and friends, and can easily become acquainted with strangers, as depicted above (left) by the dotted line making up the in-group/out-group distinction. The right half of the figure represents those with an interdependent self-concept, which is when the view of the self overlaps with and is heavily grounded in relationships with family and friends. As opposed to the independent self-concept, strangers are not easily let in from the out-group, as represented by a solid line making the distinction between the in- and out-group. Here’s an example of how this difference may manifest when two individuals are asked to describe themselves:

Person A (independent): “My favorite color is red, I like to watch movies, and I’m athletic.”

Person B (interdependent): “I have a brother, I’m a university student, and I’ve been with my partner for two years.”

Independent self-concepts are common in individualistic cultures like the United States, Australia, and the United Kingdom, whereas interdependent self-concepts are common in collectivistic cultures like those found in Asia and South America. Self-concepts are important to consider when assessing how users will adopt and interact with voice assistants. As people who prioritize self-sufficiency, those with an independent self-concept may easily adopt a voice assistant and use it to stay up to date and on track with their personal appointments and events. Those with an interdependent self-concept may be slower to trust and fully accept a voice assistant as a part of their life because new relationships are not as easily formed. On the other hand, because people from collectivistic cultures greatly value the opinion of their loved ones, they could also be more likely to adopt a voice assistant if it is approved by members of their in-group.

5. Perception of class and social hierarchy

And to tie everything together, let’s talk about class and social hierarchy. This aspect of culture involves gender, language, and self-concept because they all contribute toward how we organize ourselves within society.

Power distance is a term used in cultural psychology that concerns the extent to which less powerful people in society accept power differences in organizations, institutions, and families. China, Mexico, and the Philippines are among the countries that measure high in power distance, but Malaysia stands out as the country with the highest power distance index (PDI), with the only score over 100 on a scale from 1 to 120. Since power distance plays a large role in the business world, voice assistants may be more readily adopted in countries such as these if they fulfill the traditional role of an assistant: someone who serves those ranked above them. That’s not to say that these individuals necessarily seek out products that align with this stereotype, but it’s likely, given how powerful culture is, that they’ll naturally be inclined to accept products that fit their preconceived idea of what an “assistant” is. However, this is a bias that can be combated using a framework we saw earlier: the mutual constitution of culture and selves. But I’ll get into that in just a little bit.

Countries low in power distance, like the United States, Denmark, and Switzerland, have less distinct class boundaries and might be more open to a voice assistant that they can see as a friend and an equal. In their research, one team found that North Americans enjoyed it when their voice assistant led intimate interactions with them by asking, “How are you?”, or motivationally saying, “Are you ready for today?”. Interestingly, this wasn’t observed in Germany, despite its relatively low score on the PDI. German participants noted a preference for an assistant that uses low-status language (“Okay, I saved the entry for you”) rather than high-status language (“Okay, the entry was saved”) because they perceived it as having more attractive qualities and it made them feel as if they had achieved a higher position in society.
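One way to act on findings like these is to keep response phrasing per locale rather than hard-coding a single “voice”. The sketch below mirrors the wording from the study described above; the locale keys, the template table, and the fallback are assumptions for illustration only.

```python
# Illustrative sketch: choose confirmation phrasing per locale instead of
# assuming one register fits every culture. Wording mirrors the examples above;
# real templates would also be translated into the user's language.

CONFIRMATION_TEMPLATES = {
    # German participants in the study preferred low-status, first-person phrasing.
    "de-DE": "Okay, I saved the entry for you.",
    # North American users responded well to warmer, more personal phrasing.
    "en-US": "All set, I saved the entry for you. Are you ready for today?",
    # Neutral fallback for locales without dedicated research.
    "default": "Okay, the entry was saved.",
}

def confirmation_for(locale: str) -> str:
    return CONFIRMATION_TEMPLATES.get(locale, CONFIRMATION_TEMPLATES["default"])

print(confirmation_for("de-DE"))  # Okay, I saved the entry for you.
print(confirmation_for("fr-FR"))  # Okay, the entry was saved.
```

The point is that the register of a response, not just its language, is a per-culture design decision that user research should inform.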

So, yes, culture instills in us many great values, but it’s also what gives us our biases. This is where we can take advantage of the mutual constitution that occurs between our culture and ourselves. Since the behavior of individuals can impose change upon a culture, it’s possible that designing a voice assistant without the use of stereotypes, such as a female voice or low-status language, can mean confronting existing biases. If a device is designed to challenge these norms, then there’s potential for these human-to-voice-assistant interactions to then be transferred to human-to-human interactions.

Where do we go from here?

Now that we’ve seen how voice assistants are currently being used, why culture is important, and how to address culture in the development of voice assistants, you’re probably wondering what steps to take next. If there’s anything I’ve learned from my studies and researching interactive voice technologies, it’s that although many psychological findings are generalized to fit each culture, individual user experiences can sometimes contradict what’s been established in the literature. When thinking about your own approach to UX design, it’ll be important to consider cultural differences in language, perception of emotion, perception of gender, self-concept, and perception of class and social hierarchy. However, my key takeaway for the design process, whether it be for voice user interactions or other products, is to take the time to conduct user research. In doing so, you’ll be able to avoid relying on stereotypes and be open and attentive to diverse viewpoints. But most importantly, conducting thorough user research will allow you to really gauge your audience, make sure your product is representative of your population, and address the nuances of your users’ needs.

Erica is a recent graduate of Santa Clara University, where she studied Psychology and Public Health. She enjoys research and learning more about the link between her two areas of study. In her free time, she loves watching television and movies. Whether it be an action movie or a romantic comedy, she loves them all!
