It’s not what is said, but how it is said — Sociophonetic inspired design strategies for VUI voice

Selina Jeanne Sutton
Published in ACM CHI, Apr 30, 2019

This article summarizes a paper authored by Selina Jeanne Sutton, Paul Foulkes, David Kirk, and Shaun Lawson. The paper will be presented at CHI 2019, a conference on Human-Computer Interaction, on Wednesday 8th May 2019 at 14:00 in the session Audio Experiences. Full text available here.

Interest in Voice User Interfaces (VUIs) such as Amazon’s Alexa and Apple’s Siri has increased significantly, both commercially and academically, over the past few years. However, research into computer-synthesized or generated voices has focused on how human-like or natural they sound, and how intelligible or easy they are to understand. As a result, the social aspects (how we react to different voices, and our attitudes and ideas about them) have been overlooked.

In this paper, we engaged with the theory and knowledge of the research field “sociophonetics” to define design strategies that will allow for the social aspects of VUI voices to be considered.

Interest in Voice User Interfaces, such as Siri on the iPhone, has increased significantly in the past few years. Photo by Syda Productions on Shutterstock.

What is sociophonetics?

Sociophonetics is the study of the social factors that affect the production and perception of speech. Elements of a person’s identity (where they grew up, the gender they are trying to portray, their age, etc.) influence the way they speak, and listeners are able to perceive these elements in someone’s speech. Equally, the way someone speaks influences how other people view them. For example, a listener may view a speaker as more or less trustworthy, friendly, or aggressive. So, different people react to different voices in different ways.

Speech varies across people as a result of elements of their identity, such as where they grew up, their age, and the gender they are trying to portray. Photo by Rawpixel.com on Shutterstock.

How can we design the voices of VUIs?

In our paper, we recommend three design strategies:

1) Designing for Individualisation: Every person has gained a different experience of speech as a result of who they have interacted with. Therefore, our opinions of and reactions to different kinds of voices vary greatly. This design strategy proposes that each user’s social history could be taken into consideration when it comes to the voice of their VUI device. This could be by:
  • letting the user select a voice from a suite of options
  • having the user answer a series of questions to guide the selection of a voice on their behalf
  • providing the user with the ability to “build” a voice by allowing different aspects (e.g. the accent, softness, speed) to be manipulated

2) Designing for Context Awareness: It is possible to perform many different tasks using VUIs, and this design strategy suggests that the voice should change to reflect the task at hand. Some illustrative ideas include:

  • within the context of entertainment, movies or music could be introduced by the VUI in a voice that reflects the relevant genre
  • within the context of banking activities, the voice could be designed to portray the social qualities of trustworthiness and reliability, qualities one would wish a real person to embody in such an interaction

3) Designing for Diversity: This strategy argues that the impact of VUI voice design decisions should be considered beyond the user interacting with the device. The way someone speaks influences how other people view them. These views represent speech ideologies: attitudes and ideas about different ways of speaking. Unfortunately, speech ideologies can lead to listeners holding prejudices and speakers experiencing discrimination.

For example, most countries have a Standard Accent (a way of speaking that is viewed as “correct”). Those who speak that way are unconsciously favoured, while those who don’t are treated unfavourably. When the way someone speaks can influence how they are treated in school, job interviews, or the legal system, this can have a significant impact on their life.

Unless we are sensitised to these issues when considering VUI voices, the longer-term implication is that both positive and negative speech ideologies will be reinscribed in human-computer interactions, potentially fuelling prejudice and discrimination further. One way to prevent this would be to use voices that the user doesn’t already have a speech ideology towards. Some ideas for how to do this include:

  • choosing an accent that the user is not familiar with
  • inventing new accents
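
To make the first two strategies more concrete, here is a minimal sketch in Python. It is an illustration only and not taken from the paper. It assumes a hypothetical `VoiceProfile` holding the adjustable aspects mentioned above (accent, softness, speed), and a hypothetical table of per-task adjustments. The numeric values are invented placeholders; which settings actually convey a quality such as trustworthiness would need to be established through research.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceProfile:
    """Hypothetical set of voice aspects a user could adjust (strategy 1)."""
    accent: str
    softness: float  # assumed scale: 0.0 (harsh) to 1.0 (soft)
    speed: float     # assumed relative speaking rate; 1.0 is the default


# Hypothetical per-task adjustments (strategy 2). Placeholder values only.
CONTEXT_OVERRIDES = {
    "banking": {"softness": 0.4, "speed": 0.9},
    "entertainment": {"speed": 1.1},
}


def voice_for_context(user_voice: VoiceProfile, context: str) -> VoiceProfile:
    """Return the user's voice with any adjustments for the given task applied.

    Unknown contexts leave the user's chosen voice unchanged.
    """
    overrides = CONTEXT_OVERRIDES.get(context, {})
    return replace(user_voice, **overrides)


# The user "builds" a voice once, and the device adapts it per task.
user_voice = VoiceProfile(accent="user-chosen", softness=0.7, speed=1.0)
banking_voice = voice_for_context(user_voice, "banking")
weather_voice = voice_for_context(user_voice, "weather")
```

In this sketch the user’s own choices remain the baseline, and context-specific changes are layered on top rather than replacing them, so the two strategies can coexist.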

Summary

In this paper, we consider the design of VUI voices using knowledge and theory from sociophonetics, and propose three design strategies as a result. In doing so, we define an exciting new area of investigation and innovation that focuses on the voice of voice user interfaces.

For more details, please see the paper accepted to CHI19. Find the full text here. Full citation:

Selina Jeanne Sutton, Paul Foulkes, David Kirk, and Shaun Lawson. 2019. Voice as a Design Material: Sociophonetic Inspired Design Strategies in Human-Computer Interaction. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland, UK. ACM, New York, NY, USA.

Selina Jeanne Sutton

I’m a 3rd year PhD student at Northumbria University, UK. My research interests lie at the intersection of HCI and sociolinguistics and sociophonetics.