Voice Assistant Accessibility

Ensuring that everyone is understood

Angus Addlesee
Published in TDS Archive · Mar 31, 2023

I have been working on conversational AI with large language models (like GPT-4) for many years. The recent huge burst in popularity driven by ChatGPT is very exciting, but how can these systems be improved? There are many answers to that question of course, but in this article I will focus on accessibility.

How do we tweak our future machine learning models to make use of limited datasets? And how should we design our agents to ensure that everyone can make use of advancements in voice artificial intelligence?

This is an abridgement of my paper published at IWSDS 2023. If you would like to cite anything discussed in this article, please cite the paper titled “Voice Assistant Accessibility”:

Harvard:
Addlesee, A., 2023. Voice Assistant Accessibility. Proceedings of the 13th International Workshop on Spoken Dialogue Systems Technology (IWSDS).

BibTeX:
@inproceedings{addlesee2023voice,
  title={Voice Assistant Accessibility},
  author={Addlesee, Angus},
  booktitle={Proceedings of the 13th International Workshop on Spoken Dialogue Systems Technology (IWSDS)},
  year={2023}
}

Over one billion people in the world are living with some form of disability, and voice assistants have the potential to improve people’s lives. For example, while I was visiting a respite care home called Leuchie House, one resident with Multiple Sclerosis (MS) explained how the disease’s progression slowly eroded away their independence. An Amazon Alexa device enabled this person to turn off their bedroom light without asking for a carer’s help. They told us that this was the first time they had regained some personal independence since diagnosis.

Stories like the one above motivate charities to promote the use of voice assistants as they can have real positive impact.

I (on the right) visited Leuchie House with colleagues in September 2021

The creators of voice assistants are getting HIPAA compliance for further application in the healthcare domain, and features are released specifically targeting vulnerable user groups. We similarly see early-stage researchers collaborating with other disciplines to apply their work to more specific healthcare applications (see my TDS article on voice research trends).

Voice assistant accessibility is therefore critical to ensure future systems are designed with the end user’s interaction patterns and needs in mind. Today’s voice assistants are trained and evaluated on huge datasets that represent the ‘average’ user, yet we know that speech changes as cognition declines. Existing commercial systems to assist people with visual impairments have huge privacy concerns, and people would need to openly announce disabilities when interacting with voice assistants in public spaces (discussed below).

As both research and industry push the hugely beneficial use of voice assistants beyond mass-market applications, new challenges and ethical concerns arise that must be highlighted. In this article, I discuss the state-of-the-art research and currently available commercial systems for many user groups, e.g. people with dementia, motor neurone disease, sight loss, mental health conditions, and more.

I first discuss designing voice assistants for people with cognitive impairments and mental health conditions, then systems for people with physical impairments, then voice privacy in public environments, and finally conclude with a summary.

Dialogue Systems for People with Cognitive Impairments and Mental Health Conditions

Cognitive impairments impact memory, attention, problem-solving skills, decision-making, speech production, and more. The onset and progression of cognitive impairments typically correlate with a person’s age, but certain conditions (e.g., early-onset dementia) can be caused by strokes or head trauma. Mild cognitive impairment (MCI) shares the above symptoms but does not substantially interfere with the person’s life. While people with MCI are also usually older adults, another aspect of brain health impacts people of all ages - mental health conditions.

The ARI robot (by PAL Robotics) used in the SPRING Project mentioned below (source)

Dementia and MCI

People pause mid-sentence more frequently and for longer durations as cognition declines. Other speech production changes also occur, including increased repetition, slowed speech rate, an increased number of prepositions, and more. Today’s voice assistants mistake these long pauses for the end of the user’s turn, requiring the frustrated user to repeat their entire utterance.
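To make that end-of-turn problem concrete, here is a minimal sketch in Python (mine, not taken from the paper or from any commercial assistant) of a voice-activity-based endpointer whose silence threshold can be lengthened for users who pause mid-sentence. The frame length, thresholds, and toy frame sequence are all illustrative assumptions.

# Minimal sketch (illustrative, not a production endpointer) of an end-of-turn
# detector whose silence threshold can be lengthened for users who pause mid-sentence.

from dataclasses import dataclass

@dataclass
class Endpointer:
    frame_ms: int = 30              # duration of each audio frame in milliseconds
    silence_timeout_ms: int = 700   # silence needed before the turn is considered over
    _silence_ms: int = 0
    _speech_seen: bool = False

    def update(self, frame_is_speech: bool) -> bool:
        """Feed one voice-activity decision per frame; return True when the turn ends."""
        if frame_is_speech:
            self._speech_seen = True
            self._silence_ms = 0
            return False
        if not self._speech_seen:
            return False  # ignore leading silence before the user starts speaking
        self._silence_ms += self.frame_ms
        return self._silence_ms >= self.silence_timeout_ms

# A toy stream of VAD decisions: speech, a long mid-sentence pause, more speech, then the real end.
frames = [True] * 20 + [False] * 40 + [True] * 20 + [False] * 70

default = Endpointer()                         # ~0.7 s of silence ends the turn
patient = Endpointer(silence_timeout_ms=2000)  # tolerate ~2 s pauses instead

print(next(i for i, f in enumerate(frames) if default.update(f)))  # fires during the mid-sentence pause
print(next(i for i, f in enumerate(frames) if patient.update(f)))  # fires only after the utterance is complete

Commercial assistants use far more sophisticated end-of-turn prediction, but the accessibility lever is the same: the system must tolerate longer silences without cutting the user off.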

People with dementia and MCI are recommended standard voice assistants (like Google Home and Amazon Alexa) for their daily use. Companies like CogniHealth work to curate content to help people with dementia and their families - providing valuable information, advice, and support through voice assistants - but the speech processing and understanding components of these systems do not have accessibility options for the specific challenges described above.

Research in this area is stifled due to the lack of data. Collecting natural spoken dialogue data with vulnerable older adults is ethically challenging, especially over the past few years due to COVID, and requires bespoke tools for data security. Projects like the EU’s H2020 SPRING Project are collecting data to tackle these challenges in a hospital memory-clinic waiting room. In this setting, a patient with dementia or MCI will likely be accompanied by a family member or carer, introducing further multi-party complications (spoiler: the topic of my next article). Once collected, a sub-repository of TalkBank called DementiaBank can be used to share data with other researchers studying communication in dementia.

Mental Health Conditions

Certain mental health conditions also impact people’s speech production and behavior. People with anxiety speak at a faster rate and pause for shorter durations than healthy controls, whereas people with depression or PTSD speak more slowly and produce more silence.

There do not appear to be commercial voice assistants that adapt their speech processing to better understand people with mental health conditions - although companies do exist targeting these user groups. Voice assistant apps like UB-OK and Kindspace provide a safe space for people to share their worries without judgment. People, particularly young adults, can ask about mental health and other concerns that they may not want to ask friends, teachers, or family.

Michael McTernan presenting UBOK at the Scottish Edge Awards 2019

Human-Robot Interaction (HRI) research is abundant in this area. The system’s speech processing again remains standard, but the robot’s interactions are modified. For example, communication techniques can be leveraged from psychology to encourage self-reflection and help with loneliness (a common comorbidity of depression).

There really is a wide range of applications. Minimally verbal children with autism were given speech-generation devices that, after long-term use, encouraged them to talk spontaneously and use novel vocabulary. Similar work effectively used interactive social robots for autism therapy.

All of the above is exciting, but critically, none of this research focused on improving the system’s speech processing and understanding.

Designing Dialogue Systems for People with Physical Impairments

Physical impairments impact what a user may ask a voice assistant and how they can take action when given a response. For example, people affected by sight loss will ask about their visual surroundings, and people with hearing difficulties will expect a multimodal response.

Visual Impairment

Blind and partially sighted people often suffer from seemingly unrelated health conditions like malnutrition. This is sadly due to difficulties completing everyday tasks like shopping, preparing food, and cooking. A panel of visually impaired computer scientists spoke at the Computer Vision and Pattern Recognition Conference (CVPR 2020) - detailing how difficult it is to navigate a train station when you cannot see the timetables, platform numbers, carriage letters, or direction signs.

Similar to a train station, people will ask questions about textual information on food packaging. This image is from an article on this exact problem in the kitchen (here).

Human-in-the-loop solutions are available, where you can connect with a sighted person to answer a question. A photo or video must be sent along with the question to relay the visual scene. BeMyEyes and BeSpecular rely on sighted volunteers to answer questions in a timely manner, whereas Aira has a trained team of professional agents.

One clear issue emerges when relying on volunteers: visually impaired people do not know if sensitive information can be seen in the sent images. Volunteers could therefore see mail with names, addresses, ID numbers, or valuables in the individual’s home. Aira mitigates this by hiring and training staff with a focus on safety and security - but this comes at a price.

End-to-end (E2E) systems also exist, like TapTapSee and Microsoft Seeing AI. These systems run securely on the cloud with encryption, so privacy concerns are minimal, but they introduce a new concern - accuracy. A visually impaired person cannot verify that the system’s answer is in fact correct. With no ability to have a dialogue, users have to resort to trusting the system’s answer. This can lead to dilemmas like asking questions about medication. The E2E system could be incorrect, potentially harming the user, but the human-in-the-loop system requires them to send medication pictures to an unknown volunteer.

This image shows an example from some work attempting to answer clarification questions from people with sight impairments. The goal is to let users know when the system is unsure (here).
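As a hedged illustration of that idea, the sketch below wraps a hypothetical visual question answering model and either answers directly, flags its uncertainty, or asks for a better photo, depending on the model’s confidence. The vqa_model callable, the thresholds, and the stub output are assumptions for illustration, not the system shown above.

# Sketch of surfacing model uncertainty to the user instead of presenting a guess as fact.
# `vqa_model` is a hypothetical callable returning (answer, confidence in [0, 1]);
# the thresholds and the stub model below are illustrative assumptions.

from typing import Callable, Tuple

def answer_with_uncertainty(
    question: str,
    image_path: str,
    vqa_model: Callable[[str, str], Tuple[str, float]],
    answer_threshold: float = 0.85,   # answer directly above this confidence
    clarify_threshold: float = 0.5,   # below this, ask for a better photo instead of guessing
) -> str:
    answer, confidence = vqa_model(question, image_path)
    if confidence >= answer_threshold:
        return answer
    if confidence >= clarify_threshold:
        return f"I think the answer is '{answer}', but I am not certain. Would you like me to double-check?"
    return "I cannot answer that reliably from this photo. Could you take another picture, a little closer?"

# Usage with a stub standing in for a real visual question answering model.
def stub_model(question: str, image_path: str) -> Tuple[str, float]:
    return "Take two tablets daily", 0.6

print(answer_with_uncertainty("What is the dosage on this label?", "label.jpg", stub_model))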

Limited Mobility

Household open-domain voice assistants are very convenient: we can set timers while our hands are oily from cooking, or turn up our music from the comfort of a warm blanket on the couch.

These functions are not just convenient for people living with limited mobility, they are critical for mental wellness. People with limited mobility in their hands, arms, or legs can retain some personal independence through their voice.

Mobility loss itself does not commonly impact speech production, but voice assistants can still be accessibly designed. The hardware exists (like advanced wheelchairs) to increase a person’s comfort and ability to complete daily tasks, but these technologies are not currently integrated with existing voice assistants. While visiting a respite care home called Leuchie House, a resident described their frustration when a voice assistant could open their curtains and turn off the TV, but they had to ask a carer to adjust their electric wheelchair’s headrest. This highlights the need for inclusive design.

Hearing Impairments

People living with hearing difficulties get frustrated with voice assistants, causing them to abandon their use altogether. This is even more difficult for those who developed hearing issues at a young age, as this often leads to speech impairments. For example, pronunciation is impossible to learn if you cannot hear a conversation. The impact of speech impairments is discussed below, but even without them, available voice assistants appear limited. Research notes that people with partial hearing loss really struggle to follow a conversation in a noisy environment (like a public space), but that they felt more included in a conversation when a screen with a live transcription of the ongoing conversation was set up. Real-time speaker identification would be even more effective (a small sketch follows below). We should therefore ensure that voice assistants and social robots include a screen to enable this feature in public multi-party settings.
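The sketch below is a small, assumed illustration of that screen idea: it simply formats diarised (speaker-labelled) transcript segments for a shared display. In a real deployment the segments would come from streaming ASR and speaker identification; here they are hard-coded purely for illustration.

# Sketch of rendering a diarised live transcript on a shared screen.
# The (speaker, text) segments would come from real-time ASR plus speaker identification.

from typing import List, Tuple

def render_transcript(segments: List[Tuple[str, str]], max_lines: int = 5) -> str:
    """Keep only the most recent lines so the display stays readable at a glance."""
    lines = [f"{speaker}: {text}" for speaker, text in segments]
    return "\n".join(lines[-max_lines:])

segments = [
    ("Speaker 1", "Has everyone seen the new timetable?"),
    ("Speaker 2", "Not yet, could you read out the platform numbers?"),
    ("Speaker 1", "Platform 4 for Edinburgh, platform 9 for Glasgow."),
]

print(render_transcript(segments))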

Honda’s Asimo robot saying “I love you” in ASL (source)

Many people with hearing impairments know one of around 200 signed languages. Research has shown that assistants can learn sign language, but an increased effort from the NLP community could utilise existing resources to improve sign language processing and generation. People with hearing impairments are keen to engage with the inclusive design of voice assistants to ensure accessibility progress.

Speech Diversity

Speech production is nuanced and unique to every individual, but automatic speech recognition (ASR) learns general speech patterns and therefore struggles to understand non-native speakers or people with thick accents (like in Scotland). This problem extends further, however. People with stammers are misunderstood, people who struggle with pronunciation (e.g. caused by hearing loss at an early age) are misunderstood, and people with Tourette’s are excluded from dialogue research. Non-standard speech can also be caused by conditions that affect the muscles we use to produce speech, like muscular dystrophy.

Google is innovating on this front with three projects. Project Euphonia and Project Relate are Google’s initiatives to help people with non-standard speech be better understood, and Project Understood is their program to better understand people with Down Syndrome. Google has even opened The Accessibility Discovery Centre to collaborate with academics, communities, and charitable/non-profit organisations to “remove barriers to accessibility”.
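These projects work by collecting and learning from relatively small amounts of non-standard speech. As a rough, hedged sketch of that general idea (and explicitly not how Euphonia, Relate, or Understood are implemented), the code below fine-tunes the open-source wav2vec 2.0 model from Hugging Face on a handful of one user’s own recordings. The file paths, transcripts, and hyperparameters are illustrative, and the exact processor API can vary between transformers versions.

# Minimal sketch of personalising an open-source ASR model (wav2vec 2.0) on a
# handful of a single user's recordings. Paths, transcripts, and hyperparameters
# are illustrative only.

import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.freeze_feature_encoder()  # with very little data, only adapt the upper layers

# A tiny personal dataset: (audio file, what the user actually said).
personal_data = [
    ("recordings/lights_off.wav", "TURN OFF THE BEDROOM LIGHT"),
    ("recordings/curtains.wav", "OPEN THE CURTAINS PLEASE"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()

for epoch in range(3):  # a few passes is often enough to adapt usefully to one voice
    for path, transcript in personal_data:
        waveform, sample_rate = torchaudio.load(path)
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)
        inputs = processor(waveform.squeeze().numpy(), sampling_rate=16_000, return_tensors="pt")
        labels = processor(text=transcript, return_tensors="pt").input_ids
        loss = model(input_values=inputs.input_values, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()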

Project Relate

Loss of Speech

People with certain conditions like motor neurone disease (MND) slowly lose the ability to speak entirely. Stephen Hawking famously had a synthetic voice due to MND, and synthetic voices have since improved hugely in terms of quality and diversity. Companies like Cereproc synthesise characterful, engaging, and emotional voices with varying accents to help people with MND choose a voice to best represent themselves.

Voice cloning is also possible, opening up the use of voice banking technology to people at risk of losing their voice. People capture hours of their speech to enable cloning at a later date if needed. One of these companies, SpeakUnique, can even reconstruct a person’s original voice if it has partially deteriorated since diagnosis.

Privacy

Personal voice assistants are often only used by an individual user in a private space. The voice assistant can therefore be highly customised to that user’s needs. However, this is not the case for assistants in public spaces. Social robots in museums, hospitals, airports, etc. will be used by many people in a day. Some accessible design implementations benefit everyone (we all forget words mid-sentence sometimes, for example), but others would not.

A Pepper robot in a shopping centre (source). People would likely feel uncomfortable disclosing personal accessibility needs in a public space with other people around.

Most disabilities are invisible, so people would have to describe their disabilities aloud in a public space to activate certain accessibility features. This is similarly problematic without a voice assistant - disabled people often need to announce their disabilities and assistance needs in a shop.

Technologies like Neatebox are rising in popularity to tackle this issue. Disabled users get the app and note how they would personally like to be assisted (e.g. being led by the arm). Then, when they enter the shop or airport premises, the customer service team are notified and personalised assistance is subtly provided. A similar technology could be used with social robots in public spaces to activate features when interacting with someone who requires an accessible voice assistant.
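To illustrate the shape such an integration might take, the sketch below is a hypothetical example (not Neatebox’s actual system) of a public-space assistant loading a visitor’s pre-declared, opt-in accessibility preferences when their app checks in nearby, and reverting to defaults when they leave. All profile fields and values are made up for illustration.

# Hypothetical sketch: activating accessibility settings from a pre-declared,
# opt-in profile when a visitor checks in, instead of asking them to announce
# their needs aloud in public. Profile fields and values are illustrative only.

from dataclasses import dataclass, field

@dataclass
class AccessibilityProfile:
    user_id: str
    longer_endpointing: bool = False    # tolerate long mid-sentence pauses
    on_screen_transcript: bool = False  # show a live transcript for hearing loss
    spoken_descriptions: bool = False   # describe visual surroundings for sight loss

@dataclass
class PublicAssistant:
    defaults: dict = field(default_factory=lambda: {
        "silence_timeout_ms": 700, "show_transcript": False, "describe_scene": False})
    active: dict = field(default_factory=dict)

    def on_check_in(self, profile: AccessibilityProfile) -> None:
        """Called when the visitor's opt-in app signals that they are nearby."""
        self.active = dict(self.defaults)
        if profile.longer_endpointing:
            self.active["silence_timeout_ms"] = 2000
        if profile.on_screen_transcript:
            self.active["show_transcript"] = True
        if profile.spoken_descriptions:
            self.active["describe_scene"] = True

    def on_check_out(self) -> None:
        """Revert to defaults; nothing about the visitor is retained on the robot."""
        self.active = dict(self.defaults)

robot = PublicAssistant()
robot.on_check_in(AccessibilityProfile("anon-42", longer_endpointing=True, on_screen_transcript=True))
print(robot.active)
robot.on_check_out()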

Conclusion

Voice assistants can improve people’s lives beyond simple convenience, and this can be achieved through ethical data collection and inclusive design. It is not simple, however: every system component in a voice assistant must be considered when designing systems for everyone.

I have summarised which specific components must be tweaked in order to make voice assistants more accessible for the user groups discussed in this article:

Table 1 in the paper

You can reach me on Medium, on Twitter, or on LinkedIn.

Written by Angus Addlesee

Applied Scientist at Amazon AGI (Alexa) with a PhD in Artificial Intelligence. Contact details at http://addlesee.co.uk/
