AI Models and Machine Learning Used in Solving Speech Impairments

Published in

Digital Literacy for Decision Makers @ Columbia B-School

3 min read · Apr 19, 2020

Healthcare is arguably one of the most important industries in terms of value created, and machine learning models are becoming a bigger part of it each day as applications of AI offer hope to people living with physical impairments. Machine learning in healthcare helps analyze huge sets of data points, suggest outcomes, and provide risk scores, and it has many other applications, including disease diagnosis, drug discovery, and clinical trials and research. In this article I focus specifically on how machine learning is used to diagnose and handle speech impairments.

What is Speech Impairment?

A speech impairment is a condition in which the ability to produce the speech sounds needed to communicate is impaired. Causes include ALS, strokes, Parkinson’s disease, cerebral palsy, and brain injuries. Millions of people live with one of these conditions, and communicating face to face, keeping up with daily life, and enjoying even small moments such as making a joke can be challenging. Technology and healthcare companies are taking many initiatives, including Google AI's Project Euphonia (a Google Research project), to ensure voice-activated technologies work properly for people who have impaired speech.

How Does Speech Recognition Work?

In summary, speech recognition works in four steps:

1. The sound of a person’s voice is converted into a waveform.

2. Waveforms are matched to transcriptions or labels for each word.

3. A machine learning model is trained to map sounds (input) to words (output).

4. The algorithm predicts each word in a sentence, using context to choose between words that sound alike (e.g. "see" or "sea").
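The four steps above can be illustrated with a toy sketch in Python. Everything in it is an assumption made for illustration: the pure tones standing in for spoken sounds, the zero-crossing feature, the nearest-centroid "model", and the tiny word-pair counts are all hypothetical. Real recognizers use far richer acoustic features and neural networks, so treat this as a picture of the flow, not of any production system.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)

def synth_waveform(freq_hz, duration_s=0.05):
    """Step 1: represent a sound as a waveform (a list of amplitude samples)."""
    n = int(duration_s * SAMPLE_RATE)
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]

def zero_crossing_rate(waveform):
    """A crude acoustic feature: fraction of neighbouring samples that change sign."""
    pairs = zip(waveform, waveform[1:])
    crossings = sum(1 for a, b in pairs if (a < 0) != (b < 0))
    return crossings / (len(waveform) - 1)

# Step 2: waveforms paired with labels (hypothetical data: one tone per "sound").
LABELLED = [(300, "AY"), (320, "AY"), (900, "SEE"),
            (950, "SEE"), (1500, "THE"), (1550, "THE")]

def train(labelled):
    """Step 3: learn one average feature value (centroid) per sound label."""
    features = {}
    for freq_hz, label in labelled:
        value = zero_crossing_rate(synth_waveform(freq_hz))
        features.setdefault(label, []).append(value)
    return {label: sum(v) / len(v) for label, v in features.items()}

def predict_sound(model, waveform):
    """Map a waveform to the label whose centroid is closest to its feature."""
    feature = zero_crossing_rate(waveform)
    return min(model, key=lambda label: abs(model[label] - feature))

# Step 4: words that share a sound, plus toy counts of how often word pairs occur.
HOMOPHONES = {"AY": ["i"], "SEE": ["see", "sea"], "THE": ["the"]}
WORD_PAIR_COUNTS = {("i", "see"): 5, ("the", "sea"): 4}

def transcribe(model, waveforms):
    """Pick, for each sound, the candidate word that best follows the previous word."""
    words = []
    previous = "<start>"
    for waveform in waveforms:
        candidates = HOMOPHONES[predict_sound(model, waveform)]
        best = max(candidates, key=lambda w: WORD_PAIR_COUNTS.get((previous, w), 0))
        words.append(best)
        previous = best
    return words

model = train(LABELLED)
print(transcribe(model, [synth_waveform(310), synth_waveform(920)]))
print(transcribe(model, [synth_waveform(1520), synth_waveform(930)]))
```

The two calls feed nearly the same second sound (920 Hz and 930 Hz tones) into the model, and the preceding word decides whether it is written as "see" or "sea". That is the role context plays in step 4.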

Although this sounds good in theory, the main challenge research teams face is gathering enough voice recordings from people with speech impairments to train speech recognition models that understand them better. This is where the once-viral Ice Bucket Challenge came into the picture.

Ice Bucket Challenge

The Ice Bucket Challenge was an activity in which people dumped a bucket of ice water over their heads to promote awareness of ALS and encourage donations to research. During an 8-week period in 2014, when the challenge became famous worldwide, around $200 million was raised for research. As the campaign went viral, ALS TDI (Therapy Development Institute) was able to use the donations to reach out to ALS patients and collect their voice data. Having collected a huge dataset, ALS TDI attracted Google as a collaborator.

Giving People Their Voice Back

As voice samples are collected to train recognition models that can accurately understand what people with atypical speech say, the next step is essentially to reverse the process using text-to-speech technologies, assistive technologies that read digital text aloud. Speech recognition, as described above, takes a voice as input and produces text as output. In the reverse process, researchers such as Google’s DeepMind team work on taking text and vocalizing it in that person’s original voice. The team is developing ML models that need less training data, since most patients may not have enough time or resources to record their voices. The technology works as follows:

1. Train the WaveNet model on many speakers.

2. The model produces basic, natural-sounding speech.

3. Take small samples of the target speaker’s data and adapt the model, a step also called fine-tuning.
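The pretrain-then-adapt sequence in these three steps can be sketched with a deliberately tiny model. This is not WaveNet: the two-parameter linear model, the synthetic "speakers", and every number below are hypothetical stand-ins chosen so the example runs in plain Python. It only illustrates the idea of starting from a model trained on many speakers and nudging it with a handful of samples from one person.

```python
def mse(params, data):
    """Mean squared error of the linear model y = w * x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(params, data, steps, lr=0.05):
    """Plain gradient descent on mean squared error, starting from `params`."""
    w, b = params
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return (w, b)

# Step 1: data from many speakers (hypothetical). They share a common pattern
# (slope 2.0) but each has an individual offset, a stand-in for vocal character.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
many_speakers = [(x, 2.0 * x + offset)
                 for offset in (-1.0, -0.5, 0.0, 0.5, 1.0)
                 for x in xs]

# Step 2: the base model captures what the speakers have in common.
base = train((0.0, 0.0), many_speakers, steps=2000)

# Step 3: adapt (fine-tune) the base model with only three samples
# from a new target speaker whose offset is 1.5.
target_samples = [(0.0, 1.5), (1.0, 3.5), (2.0, 5.5)]
adapted = train(base, target_samples, steps=200)

# Compare both models on the target speaker's full range.
target_test = [(x, 2.0 * x + 1.5) for x in xs]
print("base model error on target speaker:   ", round(mse(base, target_test), 4))
print("adapted model error on target speaker:", round(mse(adapted, target_test), 4))
```

The base model sounds "generic" for the target speaker, while the adapted model, which started from the base parameters and saw only three of that speaker's samples, fits them far more closely.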

The Google team sat down with Tim Shaw, a former NFL player who is now an ALS advocate, and his parents. Tim has been recording his voice, contributing tens of thousands of recordings to be used in training WaveNet. Eventually, Tim was able to reunite with his voice and read a letter he wrote to his younger self using the still-evolving model last year.

Although these developments may seem huge, most of these projects are at an early stage and continue to develop. Researchers predict it will take years for them to mature. But as a saying often attributed to Mark Twain puts it, “The secret to getting ahead is getting started.”

https://www.youtube.com/watch?v=V5aZjsWM2wo

https://www.flatworldsolutions.com/healthcare/articles/top-10-applications-of-machine-learning-in-healthcare.php

https://sites.google.com/view/project-euphonia/learn-more?authuser=0

http://www.alsa.org/fight-als/ice-bucket-challenge.html

Using WaveNet technology to reunite speech-impaired users with their original voices

This post presents a recent project we undertook with Google and ALS campaigner Tim Shaw, as part of Google's Euphonia…

deepmind.com

Written by Gizem Kacmaz