Weak Speech Supervision (WSS): Make Your Small Data Speak

Make Your Deep Learning Model More Powerful

Mirali Purohit
ML Brew
7 min read · Aug 15, 2021


In the era of Artificial Intelligence (AI), machines have become smarter. In the past few years, Deep Learning (DL) techniques have delivered state-of-the-art results in many fields such as Computer Vision (CV), Natural Language Processing (NLP), speech, space technologies, and more. Throughout these advances, the need for large amounts of data has remained a bottleneck in getting the expected performance from models. Craig Mundie writes:

“Data are becoming the new raw material of business.”

Data always tells a great story and carries tremendous information. This blog discusses a way to handle the large-data requirement and improve model performance in the speech domain. It presents a methodology introduced to tackle the data-shortage problem: Weak Speech Supervision (WSS). We use a case study, “Dysarthria Severity Classification,” to show the efficiency of the proposed approach.

Key Terminologies

Here are some terms you should know before reading the rest of the blog; they will help you follow the discussion. If you are already familiar with them, you can skip this part.

  • Dysarthria

The ability to speak is one of our greatest blessings, yet most of us take it for granted. Some people, however, cannot speak fluently or produce speech in the normal way. Dysarthria is a speech disorder in which a person loses the natural way of speaking. It is caused by a disease that leaves the person unable to move some of the articulatory parts (e.g., tongue, lips, jaw) used in speech production. Symptoms of dysarthria include less intelligible speech, an uneven rate of speech, slurred speech, abnormal rhythm, speaking too slowly or too fast, speaking softly, a change in voice, and a hoarse or breathy sound. You can check a few examples of dysarthric speech at the following link:

  • Severity

The severity of dysarthria is rated on a coarse scale ranging from none and mild to moderate and severe, based on various factors such as the degradation of the speech and the condition of the patient. This categorization is done by speech pathologists.

  • Data Scarcity

In the recent era, DL techniques have been providing promising results in many areas, and in some fields they outperform existing methods or even humans. Behind this remarkable achievement, however, the need for a large amount of data is the main hurdle in building any DL-based system; this problem is known as Data Scarcity. In short, a DL model needs a huge amount of data to reach high performance.

  • Weak Supervision

Weak Supervision is a method for generating labeled data with the help of an existing small dataset; the generated data, along with the original data, can then be used to train DL architectures and improve system performance through the larger training set. Weak supervision has received a great deal of attention over the last decade due to its promising results. To explore weak supervision further, read this:

Problem

Because dysarthria is a speech disorder, affected people struggle when they speak and find it especially difficult to speak for long durations. As a result, we cannot collect much data, and hence there is data scarcity. These people also cannot use Intelligent Personal Assistants (IPAs) such as Google Assistant and Amazon Alexa, because these systems currently work efficiently only when speech is produced in the natural mode. Knowing the severity of the speech in advance can help in improving the speech and can improve the overall performance of IPAs. Additionally, the severity level can help speech pathologists in the patient’s treatment. Hence, classification and improvement of dysarthric speech is a very important task.

Approach

This research work introduces the Weak Speech Supervision (WSS) technique to solve the data scarcity problem for dysarthria severity classification. The rest of the blog discusses this DL-based approach in depth.

Let’s assume we have a labeled (i.e., trusted/original) dataset of size N1 and an unlabeled (i.e., untrusted) dataset of size N2 of dysarthric speech, where N2 >> N1. We want to build a classifier that classifies dysarthric speech by severity level. In this research work, dysarthric speech is classified into two classes, y ∈ {0, 1}, where x is the feature vector of the dysarthric speech and y is the corresponding severity label.

The WSS technique involves two steps:

1) generation of weak data, and

2) utilization of weak data along with trusted data to train classifiers.

You can see the whole process in the following schematic diagram:

Flowchart of WSS for Dysarthria Severity Classification
  1. Generation of weak data — To generate the weak data, a weak rule needs to be defined. The weak rule can be derived from the trusted data by observing patterns, common behaviour, external knowledge, etc.; the unlabeled dataset is then labeled using this rule. Several speech parameters change with the severity of dysarthric speech, e.g., speech rate, F0, pitch, voiced/unvoiced segments, energy, and power. It is observed that energy changes with the severity level of dysarthric speech; hence, we used an energy-based parameter to define the weak rule. After extracting the energy parameter, we fix a threshold and use the rule to label the unlabeled data (see the first sketch after this list).
  2. Utilization of weak data along with trusted data to train classifiers — As shown in the figure, two classifiers are trained: one on trusted data (C1) and another on weak data (C2). The two classifiers may use the same or different model architectures, and the objective/loss functions are selected accordingly (see the second sketch below).
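To make step 1 concrete, here is a minimal sketch of what an energy-based weak rule could look like. The threshold value, the label mapping, and the librosa-based energy feature below are illustrative assumptions, not the exact rule used in the research:

```python
import numpy as np
import librosa

# Hypothetical threshold on mean frame energy; in practice it would be
# tuned by inspecting the energy distribution of the trusted recordings.
ENERGY_THRESHOLD = 0.02

def weak_label(wav_path: str) -> int:
    """Assign a weak severity label to an utterance from its average energy."""
    signal, _ = librosa.load(wav_path, sr=16000)
    # Frame-wise root-mean-square energy of the waveform.
    rms = librosa.feature.rms(y=signal)
    mean_energy = float(np.mean(rms))
    # Illustrative weak rule: lower average energy -> higher severity (class 1).
    return 1 if mean_energy < ENERGY_THRESHOLD else 0

# Label the large unlabeled pool with the weak rule.
unlabeled_paths = ["utt_001.wav", "utt_002.wav"]  # placeholder file names
weak_dataset = [(p, weak_label(p)) for p in unlabeled_paths]
```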
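And here is a sketch of step 2, assuming both classifiers share a simple feed-forward architecture trained with cross-entropy on fixed-size feature vectors. The feature dimension, architecture, and training loop are placeholders; the original work may choose different models and losses for C1 and C2:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

N_FEATURES = 40  # assumed feature dimension (e.g., MFCC-based)

def make_classifier() -> nn.Module:
    # A small binary classifier; the paper may use a different architecture.
    return nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, 2))

def train(model: nn.Module, loader: DataLoader, epochs: int = 10) -> nn.Module:
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model

# Dummy tensors standing in for extracted features: a small trusted set (N1)
# and a much larger weakly labeled set (N2 >> N1).
x_t, y_t = torch.randn(100, N_FEATURES), torch.randint(0, 2, (100,))
x_w, y_w = torch.randn(1000, N_FEATURES), torch.randint(0, 2, (1000,))
trusted_loader = DataLoader(TensorDataset(x_t, y_t), batch_size=16, shuffle=True)
weak_loader = DataLoader(TensorDataset(x_w, y_w), batch_size=64, shuffle=True)

c1 = train(make_classifier(), trusted_loader)  # C1: trained on trusted data
c2 = train(make_classifier(), weak_loader)     # C2: trained on weak data
```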

Results

We ran three different experiments by changing the ratio of trusted to weak data. As a baseline, we also trained our classifier on trusted data only. On average, we obtained 35.68% and 43.83% relative improvement over the baseline in terms of accuracy and F1-score, respectively.
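For clarity, “relative improvement” here is assumed to be the standard definition, computed against the baseline trained on trusted data only:

```python
def relative_improvement(wss_score: float, baseline_score: float) -> float:
    """Relative improvement (%) of the WSS model over the baseline."""
    return (wss_score - baseline_score) / baseline_score * 100.0

# Example (illustrative numbers): a baseline accuracy of 0.60 that rises
# to 0.81 with WSS is a 35% relative improvement.
print(relative_improvement(0.81, 0.60))  # -> 35.0
```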

You can check details about the experiments, results, and our research at the following link:

You can see the implementation of our work at the GitHub link below:

Conclusions

  • A huge amount of data is the biggest bottleneck in building a DL-based system that achieves the desired results.
  • For supervised learning problems, getting enough labeled data for Neural Network (NN)-based architectures is often not feasible because hand-labeling is laborious and requires time-consuming human annotation.
  • WSS is a technique that generates more speech data to train NN-based models alongside trusted data.
  • WSS significantly improves performance by adding weakly labeled data, and this work introduces a new training paradigm by applying weak supervision in the speech domain for the first time.
  • This study leaves scope for improvement in devising different training methods to utilize weak data and in extending WSS to other speech classification domains.

“Data is the sword of the twenty-first century, those who wield it well, the samurai.”

~Jonathan Rosenberg, former Googler

So, we have discussed how WSS can help with the data scarcity problem through a case study of dysarthria severity classification. If you are interested in learning more about dysarthria, check this:

In the recent era, people use IPAs to a great extent because these systems are easy to use and make life simpler. However, as we have seen, people with dysarthria cannot use these systems efficiently. For that reason, companies like Google, Amazon, and Microsoft are working to improve such systems for dysarthric speech. In 2019, Google started a project called Euphonia to help people with dysarthria use the latest speech technologies efficiently.

If you want to get more updates on ongoing cutting-edge research in the field of applied ML and DL, please follow our publication:

Contact us if you want to write for our publication.

Editors of the ML Brew:

Mihir Parmar: Twitter, LinkedIn, Google Scholar, GitHub

Mirali Purohit: LinkedIn, Google Scholar, GitHub

Special note: Thanks to Mihir for his equal contribution in carrying out this research along with me.
