Applications of deep learning in speech recognition for kids

Siva Reddy Gangireddy
The SoapBox Tech Blog
2 min read · Jan 12, 2022

In this blog, I look at deep learning and how it’s being used at SoapBox Labs to improve our kid-specific speech recognition.

What is Deep Learning?

To understand deep learning, we need a basic understanding of machine learning.

Machine learning is a family of algorithms that learn from data to make predictions and decisions without being explicitly programmed for each case. It usually involves training a model on large amounts of data so that it learns patterns, which it then uses to make predictions and decisions on new data. For example, the smart speakers we use in daily life rely on machine learning algorithms.
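To make "train on data, then predict on new data" concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is only an illustration of the general idea, not how SoapBox's models work: the feature names (loudness, pitch), the labels, and the toy data are all made up for this example.

```python
from statistics import fmean

def train(examples):
    """Learn one centroid (mean feature vector) per label from labeled examples."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(dim) for dim in zip(*rows))
        for label, rows in by_label.items()
    }

def predict(model, features):
    """Return the label whose centroid is closest to the new feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

# Hypothetical toy data: (loudness, pitch) pairs labeled by hand.
training_data = [
    ((0.1, 0.2), "silence"),
    ((0.2, 0.1), "silence"),
    ((0.8, 0.9), "speech"),
    ((0.9, 0.7), "speech"),
]

model = train(training_data)             # the "learning" step
prediction = predict(model, (0.7, 0.8))  # a new, unseen data point
print(prediction)
```

Nobody wrote a rule such as "loudness above 0.5 means speech". The decision comes entirely from the labeled examples, which is what "without being explicitly programmed" means in practice.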

Deep learning is a form of machine learning based on neural networks, a set of algorithms loosely modeled on how the human brain works. A network with more than three layers is generally considered a deep neural network; the input is processed through those layers in sequence to predict the desired output. Deep neural networks require huge amounts of data and are used extensively in speech recognition and image recognition. At SoapBox, our models are trained on thousands of hours of audio data and regularly evaluated on in-house datasets.
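As a rough sketch of what "processing the input through several layers" looks like, the snippet below passes a small input vector through a stack of fully connected layers. Everything here is an assumption for illustration: the layer sizes, the ReLU and softmax functions, and the random (untrained) weights. A real speech model is far larger and its weights are learned from data.

```python
import math
import random

def relu(values):
    """Keep positive values, zero out negative ones."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def make_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def dense(inputs, layer):
    """Weighted sum of the inputs plus a bias, for each unit in the layer."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the input through every layer in turn; the last layer yields probabilities."""
    activations = inputs
    for layer in layers[:-1]:
        activations = relu(dense(activations, layer))
    return softmax(dense(activations, layers[-1]))

rng = random.Random(0)
# Hypothetical sizes: 4 input features, two hidden layers of 8 units, 3 output classes.
sizes = [4, 8, 8, 3]
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

probabilities = forward([0.5, -0.2, 0.1, 0.9], layers)
print(probabilities)
```

Training would adjust the weights so that the output probabilities match the labeled examples. That step is omitted here to keep the sketch short.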

Why is deep learning important for kids’ speech recognition?

The goal of speech recognition is to convert users’ speech to text. Because audio data varies widely (in pronunciation, accent, and background noise, for example), machine learning algorithms are used to keep recognition accurate. Because of its superior performance, especially on kids’ variable speech, deep learning is at the core of SoapBox’s voice engine and of solutions like fluency assessments. We also use deep learning to deliver wake word detection, voice activity detection (VAD), and end-to-end speech recognition that runs on-device.
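To show what a voice activity detector takes in and gives back, here is a deliberately simple sketch. It is not the deep learning approach described above. It is a classic energy-threshold baseline, swapped in because it fits in a few lines. The frame size, threshold, sample rate, and synthetic signal are all assumptions chosen for the example.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in frame) / len(frame)

def detect_voice(samples, frame_size=160, threshold=0.01):
    """Return one flag per full frame: True where the frame's energy exceeds the threshold."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_energy(frame) > threshold)
    return flags

# Synthetic signal (assumed 16 kHz): 20 ms of silence, then 20 ms of a 440 Hz tone.
sample_rate = 16000
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(320)]

flags = detect_voice(silence + tone)
print(flags)
```

A fixed energy threshold breaks down quickly with background noise or quiet speakers, which is one reason a learned model is attractive for this task.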
