The critical role acoustic models play in SoapBox’s voice engine

Published in

The SoapBox Tech Blog

1 min readJan 20, 2022

What is an acoustic model (AM)?

Acoustic Models (AM) are key components of any speech recognition engine. An AM describes the statistical properties of sound events and connects the acoustic information with phonemes or other linguistic units. Hidden Markov Models (HMM) are one of the most common types of AMs. Other acoustic models include Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN).

Why are AMs important for SoapBox?

AMs play an even more critical role at SoapBox than in normal, adult-focused voice engines because recognizing children’s speech is much more challenging. Kids have smaller vocal tracts and slower and more variable speech patterns. They inhabit noisy environments and use a lot of spontaneous speech, imaginative words, ungrammatical phrases, and incorrect pronunciations!

At SoapBox we work hard to design and train AMs that can cater to all of this complexity and do the heavy lifting of accurately converting children’s speech to text.

The critical role acoustic models play in SoapBox’s voice engine

What is an acoustic model (AM)?

Why are AMs important for SoapBox?

Written by Armin Saeb