TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices

DarwinAI · Jan 22, 2021

TL;DR: Our new attention condenser architecture enables high-performing deep neural networks for robust deep learning on the edge. On the Google Speech Commands benchmark dataset, it produced networks up to 208x smaller than comparable models while achieving higher accuracy.

At DarwinAI, our researchers are always looking to push the envelope when it comes to accelerating deep learning for TinyML (tiny machine learning). In this article we introduce our latest building block designed for deep learning on the edge: the attention condenser. It further enables our GenSynth platform to automatically build powerful deep neural networks that meet the tough challenges of bringing TinyML to anyone, anywhere, anytime. The full paper is available here. You can also read about it in this article at Inside Big Data.

What is an attention condenser?

Motivated to make networks faster at the edge, our research team focused on the recent emergence of self-attention, a mechanism whose effectiveness has made it one of the breakthrough innovations in deep learning. Much of the self-attention research has focused squarely on accuracy, with attention-based networks scaling larger and larger to attain and maintain that accuracy at the cost of efficiency.

To offset this imbalance, we designed the attention condenser with both accuracy and efficiency in mind to build deep neural networks for speech recognition on edge devices. We specifically designed the attention condenser to learn cross-dimensional neuron relationships at a reduced dimension to strike a balance between modelling efficacy and computational efficiency.

An attention condenser is a self-attention mechanism that learns a condensed embedding E characterizing what information in the input activations V is relevant to decision-making, then performs selective attention F(V, A, S), where A denotes the learned attention values and S a scale, to produce the focused output V'. Attention condensers can thus reduce the number of large, stand-alone convolution modules a network needs; a minimal sketch of the idea follows.
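To make this concrete, here is a minimal PyTorch-style sketch of the pipeline described above: a condensation layer C(V) that reduces dimensionality, a condensed embedding E learned at that reduced dimension, an expansion back up to attention values A, and the selective attention F(V, A, S). The specific choices here (max-pooling for condensation, a small convolution pair for the embedding, nearest-neighbor upsampling for expansion, and multiplicative attention with a learnable scale S) are our own illustrative assumptions; the actual layer configurations in TinySpeech are machine-designed by GenSynth and detailed in the paper.

```python
import torch
import torch.nn as nn

class AttentionCondenser(nn.Module):
    """Illustrative attention condenser: condense -> embed -> expand ->
    selective attention. Layer choices are assumptions, not the paper's."""

    def __init__(self, channels, reduction=2):
        super().__init__()
        mid = max(channels // reduction, 1)
        # C(V): condensation layer, reduces spatial dimensionality
        self.condense = nn.MaxPool2d(kernel_size=2, stride=2)
        # E: condensed embedding learned at the reduced dimension
        self.embed = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=3, padding=1),
        )
        # S: learnable scale used by the selective attention F(V, A, S)
        self.scale = nn.Parameter(torch.ones(1))

    def forward(self, v):
        a = self.condense(v)                       # C(V)
        a = self.embed(a)                          # E(C(V))
        # Expansion: project attention values A back to V's resolution
        a = nn.functional.interpolate(a, size=v.shape[-2:], mode="nearest")
        a = torch.sigmoid(a)                       # A
        return v * a * self.scale                  # V' = F(V, A, S)

x = torch.randn(1, 8, 49, 40)   # e.g., a batch of spectrogram features
print(AttentionCondenser(8)(x).shape)  # torch.Size([1, 8, 49, 40])
```

The key point is that most of the work happens at the reduced dimension, which is what buys the computational efficiency relative to a stack of full-size convolution modules.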

Putting attention condensers to the test

Deep learning for speech recognition has been so successful that it is now widely used in numerous real-world applications (e.g., voice assistants, real-time closed captioning). The ability to perform real-time, on-device limited-vocabulary speech recognition will allow for the widespread use of voice interfaces on low-cost, low-power edge devices untethered from the Internet.

To address this challenge, we incorporated attention condensers into GenSynth, so that it can automatically determine the best microarchitecture and macroarchitecture designs for this specific purpose.

The most interesting observation about the TinySpeech network architectures is their extremely sparse use of stand-alone convolution modules and heavy use of attention condensers, which results in significantly lower computational complexity. Note also the high architectural diversity of TinySpeech, with each attention condenser and each stand-alone convolution module having unique parameter counts. This level of architectural diversity can only be facilitated through GenSynth, which operates at the computational graph level. TinySpeech is therefore a highly efficient, low-precision deep neural network architecture tailored for edge scenarios; the toy sketch below illustrates the overall pattern.
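Here is a toy keyword-spotting backbone in the same spirit, reusing the AttentionCondenser sketch from above: a single stand-alone convolution followed by a stack of attention condensers, each free to have its own internal dimensions. Every layer size here is invented for illustration, and the 12-class output head simply assumes the common Google Speech Commands setup; the published TinySpeech architectures are the product of GenSynth's machine-driven design exploration, not this hand-written layout.

```python
import torch
import torch.nn as nn

class TinyKWSNet(nn.Module):
    """Toy TinySpeech-style backbone: one stand-alone convolution, then
    mostly attention condensers (AttentionCondenser from the sketch
    above), ending in a tiny classification head."""

    def __init__(self, n_classes=12):
        super().__init__()
        self.stem = nn.Conv2d(1, 7, kernel_size=3, padding=1)  # lone conv module
        # Each condenser gets its own (unique) internal dimensions
        self.condensers = nn.Sequential(
            AttentionCondenser(7, reduction=2),
            AttentionCondenser(7, reduction=3),
        )
        self.head = nn.Linear(7, n_classes)

    def forward(self, x):                    # x: (batch, 1, mel_bins, frames)
        x = torch.relu(self.stem(x))
        x = self.condensers(x)
        x = x.mean(dim=(-2, -1))             # global average pooling
        return self.head(x)

net = TinyKWSNet()
print(sum(p.numel() for p in net.parameters()))  # tiny parameter count
```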

How does it perform?

To evaluate TinySpeech's deep neural networks for limited-vocabulary speech recognition, we used the Google Speech Commands benchmark dataset and compared them against several deep speech recognition networks (res15-narrow, trad-fpool13, tpool2, TDNN, PONAS-kws2), all of which were designed for efficient on-device speech recognition. The results are below.

As you can see, TinySpeech-A and B achieved test accuracy rates comparable to the other networks (TinySpeech-A actually achieved one of the highest overall accuracy rates at 94.3%) with a fraction of the parameters and Mult-Adds (TinySpeech-B used 208x fewer parameters than trad-fpool13, yet was more accurate), and with weights at one quarter of the data precision.

Why does this matter?

By facilitating tetherless machine learning on the edge, TinyML enables trusted decision-making independent of cloud connectivity. Such capabilities are critical for empowering a wide range of applications where privacy, security, dependability, cost, and real-time considerations are important factors for deployment. These design strengths make attention condensers perfect for the following use cases:

1. On-device speech recognition for ubiquitous, private voice assistants on everyday devices

2. Low-power visual recognition and understanding for everything from wearable devices to IoT devices to smart city applications

3. Real-time monitoring and predictive analysis on low-cost sensors on drones, aircraft, helicopters, and cars

4. Real-time information processing and analytics on smart grids and low-power sensor arrays for smart farming

5. On-device medical screening on portable X-ray devices

Given the strong results our team produced by integrating attention condensers into GenSynth, we can’t wait to see how customers will use it to create highly-efficient and trustworthy TinyML applications for the edge.

DarwinAI, the explainable AI company, enables enterprises to build AI they can trust. DarwinAI's solutions have been leveraged in a variety of enterprise contexts, including advanced manufacturing and industrial automation. Within healthcare, DarwinAI's technology resulted in the development of COVID-Net, an open source system to diagnose COVID-19 from chest X-rays.

To learn more, visit darwinai.com or follow @DarwinAI on Twitter.

If you liked this blog post, click the 👏 below so other people will see this on Medium. For more insights from our team, follow our publication and @DarwinAI. (Plus, subscribe to our letters if you’d like to hear from us!)
