Exploiting Nontrivial Connectivity for Automatic Speech Recognition

Tycho Tax
Sep 26, 2018 · Unlisted

We tested how well three neural network architectures that are commonly used in image recognition perform on automatic speech recognition. These architectures (Residual Networks, Highway Networks, and Densely Connected Networks) all use nontrivial connections, also called skip connections, which allow networks with a very large number of layers to be trained without suffering from the vanishing gradient problem.

Read the Full Paper


Before skip-connectivity was introduced, shallow neural networks outperformed deeper models. When input is propagated forward through the network layers, information can easily be lost after only a relatively small number of layers. Each layer introduces noise, and at a certain point the noise overshadows important features of the original input. This is a problem because deeper networks are able to learn increasingly complex patterns, which could result in a better model.
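The same depth problem shows up in the backward pass as the vanishing gradient problem. The toy sketch below is our own illustration, not taken from the paper: it assumes scalar layers of the form tanh(w * x) with a hypothetical weight w = 0.5, and uses the chain rule to compare how much gradient reaches the input through a plain stack versus a stack with identity skip connections.

```python
import math


def layer(x, w):
    """Hypothetical scalar layer: tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(x, w):
    """Derivative of tanh(w * x) with respect to x."""
    return w * (1.0 - math.tanh(w * x) ** 2)


def input_gradient(x, w, depth, skip):
    """Chain-rule derivative of the final output with respect to the input."""
    grad = 1.0
    for _ in range(depth):
        local = layer_grad(x, w)
        if skip:
            # y = x + f(x)  =>  dy/dx = 1 + f'(x)
            grad *= 1.0 + local
            x = x + layer(x, w)
        else:
            # y = f(x)  =>  dy/dx = f'(x)
            grad *= local
            x = layer(x, w)
    return grad


plain = input_gradient(1.0, 0.5, depth=30, skip=False)
residual = input_gradient(1.0, 0.5, depth=30, skip=True)
print("plain stack gradient:   ", plain)
print("residual stack gradient:", residual)
```

In the plain stack every factor is at most w = 0.5, so after 30 layers the gradient is bounded by 0.5^30, roughly 1e-9. With the identity skip every factor is at least 1, so the gradient cannot shrink in this toy setting.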

As illustrated in Figure 1, networks with skip-connections address this problem by adding or concatenating the output of previous layers to subsequent layers, thereby making early-stage information accessible throughout the network. In image classification tasks, neural networks using skip-connections have led to state-of-the-art results, but benchmarks for speech recognition models are limited.

Figure 1. https://arxiv.org/abs/1608.06993.
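To make the difference between adding and concatenating concrete, here is a minimal sketch of the three connection types in plain Python. It is an illustration under simplifying assumptions, not the models from the paper: `layer` and `dense_layer` are hypothetical stand-ins for convolutional layers, the weights are single scalars, and the highway gate uses a made-up elementwise form with a `gate_bias` parameter.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(x, w):
    """Hypothetical stand-in for a conv layer: elementwise scale + ReLU."""
    return [max(0.0, w * v) for v in x]


def residual_block(x, w):
    """Residual connection: output = x + F(x) (addition)."""
    return [xi + fi for xi, fi in zip(x, layer(x, w))]


def highway_block(x, w, gate_bias):
    """Highway connection: output = T(x) * H(x) + (1 - T(x)) * x.

    T is a sigmoid gate; the elementwise form used here is an assumption.
    """
    h = layer(x, w)
    t = [sigmoid(xi + gate_bias) for xi in x]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


def dense_layer(features, w):
    """Hypothetical layer that produces one new feature from all inputs."""
    mean = sum(features) / len(features)
    return [max(0.0, w * mean)]


def dense_block(x, weights):
    """Dense connectivity: each layer sees the input and all earlier
    outputs, and its own output is concatenated to that list."""
    features = list(x)
    for w in weights:
        features = features + dense_layer(features, w)
    return features


x = [1.0, -2.0]
print(residual_block(x, 0.5))
print(highway_block(x, 0.5, gate_bias=0.0))
print(dense_block(x, [0.5, 0.5]))
```

Note how the residual and highway blocks keep the feature size fixed, while the dense block grows by one feature per layer and always carries the original input along unchanged.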

Approach and contribution

We decided to benchmark three architectures that are known to perform well on image tasks, after modifying them for automatic speech recognition. Starting from a purely convolutional architecture, we adapt Residual Networks, Highway Networks, and Densely Connected Networks to an automatic speech recognition task. We train and evaluate the proposed architectures on a standard dataset and compare them to the convolutional baseline model (Figure 2).

Figure 2.

The results are encouraging: we show that skip-connections can be used successfully for automatic speech recognition. In particular, we find that densely connected networks outperform the other proposed architectures and yield significant improvements on the transcription task.


