Pruning Tacotron2 and FastSpeech2 models

Pruning Google's and Microsoft's TTS models with the magnitude pruning (MP) algorithm, without using the TensorFlow/PyTorch pruning APIs.

Ahmed Barbouche
4 min read · Sep 10, 2020

Model pruning is a class of AI model optimizations inspired by a natural process that happens in the human brain between early childhood and adulthood.

Pruning strives to reduce the number of parameters and operations required for computation by removing components from the neural network (connections, neurons, channels, …).

The main purpose is to reduce the model size and inference time.
Pruning can be done before, during, or after training; here we apply it to a pre-trained model.
Below is a figure that illustrates the different components of a pruning framework:

General steps for pruning, inspired by Souvik Paul's work: Pruning in Deep Learning Model

To start pruning our models, we follow three main steps:

i. Choose the level of pruning: the most commonly used approaches operate either on connections (called weight pruning) or on nodes (called neuron pruning).

ii. Choose the pruning criterion: we will use the absolute-value criterion.

iii. Choose the pruning algorithm: the most basic one is the magnitude pruner.

Below is a Git repository link that collects several pruning-algorithm papers, classified by release date:

The image below illustrates the pruning level we will work at in the next steps; we will focus exclusively on pruning connections.

Pruning-level example showing removed connections and neurons

Weight Pruning

Weight pruning (pruning connections) generally means eliminating some values in the weight tensors: we set selected neural-network parameters to zero, removing what we consider redundant connections between the layers of the network.
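As a toy illustration (this snippet is mine, not from the article's notebook), weight pruning amounts to multiplying the weight tensor by a binary mask:

import torch

# A 2x3 weight matrix; entries where the mask is 0 are removed connections.
w = torch.tensor([[0.80, -0.05, 0.30],
                  [-0.02, 0.60, -0.70]])
mask = (w.abs() > 0.1).to(w.dtype)  # keep only connections with |w| > 0.1
w_pruned = w * mask  # the two small weights become exact zeros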

Magnitude-Based Pruning

Magnitude-based pruning consists of setting “individual weights” in the weight matrix to zero, which corresponds to deleting connections between neurons. To achieve a “sparsity” of k%, we rank the “individual weights” in the weight matrix according to their magnitude (absolute value) |wi,j|, and then set the smallest k% of them to zero (a code sketch follows the notes below).

NB:

  • Individual weights: each entry |wi,j| of the weight matrix.
  • Sparsity: this term describes the percentage of cells in a matrix or database that are empty or equal to zero. (Conversely, a dense matrix or database contains mostly non-zero values.)
  • The k% values we will use are the following:

k in ['0.1%', '0.4%', '0.8%', '1%', '2%', '6%', '8%', '9%', '99%']

  • Note that as we increase the pruning percentage (i.e., increase the sparsity k%), the model's performance degrades.
  • We will do the pruning on a .pth file (a PyTorch implementation, but it can also be done with TensorFlow).
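Here is a minimal PyTorch sketch of such a magnitude pruner. The function name and implementation details are mine, not taken from the article's notebook:

import torch

def magnitude_prune(weight: torch.Tensor, k: float) -> torch.Tensor:
    # Zero out the k% of entries with the smallest magnitude |wi,j|.
    n_prune = int(weight.numel() * k / 100.0)
    if n_prune == 0:
        return weight
    flat = weight.abs().flatten()
    # Magnitude of the n_prune-th smallest entry = pruning threshold.
    threshold = flat.kthvalue(n_prune).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    return weight * mask

For example, magnitude_prune(w, k=8.0) returns a copy of w with (at least) the 8% smallest-magnitude entries set to zero.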

You can find all the related code in my Google Colab notebook:

Tacotron2 reference:

FastSpeech2 reference:

Conclusion:

After pruning the Tacotron2 and FastSpeech2 models with the magnitude-based pruning (MBP) algorithm, without using any pruning function from the TensorFlow or PyTorch APIs, we load the new checkpoints in PyTorch from gzip-compressed files.
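As a minimal sketch of that last step (the file names are placeholders, and the decompress-then-load pattern is an assumption about how the checkpoints were packaged):

import gzip, io, torch

# Placeholder for the real pruned state dict produced by the pruning loop.
pruned_checkpoint = {"linear.weight": torch.zeros(4, 4)}

# Save normally, then gzip-compress (tensors full of zeros compress well).
torch.save(pruned_checkpoint, "tacotron2_pruned.pth")
with open("tacotron2_pruned.pth", "rb") as f_in:
    with gzip.open("tacotron2_pruned.pth.gz", "wb") as f_out:
        f_out.write(f_in.read())

# torch.load cannot read a gzip file directly, so decompress into memory first.
with gzip.open("tacotron2_pruned.pth.gz", "rb") as f:
    checkpoint = torch.load(io.BytesIO(f.read()), map_location="cpu")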

We used the PyTorch versions of the TTS models, but the approach remains valid for TensorFlow implementations.

We then ran inference with the new checkpoints at each sparsity level k%. For Tacotron2, we found the most suitable checkpoint at around 8% sparsity, which reduces the model size by about 18% (from 108 MB to 87 MB); for FastSpeech2, at 99% sparsity the model size is reduced by about 11% (from 196 MB to 174 MB).

Note that we exclude the bias matrices from pruning in the RNN and CNN parts of Tacotron2, and for FastSpeech2 we prune only the attention layers, again excluding the bias matrices.
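A hedged sketch of how that selective pruning could look, reusing the magnitude_prune function from above (the checkpoint path and key patterns are assumptions about the state-dict layout):

import torch

ckpt = torch.load("tacotron2_statedict.pt", map_location="cpu")
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

for name, tensor in state_dict.items():
    # Skip bias vectors (and any other 1-D tensors) entirely.
    if "bias" in name or tensor.dim() < 2:
        continue
    # For FastSpeech2 we would additionally require "attention" in name,
    # since only its attention layers are pruned.
    state_dict[name] = magnitude_prune(tensor, k=8.0)

torch.save(ckpt, "tacotron2_pruned_8pct.pth")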

Perspective:

Since FastSpeech2 is based on the Transformer architecture, the best next step would be head pruning.

Once we manage to prune the models, and assuming we are working with a TensorFlow implementation, we can go further with model quantization using TFLite.
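For that TensorFlow path, a minimal post-training quantization sketch could look like this (the SavedModel path is a made-up placeholder, and dynamic-range quantization is just one of TFLite's options):

import tensorflow as tf

# Convert a (pruned) SavedModel to TFLite with default optimizations,
# which applies dynamic-range quantization to the weights.
converter = tf.lite.TFLiteConverter.from_saved_model("fastspeech2_pruned_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("fastspeech2_pruned_quant.tflite", "wb") as f:
    f.write(tflite_model)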

References:

The base article explaining the pruning algorithm (magnitude-based pruning):

https://towardsdatascience.com/pruning-deep-neural-network-56cae1ec5505


Ahmed Barbouche

Software engineer at Orange International Infrastructure & Services | Data Engineering, Data Science, and Artificial Intelligence enthusiast