Pruning Tacotron2 and FastSpeech2 models

Pruning Google's and Microsoft's TTS models with the magnitude pruning (MP) algorithm, without using the TensorFlow/PyTorch pruning APIs.

Ahmed Barbouche
4 min read · Sep 10, 2020

Model pruning is a class of AI model optimizations inspired by a natural process that happens in the human brain between early childhood and adulthood.

Pruning strives to reduce the number of parameters and operations required for computation by removing components from the neural network (connections, neurons, channels, …).

The main purpose is to reduce the model size and inference time.
Pruning can be done before, during, or after training; here we apply it to a pre-trained model.
Below is a figure that illustrates the different components of a pruning framework:

General steps for pruning, inspired by Souvik Paul's work: Pruning in Deep Learning Model

To start pruning our models, we follow three main steps:

i. Choose the level of pruning: the most commonly used approaches operate either on connections (called weight pruning) or on nodes (called neuron pruning).

ii. Choose the pruning criterion: we will use the absolute-value criterion.

iii. Choose the pruning algorithm: the most basic one is the magnitude pruner.

Below is a Git repository link that collects several pruning-algorithm papers, classified by release date:

The image below illustrates the pruning level we will work at in the next steps; we will focus exclusively on pruning connections.

Pruning-level example showing removed connections and neurons

Weight Pruning

Weight pruning (pruning connections) generally means eliminating some values in the weight tensors: we set selected neural-network parameters to zero, removing what we consider redundant connections between the layers of the network.
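As a toy illustration (this snippet is mine, not from the article's notebook), weight pruning amounts to multiplying the weight tensor by a binary mask:

import torch

# A 2x3 weight matrix; entries where the mask is 0 are removed connections.
w = torch.tensor([[0.80, -0.05, 0.30],
                  [-0.02, 0.60, -0.70]])
mask = (w.abs() > 0.1).to(w.dtype)  # keep only connections with |w| > 0.1
w_pruned = w * mask  # the two small weights become exact zeros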

Magnitude-Based Pruning

Magnitude-based pruning consists of setting “individual weights” in the weight matrix to zero, which corresponds to deleting connections between neurons. To achieve a “sparsity” of k%, we rank the “individual weights” in the weight matrix according to their magnitude (absolute value) |wi,j|, and then set the smallest k% of them to zero (a code sketch follows the notes below).

NB:

  • Individual weights: each entry |wi,j| of the weight matrix.
  • Sparsity: this term describes the percentage of cells in a matrix or database that are empty or equal to zero. (Conversely, a dense matrix or database contains mostly non-zero values.)
  • The k% values we will use are the following:

k in ['0.1%', '0.4%', '0.8%', '1%', '2%', '6%', '8%', '9%', '99%']

  • Note that as we increase the pruning percentage (i.e., increase the sparsity k%), the model's performance degrades.
  • We will do the pruning on a .pth file (a PyTorch implementation, but it can also be done with TensorFlow).
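Here is a minimal PyTorch sketch of such a magnitude pruner. The function name and implementation details are mine, not taken from the article's notebook:

import torch

def magnitude_prune(weight: torch.Tensor, k: float) -> torch.Tensor:
    # Zero out the k% of entries with the smallest magnitude |wi,j|.
    n_prune = int(weight.numel() * k / 100.0)
    if n_prune == 0:
        return weight
    flat = weight.abs().flatten()
    # Magnitude of the n_prune-th smallest entry = pruning threshold.
    threshold = flat.kthvalue(n_prune).values
    mask = (weight.abs() > threshold).to(weight.dtype)
    return weight * mask

For example, magnitude_prune(w, k=8.0) returns a copy of w with (at least) the 8% smallest-magnitude entries set to zero.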

You can find all the related code in my Google Colab notebook:

Tacotron2 reference:

FastSpeech2 reference:

Conclusion:

After pruning the Tacotron2 and FastSpeech2 models with the magnitude-based pruning (MBP) algorithm, without using any pruning function from the TensorFlow or PyTorch APIs, we load the new checkpoints in PyTorch from gzip-compressed files.
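As a minimal sketch of that last step (the file names are placeholders, and the decompress-then-load pattern is an assumption about how the checkpoints were packaged):

import gzip, io, torch

# Placeholder for the real pruned state dict produced by the pruning loop.
pruned_checkpoint = {"linear.weight": torch.zeros(4, 4)}

# Save normally, then gzip-compress (tensors full of zeros compress well).
torch.save(pruned_checkpoint, "tacotron2_pruned.pth")
with open("tacotron2_pruned.pth", "rb") as f_in:
    with gzip.open("tacotron2_pruned.pth.gz", "wb") as f_out:
        f_out.write(f_in.read())

# torch.load cannot read a gzip file directly, so decompress into memory first.
with gzip.open("tacotron2_pruned.pth.gz", "rb") as f:
    checkpoint = torch.load(io.BytesIO(f.read()), map_location="cpu")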

We used the PyTorch versions of the TTS models, but the approach remains valid for TensorFlow implementations.

We then ran inference with the new checkpoints at each sparsity level k%. For Tacotron2, we found the most suitable checkpoint at around 8% sparsity, which reduces the model size by about 18% (from 108 MB to 87 MB); for FastSpeech2, at 99% sparsity the model size is reduced by about 11% (from 196 MB to 174 MB).

Note that we exclude the bias matrices from pruning in the RNN and CNN parts of Tacotron2, and for FastSpeech2 we prune only the attention layers, again excluding the bias matrices.
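A hedged sketch of how that selective pruning could look, reusing the magnitude_prune function from above (the checkpoint path and key patterns are assumptions about the state-dict layout):

import torch

ckpt = torch.load("tacotron2_statedict.pt", map_location="cpu")
state_dict = ckpt["state_dict"] if "state_dict" in ckpt else ckpt

for name, tensor in state_dict.items():
    # Skip bias vectors (and any other 1-D tensors) entirely.
    if "bias" in name or tensor.dim() < 2:
        continue
    # For FastSpeech2 we would additionally require "attention" in name,
    # since only its attention layers are pruned.
    state_dict[name] = magnitude_prune(tensor, k=8.0)

torch.save(ckpt, "tacotron2_pruned_8pct.pth")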

Perspective:

Since FastSpeech2 is based on the Transformer architecture, the best next step would be head pruning.

Once we manage to prune the models, and assuming we are working with a TensorFlow implementation, we can go further with model quantization using TFLite.
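For that TensorFlow path, a minimal post-training quantization sketch could look like this (the SavedModel path is a made-up placeholder, and dynamic-range quantization is just one of TFLite's options):

import tensorflow as tf

# Convert a (pruned) SavedModel to TFLite with default optimizations,
# which applies dynamic-range quantization to the weights.
converter = tf.lite.TFLiteConverter.from_saved_model("fastspeech2_pruned_savedmodel")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("fastspeech2_pruned_quant.tflite", "wb") as f:
    f.write(tflite_model)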

References:

The base article explaining the pruning algorithm (magnitude-based pruning):

https://towardsdatascience.com/pruning-deep-neural-network-56cae1ec5505


Ahmed Barbouche

Software engineer at Orange International Infrastructure & Services | Data Engineering, Data Science, and Artificial Intelligence enthusiast