Model Pruning in Keras with Keras-Surgeon

Anuj Shah (Exploring Neurons) · May 5, 2019

Recently, I started reading about how to do efficient model inference through model pruning and wanted to quickly try out a few pruning techniques myself. As I am a hardcore Keras user, I looked for libraries that could help me do this in Keras and, luckily, found one called keras-surgeon, developed by Ben Whetton. You can have a look at keras-surgeon on GitHub; a short description copied from its README is pasted below:

Keras-surgeon provides simple methods for modifying trained Keras models. The following functionality is currently implemented:

  • delete neurons/channels from layers
  • delete layers
  • insert layers
  • replace layers

Keras-surgeon is compatible with any model architecture. Any number of layers can be modified in a single traversal of the network.

Model Inference

Two standard steps for deploying a machine learning model are:

1. Model Training

2. Model Inference

Suppose you wish to deploy a deep learning model for classifying dogs vs. cats. What do you do?

1. Prepare a dataset of dog and cat images and split it into training and validation sets.

2. Design a neural network model

3. Train the model with the dataset

Once the model is trained, we use the trained weights to run model inference, i.e. predict results on new, unseen data.

The picture below from Michael Copeland's blog explains it well.

Pic credit: NVIDIA's blog by Michael Copeland — What's the difference between deep learning training and inference?

Once the model is trained, we do not necessarily need all of its weights to run inference on new data. We can make inference more efficient by reducing the model's size and speeding up its computation, so that it can easily be deployed on embedded systems such as mobile phones, FPGAs, Jetson boards, etc.

There are umpteen resources about model pruning and efficient inference that you can look at. Some are listed below:

1. Pruning deep neural networks to make them fast and small by Jacob Gildenblat

2. https://nervanasystems.github.io/distiller/pruning/index.html

3. What is the difference between deep learning training and inference? by Michael Copeland

4. Lecture 15: Efficient Methods and Hardware for Deep Learning by Song Han

5. Song Han, Jeff Pool, John Tran, William J. Dally. Learning both Weights and Connections for Efficient Neural Networks, arXiv:1506.02626, 2015.

6. Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, Hans Peter Graf. Pruning Filters for Efficient ConvNets, arXiv:1608.08710v3, 2017.

My Experiments

I started with a simple dog vs. cat classifier. I followed the Keras blog post "Building powerful image classification models using very little data" to train a simple 3-convolution-layer network with data augmentation.

Training data — 2000 samples (1000 cats and 1000 dogs)

Validation data — 800 samples (400 cats and 400 dogs).

I trained for around 100 iterations on a CPU machine and then evaluated the accuracy on the validation data using the trained weights. The original model and its performance are depicted below.

The original model has 3 convolution layers with 32, 32, and 64 filters respectively, one fully connected layer with 64 neurons, and a final sigmoid output layer with 1 neuron.

The Keras code for the same is shown below.

The original CNN model used for training
The result predicted by the trained model: accuracy 67.375%, loss 0.69469
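The code screenshot is not reproduced here, but based on the architecture described above and the Keras blog post it follows, the model would look roughly like this. The 150x150 input size and the rmsprop/binary-crossentropy setup are taken from that blog post, so treat this as a sketch rather than the exact original:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# 3 convolution layers (32, 32, 64 filters), one dense layer with
# 64 neurons, and a single sigmoid output neuron for dog vs. cat.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model.compile(loss='binary_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
model.summary()
```

Note that most of the ~1.2M parameters sit in the dense layer, which will matter later when we decide which convolution layer to prune.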

With that much done, let's start with model pruning now.

Each convolution layer has some number of filters (in the code above there are 32, 32, and 64 filters in the first, second, and third convolution layers respectively) and each filter produces a feature map, as depicted in the picture below.

Pic credit: Compressing Deep Neural Nets — pruning the model by removing the output channels whose filters are least important.

I am going to try out just one pruning technique, described in the blog Compressing Deep Neural Nets: remove the output channels of the filters that are least important (right picture in the image above) and whose removal does not hurt the performance of the model.

The author suggests that there are different ways to measure filter relevance, but the simplest is to compute the L1-norm of each filter's weights (take the absolute values of the filter's weights and add them up) and remove the filters whose L1-norm is smallest. I am going to do the same.

The model shown above has three convolution layers; the L1-norms of their filters, sorted in ascending order, are shown in the graph below.

Graph showing the L1-norm of different filters of the three convolution layers in ascending order.

The code to compute L1-norm and plot the above graph is shown below.
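Since the code screenshot is not reproduced here, a minimal sketch of that computation follows. With a real model you would pull the kernel from a trained layer (e.g. `model.layers[0].get_weights()[0]`); here a random kernel with conv-1's shape stands in, and the function and variable names are illustrative:

```python
import numpy as np

def filter_l1_norms(kernel):
    """Given a conv kernel of shape (h, w, in_channels, n_filters),
    return the L1-norm of each filter's weights."""
    # sum of absolute weights over every axis except the filter axis
    return np.abs(kernel).sum(axis=(0, 1, 2))

# stand-in for model.layers[0].get_weights()[0] on a trained model
rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 3, 32))

norms = filter_l1_norms(kernel)
order = np.argsort(norms)   # filter indices, least important first
print(order[:6])            # the 6 candidates for removal
```

Plotting `np.sort(norms)` for each convolution layer gives the ascending curves shown in the graph above.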

Experiment -1

I will start by removing filters from convolution layer 1, which has 32 filters; their L1-norm values are shown below.

L1-norm values of different filters of 1st convolution layer

I remove the 6 filters (7, 26, 22, 30, 15, 24) with the smallest L1-norms using keras-surgeon, as in the code below.

The code to delete different filters from the convolution layer. layer_0 refers to the first convolution layer.
Pruning the model — removing 6 filters from the first convolution layer reduces the model parameters by 0.1563% and improves the accuracy by 2.375%

The model accuracy improves from 67.375% to 69.75%, and 1896 parameters/weights are eliminated, i.e. a 0.156% reduction.
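That 1896 figure can be sanity-checked by hand: each removed conv-1 filter carries its own 3x3x3 weights plus a bias, and its removal also deletes one input channel from each of conv-2's 32 3x3 kernels (kernel sizes as in the architecture above):

```python
# each conv-1 filter: 3x3 kernel over 3 input channels, plus 1 bias
conv1_params_per_filter = 3 * 3 * 3 + 1   # 28

# each removed filter also deletes one input channel from all 32
# of conv-2's 3x3 kernels
conv2_params_per_filter = 32 * 3 * 3      # 288

removed = 6 * (conv1_params_per_filter + conv2_params_per_filter)
print(removed)  # -> 1896
```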

Experiment — 2

Then I removed 12 filters (7, 26, 22, 30, 15, 24, 11, 27, 14, 13, 4, 1) instead of 6 from the first convolution layer and evaluated the performance on the validation data, using the same keras-surgeon code as shown above.

Pruning the model — removing 12 filters from the first convolution layer reduces the model parameters by 0.3127% and improves the accuracy by 1% w.r.t. the original model

The model accuracy improved from 67.375% to 68.375%, but dropped from the 69.75% achieved by the pruning in the first experiment. 3792 parameters/weights were eliminated, i.e. a 0.3127% reduction.

The table below summarizes what we have done so far.

Although the accuracy improved, the reduction in size is not significant because most of the parameters are in the fully connected layer. In our model, there is just one fully connected layer at the end, with 64 neurons, and its input is the flattened output of the last (3rd) convolution layer. So the idea is to remove filters from the last convolution layer, which should give us a significant reduction in model size.

Experiment — 3

In this experiment, we will remove filters from the 3rd convolution layer. It has 64 filters, and the picture below shows them along with their L1-norm values.

L1-norm values of 64 filters of the 3rd convolution layer

I removed 12 filters from the 3rd convolution layer; in the code, layer_6 refers to this layer.

The model accuracy did not improve at all from the original 67.375%, but 225420 parameters/weights were eliminated, i.e. an 18.59% reduction. That's quite significant.
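The much larger number makes sense when worked out. A conv-3 filter itself has only 32x3x3 + 1 = 289 parameters, but each filter's feature map feeds every one of the 64 dense-layer neurons. Assuming 150x150 inputs and 'valid' 3x3 convolutions as in the sketch above, that feature map is 17x17 after the third pooling layer:

```python
# parameters in one conv-3 filter: 3x3 kernel over 32 channels + bias
conv3_params_per_filter = 32 * 3 * 3 + 1   # 289

# each removed feature map also removes 17*17 flattened inputs
# from every one of the 64 dense-layer neurons
dense_params_per_filter = 17 * 17 * 64     # 18496

removed_12 = 12 * (conv3_params_per_filter + dense_params_per_filter)
removed_24 = 24 * (conv3_params_per_filter + dense_params_per_filter)
print(removed_12, removed_24)  # -> 225420 450840
```

So almost all of the savings come from the dense layer, which is exactly why pruning the last convolution layer shrinks the model so much more than pruning the first.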

Then I tried deleting even more channels, 24 this time, to see what happens.

Again the model accuracy did not improve at all from the original 67.375%, but 450840 parameters/weights were eliminated, i.e. a 37.182% reduction. That's amazing! The picture below illustrates this.

Pruning 12 and 24 filters from the 3rd convolution layer did not improve accuracy; however, the number of weights was reduced by 18.59% and 37.18% respectively

We clearly see that eliminating filters in the 3rd convolution layer reduces the model size, while reducing filters in the 1st convolution layer improves the accuracy, so let's combine the first and third experiments.

Experiment — 4

Now we are going to remove filters from both the first and third convolution layers simultaneously. To perform multiple pruning operations at once with keras-surgeon, we can use the Surgeon class as shown in the code below.

Code snippet for doing multiple pruning at once

I eliminated 6 filters from the 1st convolution layer and 24 filters from the 3rd convolution layer.

Eliminating 6 filters from 1st convolution layer and 24 filters from 3rd convolution layer improved the accuracy by 2.375% and reduced the model size by 37.338%

As you can see in the diagram above, by combining the 1st and 3rd experiments we could improve the accuracy by 2.375% while reducing the model size by a significant 37.338%.

To conclude, this small experiment already shows how effective model pruning can be at reducing model size, so that models can easily be deployed on embedded systems and run efficiently at inference time. I have tried just one technique, but there are many other model pruning techniques; for those, you can refer to the resources mentioned above.

I would like to thank all the people whose blogs I referred to, and especially Ben Whetton for his amazing Keras-surgeon library.

Till then, Keep Learning! Keep Exploring Neurons!

If you find my articles helpful and wish to support them — Buy me a Coffee
