Pruning Convolutional Neural Networks
Machine Learning is the new buzzword across industries, the genie of the 21st century. It promises everything from replacing human labor in menial tasks to providing high-level insights and analysis for decision making. The magic is further enhanced when combined with the Internet of Things (IoT), potentially creating a world where our environment is an extension of our brain and responds intelligently to our whims and wishes.
However, the more powerful the genie is, the bigger the magic lamp required to host it. In other words, Machine Learning software is often so computationally intensive that it is unsuited for lightweight IoT devices. Here, we explore the idea of Convolutional Neural Networks, a key component of Machine Learning for image recognition and object detection, and how they may not need to be as massive as they are.
Convolutional Neural Networks (CNNs) work by applying N filter channels to an input image (referred to as a tensor hereafter). Suppose an input tensor has the shape (height, width, number of previous channels). Each filter channel can be thought of as a small box of weights that slides across the height and width of the input tensor, performs element-wise multiplication with the values at each location, and sums them up to output a single value at that location. Depending on the size of the filter channel box and how it moves around the input tensor, a single box will output a tensor of the shape (new_height, new_width, 1). Applying N filter channels therefore gives an output tensor of the shape (new_height, new_width, N). This constitutes a single CNN layer.
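To make the shapes concrete, here is a minimal NumPy sketch of a single CNN layer with stride 1 and no padding (the input and filter sizes below are made up purely for illustration):

```python
import numpy as np

def conv_layer(x, filters):
    """Naive single CNN layer.
    x: input tensor of shape (H, W, C_in)
    filters: N filter channels of shape (N, k, k, C_in)
    returns: output tensor of shape (H - k + 1, W - k + 1, N)"""
    H, W, _ = x.shape
    N, k, _, _ = filters.shape
    out = np.zeros((H - k + 1, W - k + 1, N))
    for n in range(N):                      # each filter channel yields one output channel
        for i in range(H - k + 1):
            for j in range(W - k + 1):
                # element-wise multiply the k x k x C_in patch with the filter, then sum
                out[i, j, n] = np.sum(x[i:i + k, j:j + k, :] * filters[n])
    return out

x = np.random.rand(32, 32, 3)           # input tensor (height, width, previous channels)
filters = np.random.rand(16, 3, 3, 3)   # N = 16 filter channels, each 3 x 3 x 3
print(conv_layer(x, filters).shape)     # (30, 30, 16)
```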
Image classification and object detection models typically have a large number of CNN layers (over 50 in MobileNetV2¹), and each CNN layer can have anywhere from 10 to over 100 filter channels. As a result, a lot of parameters (weights) are stored in the model, making it complex and computationally expensive to use. This motivates us to find ways to reduce the size of the model without impacting its capabilities.
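To get a sense of scale, the weight count of even one layer adds up quickly; a quick back-of-the-envelope calculation (the layer sizes below are hypothetical):

```python
def conv_layer_params(k, c_in, n_filters):
    """Number of weights in one CNN layer: each of the n_filters channels
    holds a k x k x c_in box of weights (biases ignored)."""
    return k * k * c_in * n_filters

# A hypothetical 3x3 layer mapping 128 input channels to 256 filter channels:
print(conv_layer_params(3, 128, 256))  # 294,912 weights in a single layer
```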
The approach we discuss here is pruning a trained model by removing parameters deemed unnecessary to it. The method of pruning is inspired by the paper “Pruning Filters for Efficient ConvNets” by Hao Li et al².
When we look through a fully-trained Neural Network, we can see that many of its weights are very close to zero. This indicates that these weights do not play a significant role in determining the output of the Neural Network. If we think of the Neural Network as a brain, it means that only a small portion of the brain is actually being used to solve a given problem, while most of the brain lies inactive.
Applying this idea, we go through the CNN layers in the model, identify which filter channels have weights that are very close to zero, and remove these channels completely from the layer. Thereafter, we reconnect the individual CNN layers appropriately (using a layer where necessary to pad an input/output tensor with zeros to get the correct shape) to ensure that information can still flow properly through the model. The result is a much leaner model that can still perform the image classification/object detection it was trained to do.
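Following the criterion in Hao Li et al.², here is a sketch of how one layer might be pruned: rank its filter channels by L1 norm (sum of absolute weights) and keep only the strongest. The function names and the keep_ratio parameter are illustrative, and in this sketch the following layer simply drops the matching input channels rather than using the zero-padding layer described above:

```python
import numpy as np

def prune_filters(filters, keep_ratio=0.5):
    """Rank the N filter channels of one layer by L1 norm and keep the strongest.
    filters: (N, k, k, C_in) -> (pruned filters, indices of kept channels)"""
    l1 = np.abs(filters).sum(axis=(1, 2, 3))          # one score per filter channel
    n_keep = max(1, int(round(keep_ratio * len(l1))))
    keep = np.sort(np.argsort(l1)[-n_keep:])          # strongest channels, original order
    return filters[keep], keep

def prune_next_layer_inputs(next_filters, keep):
    """Drop the input channels of the following layer that fed from pruned filters.
    next_filters: (N_next, k, k, C_in_next), where C_in_next was the previous layer's N."""
    return next_filters[:, :, :, keep]

# Example: prune a layer of 16 filters down to 8 and fix up the next layer.
layer1 = np.random.rand(16, 3, 3, 3)
layer2 = np.random.rand(32, 3, 3, 16)
layer1_pruned, kept = prune_filters(layer1, keep_ratio=0.5)
layer2_fixed = prune_next_layer_inputs(layer2, kept)
print(layer1_pruned.shape, layer2_fixed.shape)  # (8, 3, 3, 3) (32, 3, 3, 8)
```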
We implemented the pruning idea on a version of the SSDnet³ to see how pruning affects the capability of the model, measured by mAP (mean Average Precision, a standard measure of how accurately it detects objects). As shown in Figure 3, we could remove up to around 76% of the CNN channels, and the model remained resilient enough to operate on the remaining 24% of the channels and retain the original level of accuracy. This also tells us that the model fundamentally requires around 24% of the channels to operate effectively, and any further reduction severely impacts its capability. The code is available online.
There appears to be no hard rule on the percentage of channels a model can afford to lose before it stops working, with this amount varying from model to model. What this tells us, however, is that a significant portion of the parameters in a Convolutional Neural Network do not really play an important role. This notion possibly extends to other forms of Neural Networks, such as Feed-forward Neural Networks and Recurrent Neural Networks, and definitely deserves further exploration. The concept could be valuable in designing Machine Learning models for lightweight IoT devices.
References:
[1] Mark Sandler et al., MobileNetV2: Inverted Residuals and Linear Bottlenecks, CVPR 2018
[2] Hao Li et al., Pruning Filters for Efficient ConvNets, ICLR 2017
[3] Wei Liu et al., SSD: Single Shot MultiBox Detector, ECCV 2016