Assemble-ResNet: 5 times faster with the same accuracy as EfficientNet B6 + AutoAugment

Akihiro FUJII
Published in Analytics Vidhya
Feb 2, 2020

About This post

This post explains the paper "Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network" [1], posted on January 17, 2020. The paper proposes Assemble-ResNet, which matches the accuracy of EfficientNet B6 + AutoAugment while being 5 times faster.

This post covers:
1. A summary of the paper
2. The baseline methods, EfficientNet and AutoAugment
3. The techniques used in Assemble-ResNet
4. Results

The summary of the paper is as follows.

The authors build a network that combines several powerful existing techniques and achieves the same accuracy as EfficientNet B6 + AutoAugment while being 5 times faster. They note that the latest techniques such as AugMix are not used here, so the accuracy could likely be improved further.

Baseline Methods

In this section, I explain EfficientNet and AutoAugment, which serve as the baselines for comparison with Assemble-ResNet. EfficientNet, proposed in May 2019, is a network that is much lighter and more accurate than previous networks. AutoAugment, published in 2018, is a study that automatically searches for the optimal data augmentation policy. Both are powerful techniques that often appear as baselines in image recognition.

EfficientNet

EfficientNet [2] is a paper submitted on May 28, 2019 that proposes networks that are faster and more accurate than existing ones. Its summary is as follows.

The authors build a fast and accurate network by jointly scaling the resolution, the depth, and the number of channels. With the compound coefficient φ fixed to 1 (Equation 3 in the paper), the constants α, β, and γ are found by a small grid search starting from the MnasNet-based baseline B0; α, β, and γ are then fixed and φ is increased to construct B1 through B7.

In other words, the depth, the model width (number of CNN channels), and the input resolution are scaled together. The three constants α, β, and γ control how strongly each dimension is scaled, and they are found by a grid search under a fixed computation budget. EfficientNet B0, B1, ..., B7 are obtained in order of increasing compound coefficient φ, i.e., increasing model size.
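
To make the compound-scaling idea concrete, here is a minimal Python sketch (not the official implementation) using the constants α = 1.2, β = 1.1, γ = 1.15 reported in the paper; the base resolution of 224 and the rounding are my own simplifications, and the released B1-B7 models use slightly tuned resolutions.

```python
# Illustrative sketch of EfficientNet-style compound scaling (not the official code).
# ALPHA, BETA, GAMMA are the constants reported in the paper for phi = 1;
# the base resolution of 224 and the rounding below are simplifications.

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth / width / resolution scaling constants

def compound_scaling(phi: int, base_resolution: int = 224):
    """Return (depth multiplier, width multiplier, input resolution, ~FLOPs factor)."""
    depth_mult = ALPHA ** phi                           # number of layers grows as alpha^phi
    width_mult = BETA ** phi                            # channel count grows as beta^phi
    resolution = round(base_resolution * GAMMA ** phi)  # input size grows as gamma^phi
    # The constraint alpha * beta^2 * gamma^2 ~= 2 makes FLOPs roughly double
    # each time phi increases by 1.
    flops_factor = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth_mult, width_mult, resolution, flops_factor

for phi in range(4):  # roughly B0 .. B3
    d, w, r, f = compound_scaling(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution {r}, ~FLOPs x{f:.2f}")
```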

AutoAugment

AutoAugment [3] is a paper submitted on May 24, 2018 that searches for the best data augmentation policy with reinforcement learning. The authors optimize an augmentation policy consisting of five sub-policies, each a pair of operations with its own probability and magnitude, so that accuracy on the validation data improves.

(Left) Flow of AutoAugment. (Right) Examples of five sub-policies.

The results of AutoAugment are shown below; you can see that it is a fairly powerful method.
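
As a rough illustration of what a sub-policy looks like, the sketch below applies two operations, each with its own probability and magnitude, to a PIL image. The operation names, the magnitude mappings, the (0.4, 8) / (0.6, 9) values, and the file name example.jpg are illustrative placeholders, not the actual searched ImageNet policy.

```python
import random
from PIL import Image, ImageOps

# A minimal sketch of how an AutoAugment sub-policy is applied.
# Each sub-policy is two operations, each with a probability and a magnitude;
# the (name, prob, magnitude) values below are illustrative, not the searched policy.

def posterize(img: Image.Image, magnitude: int) -> Image.Image:
    # Magnitude 0-9 mapped to fewer remaining bits (higher magnitude = stronger effect).
    bits = 8 - magnitude // 2
    return ImageOps.posterize(img, bits)

def rotate(img: Image.Image, magnitude: int) -> Image.Image:
    # Magnitude 0-9 mapped to a rotation of up to 30 degrees with a random sign.
    degrees = 30 * magnitude / 9 * random.choice([-1, 1])
    return img.rotate(degrees)

SUB_POLICY = [(posterize, 0.4, 8), (rotate, 0.6, 9)]  # example sub-policy

def apply_sub_policy(img: Image.Image) -> Image.Image:
    for op, prob, magnitude in SUB_POLICY:
        if random.random() < prob:      # each operation fires with its own probability
            img = op(img, magnitude)
    return img

augmented = apply_sub_policy(Image.open("example.jpg"))  # placeholder image path
```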

Key Insight and Method

The key to Assemble-ResNet being powerful enough to beat EfficientNet is that it stacks a set of proven network-architecture tweaks and regularization methods on top of ResNet.

For the network architecture, the authors use ResNet-D [4], Selective Kernel [5], Anti-Alias Downsampling [6], and Big Little Network [10] to improve ResNet for ImageNet. For regularization, they use Label Smoothing, Mixup [7], DropBlock [8], Knowledge Distillation [9], and AutoAugment [3].

The following sections describe the network-architecture improvements and the regularization methods used in Assemble-ResNet.

Network Tweaks

Network Tweaks 1. ResNet-D

ResNet-D is an improved ResNet architecture proposed at CVPR 2019 [4]. It has the structure shown in the figure below and increases accuracy with almost no additional computational cost.
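
The change that is easiest to show in code is the downsampling shortcut. The PyTorch sketch below contrasts the plain ResNet shortcut with my reading of the ResNet-D variant from [4]; it is a simplified sketch, not the authors' implementation.

```python
import torch.nn as nn

# Sketch of the ResNet-D downsampling shortcut (my reading of [4], not the authors' code).
# Plain ResNet uses a stride-2 1x1 conv on the shortcut, which discards 3/4 of the
# activations; ResNet-D first average-pools with stride 2, then applies a stride-1 1x1 conv.

def plain_shortcut(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=2, bias=False),
        nn.BatchNorm2d(out_ch),
    )

def resnet_d_shortcut(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.AvgPool2d(kernel_size=2, stride=2),                        # downsample without discarding pixels
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=1, bias=False),
        nn.BatchNorm2d(out_ch),
    )
```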

Network Tweaks 2. Selective Kernel

The Selective Kernel [5] is inspired by the fact that the size of the receptive field in human visual recognition differs from neuron to neuron. The original Selective Kernel method combines a 3x3-kernel stream and a 5x5-kernel stream with an attention mechanism. In Assemble-ResNet, balancing speed and accuracy, the authors instead use a 3x3 branch with twice the number of channels in place of the 5x5 kernel. Based on their experiments, the structure labeled C3 in Table 2 of the paper is adopted.
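
Below is a minimal PyTorch sketch of a Selective Kernel unit in the spirit of [5]: two branches with different receptive fields, a squeezed channel descriptor, and a softmax over branches. It omits the per-branch batch normalization of the original and is not the Assemble-ResNet variant.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of a Selective Kernel unit (simplified from [5]; the Assemble-ResNet
# variant replaces the 5x5 branch with a 3x3 branch that has more channels).

class SelectiveKernel(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Two branches with different receptive fields: 3x3 and "5x5" (dilated 3x3).
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.branch5 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2, bias=False)
        hidden = max(channels // reduction, 32)
        self.fc_reduce = nn.Linear(channels, hidden)
        self.fc_select = nn.Linear(hidden, channels * 2)  # one set of logits per branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u3, u5 = self.branch3(x), self.branch5(x)
        u = u3 + u5                                  # fuse the two streams
        s = u.mean(dim=(2, 3))                       # global average pooling -> (N, C)
        z = F.relu(self.fc_reduce(s))
        logits = self.fc_select(z).view(-1, 2, u.size(1))
        attn = logits.softmax(dim=1)                 # softmax across the two branches
        a3, a5 = attn[:, 0], attn[:, 1]
        return u3 * a3[:, :, None, None] + u5 * a5[:, :, None, None]
```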

Network Tweaks 3. Anti-Alias Downsampling

Anti-Alias Downsampling [6] is a method designed to give CNNs shift invariance. Downsampling operations such as max pooling are not shift invariant at the pixel level, so the output of a CNN is not robust to small translations of the input. The method restores (approximate) shift invariance by adding an anti-aliasing step (BlurPool) to the downsampling process.
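
A minimal PyTorch sketch of the idea, assuming a fixed 3x3 binomial blur kernel: max pooling is applied densely (stride 1) and the blur filter then performs the stride-2 downsampling. This follows the paper's idea rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of anti-aliased max pooling (BlurPool [6]): dense max, then a fixed
# low-pass (binomial) filter applied with stride 2.

class BlurPool2d(nn.Module):
    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        blur_1d = torch.tensor([1.0, 2.0, 1.0])
        kernel = blur_1d[:, None] * blur_1d[None, :]   # 3x3 binomial kernel
        kernel = kernel / kernel.sum()
        # One copy of the kernel per channel (depthwise filtering).
        self.register_buffer("kernel", kernel[None, None].repeat(channels, 1, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, self.kernel, stride=self.stride,
                        padding=1, groups=x.size(1))

def anti_aliased_max_pool(channels: int) -> nn.Sequential:
    # MaxPool with stride 1 keeps the "which pixel was largest" decision dense,
    # then BlurPool does the actual stride-2 downsampling.
    return nn.Sequential(nn.MaxPool2d(kernel_size=2, stride=1), BlurPool2d(channels))
```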

Network Tweaks 4. Big Little Network

Big Little Network [10] is a network structure for efficiently processing feature maps at multiple resolutions. The figure below shows an example where the number of resolution branches is K = 2: the lower route processes the feature map at the input resolution, and the upper route processes a feature map with half the height and width. In Assemble-ResNet, a route that processes a reduced-resolution feature map is added alongside the ordinary residual block.
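
The rough PyTorch sketch below shows one possible K = 2 block in the spirit of [10]: a "big" branch at half resolution with full width and a "little" branch at full resolution with fewer channels, merged by upsampling and addition. The layer counts and the channel ratio are my own simplifications, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Rough sketch of a Big-Little block with K = 2 branches (simplified from [10]).

class BigLittleBlock(nn.Module):
    def __init__(self, channels: int, little_ratio: int = 4):
        super().__init__()
        little_ch = channels // little_ratio
        self.big = nn.Sequential(                    # half resolution, full width
            nn.Conv2d(channels, channels, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.little = nn.Sequential(                 # full resolution, fewer channels
            nn.Conv2d(channels, little_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(little_ch), nn.ReLU(inplace=True),
            nn.Conv2d(little_ch, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Upsample the big branch back to the input size before merging.
        big = F.interpolate(self.big(x), size=x.shape[-2:],
                            mode="bilinear", align_corners=False)
        return F.relu(big + self.little(x))
```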

Assemble ResNet Architecture

The network that combines the above techniques is shown below. It is built by adding these techniques to a standard ResNet-50.

Regularization

Regularization 1. Label Smoothing

Label smoothing is a method that uses a soft label such as [0.1, 0.9] instead of a one-hot (hard) label such as [0.0, 1.0]. According to [11], label smoothing calibrates the predicted distribution for each class and also helps reduce overfitting: with a one-hot label, even a softmax output of 0.99 still incurs a penalty, pushing the model toward overconfidence.
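
A minimal sketch of the loss, assuming a smoothing factor ε = 0.1: the one-hot target is mixed with a uniform distribution over classes. Recent PyTorch versions expose the same behaviour through the label_smoothing argument of F.cross_entropy.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of cross entropy with label smoothing.

def smoothed_cross_entropy(logits: torch.Tensor, target: torch.Tensor,
                           eps: float = 0.1) -> torch.Tensor:
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(target, num_classes).float()
    # Mix the one-hot target with a uniform distribution over classes.
    smooth_target = one_hot * (1 - eps) + eps / num_classes
    return -(smooth_target * log_probs).sum(dim=-1).mean()

logits = torch.randn(8, 1000)               # batch of 8, 1000 ImageNet classes
target = torch.randint(0, 1000, (8,))
loss = smoothed_cross_entropy(logits, target, eps=0.1)
```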

Regularization 2. MIXUP

MIXUP [7] is a data augmentation method that mixes both inputs and labels. A mixing ratio is sampled from a Beta distribution, and both the inputs and the labels of two examples are blended with that ratio. Because interpolated samples are created between the clusters of different classes, the latent space is said to become smoother.
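
A minimal sketch of mixup on a batch, assuming the labels are already one-hot encoded and using α = 0.2 as the Beta-distribution parameter:

```python
import numpy as np
import torch

# Minimal sketch of mixup [7]: sample a mixing ratio lambda from a Beta distribution
# and blend both the inputs and the one-hot targets of randomly paired examples.

def mixup_batch(x: torch.Tensor, y_one_hot: torch.Tensor, alpha: float = 0.2):
    lam = np.random.beta(alpha, alpha)        # mixing ratio in [0, 1]
    perm = torch.randperm(x.size(0))          # pair each example with a random partner
    mixed_x = lam * x + (1 - lam) * x[perm]
    mixed_y = lam * y_one_hot + (1 - lam) * y_one_hot[perm]
    return mixed_x, mixed_y
```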

Regularization 3. DropBlock

DropBlock [8] is a Dropout variant for CNNs. With ordinary Dropout, pixels are removed as shown in figure (b) below, but the pixels adjacent to a dropped pixel are usually kept, so the regularization effect on images is weak. DropBlock therefore strengthens the regularization effect by dropping contiguous regions of pixels in 2D space.
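
A simplified sketch of DropBlock: block centres are sampled with probability γ, grown into block_size x block_size regions with max pooling, and zeroed out. The γ formula follows the paper; the rescaling and edge handling here are simplified.

```python
import torch
import torch.nn.functional as F

# Simplified sketch of DropBlock [8]; apply only during training.

def drop_block(x: torch.Tensor, drop_prob: float = 0.1, block_size: int = 7) -> torch.Tensor:
    if drop_prob == 0.0:
        return x
    n, c, h, w = x.shape
    # gamma is chosen so that roughly drop_prob of the activations end up dropped.
    gamma = drop_prob / (block_size ** 2) * (h * w) / ((h - block_size + 1) * (w - block_size + 1))
    centres = (torch.rand(n, c, h, w, device=x.device) < gamma).float()
    # Grow each sampled centre into a block_size x block_size region.
    block_mask = F.max_pool2d(centres, kernel_size=block_size,
                              stride=1, padding=block_size // 2)
    keep_mask = 1.0 - block_mask
    # Rescale so the expected activation magnitude stays the same.
    return x * keep_mask * keep_mask.numel() / keep_mask.sum().clamp(min=1.0)
```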

Regularization 4. Knowledge Distillation

Knowledge Distillation [9] is a technique for transferring the knowledge of a large, high-accuracy network to a small, lightweight one. The large source network is called the teacher model, and the small target network is called the student model. The student model is trained not only with the usual cross entropy against the ground-truth labels (hard labels) but also against the output of the teacher model (soft labels). It is known that with this method a lightweight student model can reach an accuracy close to that of the high-accuracy teacher, higher than when the student is trained on the labeled data alone.

(Ref : https://nervanasystems.github.io/distiller/knowledge_distillation.html)
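
A minimal sketch of the standard distillation loss from [9], combining cross entropy on the hard labels with a temperature-softened KL term; the temperature and the mixing weight α below are hypothetical values, not those used in Assemble-ResNet.

```python
import torch
import torch.nn.functional as F

# Sketch of the standard distillation loss [9]: cross entropy with the hard label
# plus KL divergence between temperature-softened teacher and student outputs.

def distillation_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                      target: torch.Tensor, temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    hard_loss = F.cross_entropy(student_logits, target)
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2               # scale so gradients are comparable to the hard loss
    return alpha * hard_loss + (1 - alpha) * soft_loss
```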

Results

The results of Assemble-ResNet with the above techniques are as follows. Assemble-ResNet-152 matches the top-1 accuracy of EfficientNet B6 + AutoAugment, but its inference speed is about 5 times faster. mCE (mean corruption error) measures how robust the model's predictions are to noise; it is evaluated on a noise-corrupted version of ImageNet.

In addition, you can see that each of the introduced techniques contributes to the improvement. Even without AutoAugment, the accuracy is comparable to EfficientNet B3 + AutoAugment while the inference speed is more than 3 times faster.

Conclusion

In this post, I explained the highly accurate and fast Assemble-ResNet. Since knowledge distillation and AutoAugment are used, this accuracy cannot be reached by a single training run of the network architecture alone. Still, it shows that by combining existing, proven techniques in this way, it is possible to build high-accuracy models that are practical for real-world tasks.

References

  1. Jungkyu Lee, Taeryun Won, Kiho Hong. Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network. arXiv:2001.06268, 2020.
  2. Mingxing Tan, Quoc V. Le. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv:1905.11946, 2019.
  3. Ekin D. Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, Quoc V. Le. AutoAugment: Learning Augmentation Policies from Data. arXiv:1805.09501, 2018.
  4. Tong He, Zhi Zhang, Hang Zhang, Zhongyue Zhang, Junyuan Xie, Mu Li. Bag of Tricks for Image Classification with Convolutional Neural Networks. CVPR 2019, pages 558–567.
  5. Xiang Li, Wenhai Wang, Xiaolin Hu, Jian Yang. Selective Kernel Networks. arXiv:1903.06586, 2019.
  6. Richard Zhang. Making Convolutional Networks Shift-Invariant Again. ICML 2019.
  7. Hongyi Zhang et al. mixup: Beyond Empirical Risk Minimization. arXiv:1710.09412, 2017.
  8. Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le. DropBlock: A Regularization Method for Convolutional Networks. NeurIPS 2018, pages 10727–10737.
  9. Geoffrey Hinton, Oriol Vinyals, Jeff Dean. Distilling the Knowledge in a Neural Network. arXiv:1503.02531, 2015.
  10. Chun-Fu Chen, Quanfu Fan, Neil Mallinar, Tom Sercu, Rogerio Feris. Big-Little Net: An Efficient Multi-Scale Feature Representation for Visual and Speech Recognition. arXiv:1807.03848, 2018.
  11. Rafael Müller, Simon Kornblith, Geoffrey Hinton. When Does Label Smoothing Help? NeurIPS 2019.
