Day 106 (DL) — EfficientNetV2: Smaller Models and Faster Training

Nandhini N
May 1

In this post, we’ll discuss Google’s EfficientNetV2, released in April 2021. The article is a quick review of the original research paper. EfficientNetV2 is a brand-new family of convolutional networks that is more efficient in terms of both training speed and parameter efficiency. The core idea is to combine training-aware neural architecture search with scaling. The experiments show that the new models train much faster than current SOTA architectures while being up to 6.8x smaller.

Fig 1 — shows the comparison of accuracy — source
  • The training process is further enhanced by progressively increasing the image size while adaptively adjusting regularization techniques such as dropout and data augmentation.

Training efficiency is paramount for deep learning models because of their enormous size and the ever-expanding input data. For instance, GPT-3 demands high computational resources along with a significant amount of time (weeks) to train the entire model. Such architectures pose a challenge during retraining, even though they produce remarkable outcomes when used via transfer learning.

Training efficiency has recently become a focal point, resulting in a wide range of neural networks that improve efficiency through distinct methods (removing batch normalization, using attention layers in ConvNets, etc.). Even though these approaches have helped on the efficiency front, the gigantic number of parameters still remains a stumbling block.

The motivation for the new proposal comes from observing the behaviour of the existing EfficientNets:

  • When the image size is very large, training becomes slow
  • Depthwise convolutions are slow in early layers
  • Simply scaling up every stage is not effective

Let’s explore the above points in depth.

Training with very large image sizes is slow: EfficientNet’s large image sizes consume more memory, forcing smaller batch sizes during training (the root cause of the slowness). If we reduce the input image size, the batch size can be increased accordingly, resulting in better speed.
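As a rough, hedged illustration of that trade-off (the reference numbers below are made up for the example, not taken from the paper): activation memory grows with the number of pixels, so for a fixed memory budget the batch size shrinks roughly with the image area.

```python
def max_batch_size(image_size, ref_image_size=380, ref_batch_size=24):
    """Scale batch size inversely with image area for a fixed memory budget.

    The reference image size and batch size are illustrative assumptions,
    not values from the EfficientNetV2 paper.
    """
    scale = (ref_image_size / image_size) ** 2
    return max(1, int(ref_batch_size * scale))

# Larger images force smaller batches; smaller images allow larger batches.
print(max_batch_size(380))  # 24
print(max_batch_size(512))  # ~13
print(max_batch_size(260))  # ~51
```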

Depthwise convolutions are slow in early layers: Depthwise convolutions are popular because they use fewer parameters and FLOPs than regular convolutions. The drawback is that they cannot fully leverage modern accelerators. The Fused-MBConv block, shown below, is introduced to overcome this shortcoming.

Fig 2 — compares MBConv & Fused-MBConv — source

Replacing every MBConv block with Fused-MBConv increases parameters and FLOPs and hurts training speed, whereas replacing only the early stages helps. The next step is therefore to get the best of both block types by letting the architecture search decide where to use each one.
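To make the structural difference concrete, here is a minimal PyTorch sketch of the two block types (stride 1, with squeeze-and-excitation and other details from the paper omitted for brevity; an illustration, not the paper’s exact implementation):

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Expansion 1x1 conv -> depthwise 3x3 conv -> projection 1x1 conv."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1, bias=False),   # expand
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1,
                      groups=hidden, bias=False),                     # depthwise
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),   # project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)   # residual connection

class FusedMBConv(nn.Module):
    """Expansion and depthwise steps fused into one regular 3x3 conv,
    which maps better onto modern accelerators."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1, bias=False),  # fused 3x3
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, channels, kernel_size=1, bias=False),             # project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)
```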

Equally scaling up every stage is sub-optimal: EfficientNet scales up all stages equally; for instance, when the depth coefficient is 2, every stage in the network doubles its number of layers. Treating all stages the same does not improve training, because not all stages contribute equally to training speed and parameter efficiency.
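A small numeric sketch of the difference (the per-stage layer counts and added layers below are illustrative assumptions, not the paper’s actual configuration): uniform scaling multiplies every stage’s depth by the same coefficient, while a non-uniform strategy adds most of the extra layers to later stages.

```python
import math

# Illustrative per-stage layer counts for a base network (not the real EfficientNet).
base_layers = [1, 2, 2, 3, 3, 4]

# Uniform (EfficientNet-style) scaling: every stage is multiplied by the
# same depth coefficient.
depth_coefficient = 2.0
uniform = [math.ceil(n * depth_coefficient) for n in base_layers]

# Non-uniform scaling (the idea behind EfficientNetV2): add extra layers
# mostly to the later stages instead of scaling all stages equally.
extra_per_stage = [0, 0, 1, 2, 3, 4]   # assumed values for illustration
non_uniform = [n + e for n, e in zip(base_layers, extra_per_stage)]

print("uniform:    ", uniform)      # [2, 4, 4, 6, 6, 8]
print("non-uniform:", non_uniform)  # [1, 2, 3, 5, 6, 8]
```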

Proposed Solution: NAS search combined with Progressive Learning.

NAS Search: The search space comprises various convolutional operation types {MBConv, Fused-MBConv}, the number of layers, kernel sizes {3x3, 5x5}, and expansion ratios {1, 4, 6}.
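As a hedged illustration of that search space, the snippet below simply samples per-stage choices at random from the options listed above; the actual training-aware NAS in the paper uses a reward that jointly considers accuracy, training step time, and parameter size, which is not shown here. The number of stages and the layer-count range are assumptions for the example.

```python
import random

# Per-stage options drawn from the search space described above.
# The num_layers range is an assumption for illustration only.
SEARCH_SPACE = {
    "conv_type":       ["MBConv", "Fused-MBConv"],
    "kernel_size":     [3, 5],
    "expansion_ratio": [1, 4, 6],
    "num_layers":      [1, 2, 3, 4, 5],
}

def sample_architecture(num_stages=6):
    """Randomly sample one candidate architecture, one choice set per stage."""
    return [{name: random.choice(options)
             for name, options in SEARCH_SPACE.items()}
            for _ in range(num_stages)]

candidate = sample_architecture()
for i, stage in enumerate(candidate, start=1):
    print(f"stage {i}: {stage}")
```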

Progressive Learning: Monitoring training closely shows that when the image size is small, performance is good even with weak augmentation; to achieve the same performance with larger images, stronger augmentation is needed. This observation motivates adapting the regularization as the image size is gradually increased during progressive learning.

During the early training epochs, the network is trained with small images and weak regularization. As the number of epochs increases, so do the image size and the strength of the regularization.

Fig 3 — shows the regularization adaptability w.r.t. image size — source
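A minimal sketch of such a schedule, assuming a simple linear interpolation of image size, dropout rate, and augmentation strength across training stages (the ranges below are illustrative, not the values used in the paper):

```python
def progressive_schedule(stage, num_stages,
                         size_range=(128, 300),
                         dropout_range=(0.1, 0.3),
                         augment_range=(5, 15)):
    """Linearly ramp image size and regularization strength over training stages.

    All ranges are illustrative assumptions, not the paper's exact settings.
    """
    t = stage / max(num_stages - 1, 1)          # 0.0 at the start, 1.0 at the end
    lerp = lambda lo, hi: lo + t * (hi - lo)
    return {
        "image_size": int(lerp(*size_range)),
        "dropout_rate": round(lerp(*dropout_range), 3),
        "augment_magnitude": round(lerp(*augment_range), 1),
    }

for stage in range(4):
    # Early stages: small images, weak regularization; later stages: the opposite.
    print(stage, progressive_schedule(stage, num_stages=4))
```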

Recommended Reading:

EfficientNetV2: Smaller Models and Faster Training — https://arxiv.org/pdf/2104.00298.pdf
