[Paper Summary] Rethinking ImageNet Pre-training

Is pre-training always a superior method of training?

Jae Duk Seo
Jan 4, 2019 · 2 min read

Abstract

The authors of this paper found that even when the network is randomly initialized, the final performance is not that different from a pre-trained (transfer learning) model, specifically models pre-trained on ImageNet.

When it comes to speed of convergence, the pre-trained model does converge faster. But overall, these findings challenge the general notion that "pre-trained models are the best way to go".

Introduction

In recent years, more researchers have adopted transfer learning, which takes a pre-trained network and fine-tunes it for a different task.

This paper challenges the above paradigm and shows that we can still achieve very competitive performance from random initialization; the key is to use an appropriate normalization scheme and a longer training time.

  1. ImageNet pre-training speeds up convergence.
  2. ImageNet pre-training does not automatically give better regularization.
  3. ImageNet pre-training shows no benefit for certain tasks (e.g., those sensitive to precise spatial localization).

Methodology

The authors used normalization schemes such as group normalization and synchronized batch normalization and found that both enable competitive performance in a randomly initialized network. Additionally, the authors increased the number of training iterations for the network trained from scratch (i.e., a longer training schedule).
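To make this concrete, below is a minimal PyTorch-style sketch (my own illustration, not the authors' Detectron code) of the two ingredients: a randomly initialized backbone whose BatchNorm layers are swapped for GroupNorm, and a longer training schedule. The model choice, group count, and schedule multiplier are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torchvision

# Sketch only: replace every BatchNorm2d in a randomly initialized backbone
# with GroupNorm, which does not depend on (small) per-GPU batch statistics.
def bn_to_gn(module: nn.Module, num_groups: int = 32) -> nn.Module:
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            bn_to_gn(child, num_groups)
    return module

# pretrained=False -> random initialization, the setting studied in the paper
backbone = bn_to_gn(torchvision.models.resnet50(pretrained=False))

# Training from scratch needs a longer schedule; the exact multiplier
# (e.g. roughly 2-3x the usual iterations) is an assumption for illustration.
optimizer = torch.optim.SGD(backbone.parameters(), lr=0.02,
                            momentum=0.9, weight_decay=1e-4)
```

GroupNorm is attractive in this setting because, unlike BatchNorm, its statistics do not degrade with the small per-GPU batch sizes typical of detection training.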

Results

The training curves in the paper say it all: for the detection task, pre-trained networks do converge faster, but given more training time the network trained from scratch catches up.

Likewise, across different data-augmentation settings there is not much difference between the two networks.

Further Discussion

  1. Training from scratch is possible, but it takes more time.
  2. It just needs a good normalization scheme.
  3. ImageNet pre-training is still useful for some tasks.

When we do not have enough data for the target task, ImageNet pre-training might be the way to go. Otherwise, it might be a good idea to just train the network from scratch.

Reference

  1. He, K., Girshick, R., & Dollár, P. (2018). Rethinking ImageNet Pre-training. arXiv.org. Retrieved 4 January 2019, from https://arxiv.org/abs/1811.08883
