Review — Rethinking ImageNet Pre-training (Object Detection, Semantic Segmentation)

Training From Scratch Not Worse Than ImageNet Pre-Training

Sik-Ho Tsang
Feb 21 · 4 min read
The model, ResNet50-FPN Using GN, trained from random initialization needs more iterations to converge, but converges to a solution that is no worse than the fine-tuning counterpart.

In this story, Rethinking ImageNet Pre-training, by Facebook AI Research (FAIR), is briefly reviewed.

Pre-training has been preferred over training from scratch in many papers. However, is the pre-trained knowledge really useful when transferred to other computer vision tasks?

The paper reports the following findings:

  • Training from random initialization is surprisingly robust; the results hold even when: (i) using only 10% of the training data, (ii) for deeper and wider models, and (iii) for multiple tasks and metrics.
  • ImageNet pre-training speeds up convergence early in training, but does not necessarily provide regularization or improve final target task accuracy.

This is a paper in 2019 ICCV with over 350 citations. (Sik-Ho Tsang @ Medium)

(The paper goes into many details of the experimental setup to make the comparison fair. I skip some of the details and results to keep the story short. If interested, please feel free to read the paper.)


  1. Number of Training Images & Setup
  2. Training from Scratch to Match Accuracy
  3. Training from Scratch with Less Data
  4. Discussions

1. Number of Training Images & Setup

1.1. Number of Training Images Involved

Total numbers of images, instances, and pixels seen during all training iterations, for pre-training + fine-tuning (green bars) vs. from random initialization (purple bars).
  • Typical ImageNet pre-training involves over one million images iterated for one hundred epochs. In addition to any semantic information learned from this large-scale data, the pre-trained model has also learned low-level features.
  • On the other hand, when training from scratch the model has to learn low- and high-level semantics, so more iterations may be necessary for it to converge well.
  • As shown above, if counting image-level samples, the from-scratch case sees considerably fewer samples than its fine-tuning counterpart.
  • Actually, the sample numbers only get closer if we count pixel-level samples.
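The image-level gap can be checked with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the ImageNet training set size (about 1.28 million images) and the detection mini-batch size (16 images per iteration) are my assumptions, not numbers from this story, while the 180k and 540k iteration counts are the schedules described in Section 1.2.

```python
# Rough count of image-level samples seen during training.
IMAGENET_TRAIN_IMAGES = 1_280_000  # assumption: approximate ImageNet-1k training set size
IMAGENET_EPOCHS = 100              # "one hundred epochs", as stated above
IMAGES_PER_ITERATION = 16          # assumption: detection mini-batch size


def images_seen(iterations, images_per_iteration=IMAGES_PER_ITERATION):
    """Image-level samples processed over a number of training iterations."""
    return iterations * images_per_iteration


# Pre-training on ImageNet, then fine-tuning on COCO with the 180k-iteration schedule.
pretrain_images = IMAGENET_TRAIN_IMAGES * IMAGENET_EPOCHS
finetune_images = images_seen(180_000)
pretrain_plus_finetune = pretrain_images + finetune_images

# Training from scratch on COCO with the 540k-iteration schedule.
scratch_images = images_seen(540_000)

print(f"pre-train + fine-tune: {pretrain_plus_finetune / 1e6:.1f}M images")
print(f"from scratch:          {scratch_images / 1e6:.1f}M images")
print(f"ratio:                 {pretrain_plus_finetune / scratch_images:.1f}x")
```

Under these assumptions, even the longest from-scratch schedule sees more than an order of magnitude fewer images than pre-training plus fine-tuning.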

1.2. Setup

  • Mask R-CNN with ResNet or ResNeXt plus Feature Pyramid Network (FPN) backbones is used.
  • GN or SyncBN replaces all ‘frozen BN’ layers. SyncBN is BN with its batch statistics computed across multiple GPUs.
  • The models are fine-tuned with 90k iterations (namely, ‘1× schedule’) or 180k iterations (‘2× schedule’), and trained up to a so-called ‘6× schedule’, which has 540k iterations.
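The schedule names are simply multiples of the 90k-iteration baseline. A minimal sketch of that mapping follows; the helper names, the mini-batch size of 16, and the approximate COCO training set size of 118k images are assumptions for illustration only.

```python
BASE_ITERATIONS = 90_000  # the '1x schedule' from the setup above


def schedule_iterations(multiplier):
    """Number of training iterations for an 'Nx schedule'."""
    if multiplier <= 0:
        raise ValueError("schedule multiplier must be positive")
    return multiplier * BASE_ITERATIONS


def approx_epochs(iterations, dataset_size, images_per_iteration=16):
    """Approximate passes over the dataset (assumes a fixed mini-batch size)."""
    return iterations * images_per_iteration / dataset_size


COCO_TRAIN_IMAGES = 118_000  # assumption: approximate size of the COCO training set

for multiplier in (1, 2, 6):
    iterations = schedule_iterations(multiplier)
    epochs = approx_epochs(iterations, COCO_TRAIN_IMAGES)
    print(f"{multiplier}x schedule: {iterations:,} iterations, about {epochs:.0f} epochs")
```

Expressing schedules this way makes it easy to see how much longer the from-scratch runs are than the usual fine-tuning runs.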

2. Training from Scratch to Match Accuracy

Learning curves of APbbox on COCO val2017 using Mask R-CNN with R101-FPN and GN
  • Typical fine-tuning schedules (2×) work well for the models with pre-training to converge to near optimum. But these schedules are not enough for models trained from scratch, and they appear to be inferior if they are only trained for a short period.

Models trained from scratch can catch up with their fine-tuning counterparts, if a 5× or 6× schedule is used. When they converge to an optimum, their detection AP is no worse than their fine-tuning counterparts.

3. Training from Scratch with Less Data

Training with 10k COCO images
  • A smaller training set of 10k COCO images (i.e., less than 1/10th of the full COCO set) is used.
  • The model with pre-training reaches 26.0 AP with 60k iterations, but has a slight degradation when training more.

The counterpart model trained from scratch has 25.9 AP at 220k iterations, which is comparably accurate.

4. Discussions

  • Based on the above experiments, the authors discuss the following questions.

4.1. Is ImageNet pre-training necessary?

  • No, if we have enough target data.
  • This suggests that collecting annotations of target data (instead of pretraining data) can be more useful for improving the target task performance.

4.2. Is ImageNet Useful?

  • Yes.
  • ImageNet pre-training reduces research cycles, leading to easier access to encouraging results, and fine-tuning from pretrained weights converges faster than from scratch.

4.3. Is Big Data Helpful?

  • Yes.
  • But a generic large-scale, classification-level pre-training set is not ideal if we take into account the extra effort of collecting and cleaning data.
  • If the gain of large-scale classification-level pre-training becomes exponentially diminishing, it would be more effective to collect data in the target domain.

4.4. Shall We Pursue Universal Representations?

  • Yes.
  • The authors believe learning universal representations is a laudable goal.
  • The study suggests that the community should be more careful when evaluating pre-trained features.
