The authors of this paper found that even when the network is randomly initialized, the final performance is not that different from a pre-trained (transfer learning) model, specifically one pre-trained on ImageNet.
When it comes to speed of convergence, the pre-trained model does converge faster. But overall, these findings challenge the general notion that pre-trained models are always the best way to go.
In recent years, more and more researchers have adopted transfer learning, which takes a pre-trained network and fine-tunes it for a different task.
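As a rough illustration of that paradigm, here is a minimal PyTorch sketch (not from the paper; the 10-class head is a hypothetical target task):

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-50 with weights pre-trained on ImageNet.
model = models.resnet50(pretrained=True)

# Swap the classification head for a hypothetical 10-class target task.
model.fc = nn.Linear(model.fc.in_features, 10)

# Optionally freeze the backbone so only the new head is fine-tuned.
for name, param in model.named_parameters():
    if not name.startswith("fc"):
        param.requires_grad = False
```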
This paper challenges that paradigm and shows that we can still achieve very competitive performance from random initialization; the key is to use a suitable normalization scheme and a longer training time.
- ImageNet pre-training speeds up convergence.
- ImageNet pre-training does not automatically give better regularization.
- ImageNet pre-training shows no benefit for certain tasks (e.g., tasks that are more sensitive to precise spatial localization).
The authors used normalization schemes such as Group Normalization and Synchronized Batch Normalization and found that both enable competitive performance in a randomly initialized network. Additionally, they increased the number of training iterations for the network trained from scratch (longer training time).
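A minimal sketch of the idea (assumed PyTorch API, not the authors' code; `bn_to_gn` is a hypothetical helper): replace every Batch Normalization layer with Group Normalization, start from random weights, and train on a longer schedule.

```python
import torch.nn as nn
from torchvision import models

def bn_to_gn(module: nn.Module, num_groups: int = 32) -> nn.Module:
    """Recursively replace each BatchNorm2d layer with GroupNorm."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(num_groups, child.num_features))
        else:
            bn_to_gn(child, num_groups)
    return module

# Randomly initialized backbone (pretrained=False) with GroupNorm,
# ready to be trained from scratch on an extended schedule.
backbone = bn_to_gn(models.resnet50(pretrained=False))
```

Group Normalization is a natural choice here because, unlike Batch Normalization, its statistics do not depend on the batch size, which matters when detection models are trained with few images per GPU.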
The paper's training curves say it all: on the region proposal task, the pre-trained network does converge faster, but given enough training the network trained from scratch catches up.
Again, even with different data augmentation settings, there is not much difference between the two networks.
- Training from scratch is possible but takes more time.
- It just needs a good normalization scheme (and a longer schedule).
- ImageNet pre-training is still useful for some tasks.
When we do not have enough data for the target task, ImageNet pre-training might be the way to go. Otherwise, it might be a good idea to just train the network from scratch.
- He, K., Girshick, R., & Dollár, P. (2018). Rethinking ImageNet Pre-training. arXiv preprint arXiv:1811.08883. Retrieved 4 January 2019, from https://arxiv.org/abs/1811.08883