Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes 3/100

Shun John Iwase
Between Real and Ideal
2 min read · Apr 10, 2018

The 3rd paper is about ResNet-50 training using extremely large minibatch SGD.

Overview

Introduces methods to maintain accuracy with an extremely large minibatch size. In the experiments, they employed techniques such as the following (a minimal sketch of the warm-up schedule follows the list):

  • RMSprop warm-up
  • Batch normalization w/o moving average
  • Slow-start learning rate schedule
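
As a rough illustration of the slow-start idea, the sketch below ramps the learning rate up linearly over the first few epochs and then decays it in steps. This is a minimal, framework-agnostic Python sketch under assumed values: the base rate, warm-up length, and decay milestones are placeholders, not the schedule from the paper, and the RMSprop-to-SGD side of the warm-up is only noted in a comment.

```python
def slow_start_lr(epoch, base_lr=3.2, warmup_epochs=5, decay_epochs=(30, 60, 80)):
    """Illustrative slow-start schedule: linear warm-up, then step decay.

    All numbers (base_lr, warm-up length, decay milestones) are assumed
    placeholders, not the values used in the paper. The paper's RMSprop
    warm-up (as I understand it, optimizing with RMSprop early on before
    moving to momentum SGD) is not shown here.
    """
    if epoch < warmup_epochs:
        # "Slow start": ramp the learning rate up gradually so the huge
        # minibatch does not destabilize the early iterations.
        return base_lr * (epoch + 1) / warmup_epochs
    # After warm-up, decay the rate by 10x at each milestone epoch.
    lr = base_lr
    for milestone in decay_epochs:
        if epoch >= milestone:
            lr *= 0.1
    return lr


if __name__ == "__main__":
    for epoch in (0, 2, 4, 5, 30, 60, 80):
        print(f"epoch {epoch:2d}: lr = {slow_start_lr(epoch):.4f}")
```
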

Conditions are as follows: ResNet-50 on ImageNet, a minibatch size of 32,768, 1,024 NVIDIA Tesla P100 GPUs with ChainerMN, about 15 minutes of training, and 74.9% top-1 validation accuracy.

Link

https://arxiv.org/abs/1711.04325

Author(s)

Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

Preferred Networks

Published Year / Journal(s)

12 Nov 2017 / arXiv

What’s the difference from prior research?

Uses an extremely large minibatch size and employs techniques such as RMSprop warm-up, batch normalization w/o moving averages, and a slow-start learning rate schedule.
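
One way to realize "batch normalization w/o moving average" is to ignore the running statistics accumulated during the very short training run and recompute the BN mean and variance from a handful of fresh training batches right before evaluation. The sketch below shows that idea in PyTorch; it is only my illustration of the concept, with assumed names and random stand-in data, not the paper's exact procedure (the authors worked with Chainer/ChainerMN).

```python
import torch
import torch.nn as nn

# Tiny model with a BatchNorm layer, just to illustrate the idea.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8, momentum=None),  # momentum=None -> simple cumulative average
    nn.ReLU(),
)

def recompute_bn_stats(model, loader, num_batches=10):
    """Recompute BN statistics from fresh batches instead of trusting
    the moving averages accumulated during a very short training run."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()
    model.train()  # BN only updates its running statistics in train mode
    with torch.no_grad():
        for i, (x, _) in enumerate(loader):
            if i >= num_batches:
                break
            model(x)
    model.eval()

# Hypothetical usage with random tensors standing in for ImageNet batches.
fake_loader = [(torch.randn(16, 3, 32, 32), None) for _ in range(10)]
recompute_bn_stats(model, fake_loader)
```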

What’s the good point of this research?

Demonstrates that highly parallel training with a large minibatch size is possible w/o losing accuracy on carefully designed software and hardware systems.

What’s the experimental method?

Same as the overview.

Any discussions?

  • In the normal setting, why does a larger minibatch size lead to lower accuracy?
  • What does “Training Iteration” mean?

Which papers should I read next?

  • Adversarial Feature Learning
