Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes 3/100

Shun John Iwase
Between Real and Ideal
2 min read · Apr 10, 2018

The 3rd paper is about ResNet-50 training using extremely large minibatch SGD.

Overview

Introduces methods to maintain accuracy with an extremely large minibatch size. In the experiments, they employed techniques such as the following (a minimal sketch of the warm-up schedule follows the list):

  • RMSprop warm-up
  • Batch normalization w/o moving average
  • Slow-start learning rate schedule
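
As a rough illustration of the slow-start idea, the sketch below ramps the learning rate up linearly over the first few epochs and then decays it in steps. This is a minimal, framework-agnostic Python sketch under assumed values: the base rate, warm-up length, and decay milestones are placeholders, not the schedule from the paper, and the RMSprop-to-SGD side of the warm-up is only noted in a comment.

```python
def slow_start_lr(epoch, base_lr=3.2, warmup_epochs=5, decay_epochs=(30, 60, 80)):
    """Illustrative slow-start schedule: linear warm-up, then step decay.

    All numbers (base_lr, warm-up length, decay milestones) are assumed
    placeholders, not the values used in the paper. The paper's RMSprop
    warm-up (as I understand it, optimizing with RMSprop early on before
    moving to momentum SGD) is not shown here.
    """
    if epoch < warmup_epochs:
        # "Slow start": ramp the learning rate up gradually so the huge
        # minibatch does not destabilize the early iterations.
        return base_lr * (epoch + 1) / warmup_epochs
    # After warm-up, decay the rate by 10x at each milestone epoch.
    lr = base_lr
    for milestone in decay_epochs:
        if epoch >= milestone:
            lr *= 0.1
    return lr


if __name__ == "__main__":
    for epoch in (0, 2, 4, 5, 30, 60, 80):
        print(f"epoch {epoch:2d}: lr = {slow_start_lr(epoch):.4f}")
```
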

Conditions are as follows: ResNet-50 on ImageNet, a minibatch size of 32,768, 1,024 NVIDIA Tesla P100 GPUs with ChainerMN, about 15 minutes of training, and 74.9% top-1 validation accuracy.

Link

https://arxiv.org/abs/1711.04325

Author(s)

Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

Preferred Networks

Published Year / Journal(s)

12 Nov 2017 / arXiv

What’s the difference from prior research?

Uses an extremely large minibatch size and employs techniques such as RMSprop warm-up, batch normalization w/o moving averages, and a slow-start learning rate schedule.
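
One way to realize "batch normalization w/o moving average" is to ignore the running statistics accumulated during the very short training run and recompute the BN mean and variance from a handful of fresh training batches right before evaluation. The sketch below shows that idea in PyTorch; it is only my illustration of the concept, with assumed names and random stand-in data, not the paper's exact procedure (the authors worked with Chainer/ChainerMN).

```python
import torch
import torch.nn as nn

# Tiny model with a BatchNorm layer, just to illustrate the idea.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.BatchNorm2d(8, momentum=None),  # momentum=None -> simple cumulative average
    nn.ReLU(),
)

def recompute_bn_stats(model, loader, num_batches=10):
    """Recompute BN statistics from fresh batches instead of trusting
    the moving averages accumulated during a very short training run."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.reset_running_stats()
    model.train()  # BN only updates its running statistics in train mode
    with torch.no_grad():
        for i, (x, _) in enumerate(loader):
            if i >= num_batches:
                break
            model(x)
    model.eval()

# Hypothetical usage with random tensors standing in for ImageNet batches.
fake_loader = [(torch.randn(16, 3, 32, 32), None) for _ in range(10)]
recompute_bn_stats(model, fake_loader)
```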

What’s the good point of this research?

Demonstrates that highly parallel training with a large minibatch size is possible w/o losing accuracy on carefully designed software and hardware systems.

What’s the experimental method?

Same as the overview.

Any discussions?

  • In the normal setting, why does a larger minibatch size lead to lower accuracy?
  • What does “Training Iteration” mean?

Which papers should I read next?

  • Adversarial Feature Learning
