Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes (3/100)
The 3rd paper covers ResNet-50 training using extremely large minibatch SGD.
Overview
The paper introduces methods to maintain accuracy with an extremely large minibatch size. In the experiments, the authors employed techniques such as (see the sketch after this list):
- RMSprop warm-up
- Batch normalization w/o moving average
- Slow-start learning rate schedule
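The warm-up can be sketched as gradually shifting from RMSprop updates to momentum SGD updates over the first part of training. The blending rule and hyperparameters below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def hybrid_step(theta, grad, state, lr, alpha,
                momentum=0.9, decay=0.99, eps=1e-8):
    """Blended update: alpha=0 is pure RMSprop, alpha=1 is pure
    momentum SGD. Illustrative only, not the paper's exact rule."""
    # RMSprop: normalize the gradient by a running RMS of past gradients
    state["ms"] = decay * state["ms"] + (1 - decay) * grad ** 2
    rmsprop_step = grad / (np.sqrt(state["ms"]) + eps)
    # Momentum SGD: accumulate a velocity term
    state["v"] = momentum * state["v"] + grad
    sgd_step = state["v"]
    # Linearly interpolate the two updates during warm-up
    return theta - lr * ((1 - alpha) * rmsprop_step + alpha * sgd_step)

# Usage: alpha ramps 0 -> 1 over an assumed number of warm-up iterations.
# state = {"ms": np.zeros_like(theta), "v": np.zeros_like(theta)}
# for t in range(total_iters):
#     alpha = min(1.0, t / warmup_iters)
#     theta = hybrid_step(theta, grad_fn(theta), state, lr, alpha)
```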
Conditions are as follows:
- Minibatch size: 32,768
- Hardware: 1,024 NVIDIA Tesla P100 GPUs
- 90 epochs on ImageNet, completed in about 15 minutes (implemented with ChainerMN)
Link
https://arxiv.org/abs/1711.04325
Author(s)
Takuya Akiba, Shuji Suzuki, Keisuke Fukuda
Preferred Networks
Published Year / Journal(s)
12 Nov 2017 / arXiv
What’s the difference from prior research?
They use an extremely large minibatch size and employ techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule.
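The slow-start schedule can be illustrated as keeping the learning rate reduced for the first few epochs before switching to the full (linearly scaled) rate with the usual step decay. All boundaries and values below are assumptions for illustration, not the paper's reported hyperparameters.

```python
def slow_start_lr(epoch, peak_lr=12.8, start_frac=0.1, slow_epochs=5,
                  decay_epochs=(30, 60, 80), gamma=0.1):
    """Illustrative slow-start schedule; values are assumed, not the
    paper's. peak_lr follows the linear scaling rule as an example:
    0.1 * 32768 / 256 = 12.8."""
    if epoch < slow_epochs:
        return peak_lr * start_frac       # slow start at a reduced rate
    lr = peak_lr
    for boundary in decay_epochs:
        if epoch >= boundary:
            lr *= gamma                   # standard step decay
    return lr
```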
What’s the good point of this research?
It demonstrates that highly parallel training with a large minibatch size is possible without losing accuracy, given carefully designed software and hardware systems (a sketch of the data-parallel pattern follows).
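The "carefully designed systems" point is synchronous data-parallel SGD: each GPU computes gradients on its shard of the global minibatch, and the gradients are averaged with an all-reduce (the paper uses ChainerMN with efficient collective communication). A minimal MPI-style sketch of that pattern, with assumed names:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def allreduce_mean(local_grad):
    """Average gradients across all workers (the all-reduce step)."""
    total = np.empty_like(local_grad)
    comm.Allreduce(local_grad, total, op=MPI.SUM)
    return total / comm.size

# Each of N workers processes global_batch / N examples per iteration,
# then every worker applies the same averaged update:
#     theta -= lr * allreduce_mean(local_grad)
```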
What’s the experimental method?
Same as the overview.
Any discussions?
- Normally, why does accuracy drop as the minibatch size grows? (One common explanation: a larger batch means fewer parameter updates per epoch and less gradient noise, which tends to find sharp minima that generalize worse.)
- What does "training iteration" mean? (See the arithmetic sketch after this list.)
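On the second question: a training iteration conventionally means one parameter update on one minibatch, so a larger minibatch means fewer iterations per epoch. A quick back-of-the-envelope check (the 32,768 batch size is from the paper; the arithmetic is mine):

```python
images = 1_281_167        # ImageNet (ILSVRC-2012) training images
batch_size = 32_768       # minibatch size used in the paper
epochs = 90

iters_per_epoch = -(-images // batch_size)  # ceiling division -> 40
total_iters = iters_per_epoch * epochs      # -> 3,600 updates total
print(iters_per_epoch, total_iters)
```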
Which papers should I read next?
- Adversarial Feature Learning