Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift 4/100

Shun John Iwase
Between Real and Ideal
2 min read · Apr 13, 2018

The 4th paper is about Batch Normalization (a.k.a. BN), which allows us to use higher learning rates and be less careful about initialization. These days, almost all DNN experiments use BN and report notable improvements in training time.

Overview

Batch Normalization is a technique to reduce internal covariate shift. First, I will briefly explain covariate shift.

Covariate shift refers to a difference in distribution between the training data and the test data. Normalization and whitening are usually applied to datasets and yield some improvement. But the authors point out that an analogous problem, internal covariate shift, exists between hidden layers: as the parameters of earlier layers change during training, the distribution of each layer's inputs keeps shifting. This slows down learning and pushes activations into saturation, which leads to vanishing gradients.

To solve these problems, they developed BN, and they showed that merely adding BN to a state-of-the-art image classification model yields a substantial speedup in training.
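
The BN transform itself is short: for each feature, subtract the mini-batch mean, divide by the mini-batch standard deviation (with a small ε for numerical stability), then scale and shift with learnable parameters γ and β. Below is a minimal NumPy sketch of the training-mode forward pass; it is my own illustration, not the authors' code.

    import numpy as np

    def batch_norm_forward(x, gamma, beta, eps=1e-5):
        """Training-mode BN forward pass for a (batch, features) array."""
        mu = x.mean(axis=0)                    # per-feature mini-batch mean
        var = x.var(axis=0)                    # per-feature mini-batch variance
        x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to zero mean, unit variance
        return gamma * x_hat + beta            # learnable scale and shift

    # Quick check: each output feature has roughly zero mean and unit variance.
    x = np.random.randn(4, 3) * 5.0 + 2.0      # a skewed mini-batch of 4 examples
    gamma, beta = np.ones(3), np.zeros(3)
    y = batch_norm_forward(x, gamma, beta)
    print(y.mean(axis=0), y.std(axis=0))

At inference time, the paper replaces the batch statistics with population estimates accumulated during training, so the output depends deterministically on the input.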

Link

https://arxiv.org/abs/1502.03167

Author(s)

Sergey Ioffe, Christian Szegedy

Google Inc.

Published Year / Journal(s)

2 Mar 2015, arXiv

What’s the difference from prior research?

They focused on internal covariate shift and developed BN to reduce it. They also showed significant improvements in the training time of an image classification model.

What’s the good point of this research?

Researchers used to tune DNNs with techniques such as L2 weight regularization and dropout. BN also acts as a regularizer, so it can reduce or even remove the need for these techniques. Thanks to this research, researchers can spend less time tuning DNNs.

What’s the experimental method?

They applied BN to a state-of-the-art image classification model (an Inception network trained on ImageNet) and compared it with the unmodified baseline.
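
The paper's experiments modify an Inception network, but the idea is framework-agnostic. As a rough sketch (in PyTorch, which postdates the paper), BN layers are inserted after each convolution and before the nonlinearity, as the paper recommends:

    import torch.nn as nn

    # A small CNN with BN after each convolution, before the ReLU.
    # This illustrates where BN goes; it is not the paper's model.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.BatchNorm2d(16),   # normalizes over batch and spatial dimensions
        nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(32, 10),
    )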

Any discussions?

  • Are there any downsides?
  • Is BN useful for all kinds of networks (e.g., RNNs or GANs)?

Which papers should I read next?

Generative Adversarial Network
