How covariate shift happens in neural networks
And eliminating it with batch normalization
During the training of a deep neural network, the input values are propagated through many hidden layers. At each hidden layer, the values are multiplied (scaled) by that layer's weights, and the weights themselves are updated at every training step. As a result, the activation values in each hidden layer differ from one training step to the next, and so does their distribution. This changing distribution is known as the covariate shift problem (more precisely, internal covariate shift) in neural networks. Deep neural networks with many hidden layers are especially prone to it.
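To see this drift concretely, here is a minimal NumPy sketch (not from the article) that trains a hypothetical two-layer network on synthetic data and prints the mean and standard deviation of the hidden activations at each step. The statistics change from step to step because each weight update changes the distribution the next forward pass sees.

```python
# Minimal sketch: synthetic data, hypothetical 2-layer network.
# Watch the hidden-layer activation statistics drift as weights update.
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(256, 10))             # fixed input batch
y = rng.normal(size=(256, 1))              # fake regression targets
W1 = rng.normal(scale=0.5, size=(10, 4))   # hidden layer 1 (4 units)
W2 = rng.normal(scale=0.5, size=(4, 1))
lr = 0.1

for step in range(5):
    # Forward pass
    h = np.maximum(0, X @ W1)              # hidden activations (ReLU)
    y_hat = h @ W2
    # The activation distribution at this training step
    print(f"step {step}: mean={h.mean():.3f}, std={h.std():.3f}")
    # Backward pass (mean squared error)
    grad_y = 2 * (y_hat - y) / len(X)
    grad_W2 = h.T @ grad_y
    grad_h = (grad_y @ W2.T) * (h > 0)     # ReLU gradient
    grad_W1 = X.T @ grad_h
    # Updating the weights shifts the activation distribution
    # that the next forward pass will produce.
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
```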
Covariate shift in a neural network simply refers to the change in the distribution of the activation values in each hidden layer across training steps.
To further understand how covariate shift happens in neural networks, consider the following 3-layer MLP architecture.
The activation values of hidden layer 1 are a1, a2, a3 and a4. They are the…
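The subtitle points to batch normalization as the remedy. As a preview, here is a minimal NumPy sketch of the batch-norm transform applied over a batch of hidden-layer activations; the `gamma` and `beta` values and the synthetic batch are illustrative, not from the article.

```python
# Minimal sketch of the batch normalization transform
# (Ioffe & Szegedy, 2015) over a batch of hidden activations.
import numpy as np

def batch_norm(a, gamma, beta, eps=1e-5):
    """Normalize each unit over the batch, then scale and shift."""
    mu = a.mean(axis=0)                     # per-unit batch mean
    var = a.var(axis=0)                     # per-unit batch variance
    a_hat = (a - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * a_hat + beta             # learnable scale and shift

# Example: a batch of 256 activation vectors (a1..a4) from hidden layer 1
rng = np.random.default_rng(0)
a = rng.normal(loc=3.0, scale=2.0, size=(256, 4))
out = batch_norm(a, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))    # ~0 and ~1 for every unit
```

Because every layer now receives activations with a stable, normalized distribution regardless of how the earlier weights have changed, the covariate shift between training steps is effectively eliminated.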