How Dead Neurons Hurt Deep Learning Training
When a neural network is trained improperly, some neurons may “die”: they produce a fixed activation and never recover. These are referred to as “dead neurons”.
The main cause of “dead neurons” is that a neuron gets stuck always producing the same value with zero gradient. This situation arises most often with ReLU.
The plot above shows that an entire input interval falls into this situation: for any negative input, ReLU outputs zero and its gradient is zero.
Assume we have a classical Dense layer with ReLU activation, y = ReLU(wx + b). Here is a “simple” scenario that pushes the layer into this state.
Create naive toy data for a classification task.
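As a minimal sketch of such a dataset (the blob centers, sizes, and names here are my own assumptions, not the original author's data), two well-separated Gaussian clusters are enough:

```python
import numpy as np

# Hypothetical toy data: two 2-D Gaussian blobs for binary classification.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(loc=-1.0, scale=0.5, size=(n, 2))  # class 0 cluster
x1 = rng.normal(loc=+1.0, scale=0.5, size=(n, 2))  # class 1 cluster
X = np.vstack([x0, x1]).astype("float32")
y = np.concatenate([np.zeros(n), np.ones(n)]).astype("int64")
```

Any linearly separable toy set works here; the point is only to have a task easy enough that training failures can be blamed on the dead units rather than on the data.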
Build a classification model with one hidden layer, initializing b with an extremely negative value to simulate what a huge learning rate or improper weight initialization could cause.
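To see why such an initialization is fatal, here is a plain-NumPy sketch (illustrative values of w and b, my own choices) of the unit's forward pass and local gradient:

```python
import numpy as np

# With an extremely negative bias, the pre-activation z = wx + b is below
# zero for every realistic input, so ReLU outputs zero and its gradient
# is zero: no update can ever reach w or b.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))      # inputs
w, b = 0.5, -10.0                  # b initialized far below zero

z = w * x + b                      # pre-activation
a = np.maximum(z, 0.0)             # ReLU output
relu_grad = (z > 0).astype(float)  # dReLU/dz

print(a.max())          # 0.0 -> the neuron is silent
print(relu_grad.sum())  # 0.0 -> no gradient ever flows back
```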
We can observe that a negative initial value of b results in a higher “failure rate” and a lower average accuracy.
Unless wx is large enough to pull the pre-activation back above zero, a negative b keeps the gradient at zero and the model is never updated. We can also see that a slightly positive b helps the model train better. That is why some networks initialize b to 0.01 rather than zero.
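Rerunning the same sketch with b = 0.01 (again with an illustrative w of my own choosing) shows the difference: part of the inputs now land in ReLU's positive region, so gradient flows and the unit can learn.

```python
import numpy as np

# Slightly positive bias: z = wx + b crosses zero for roughly half of the
# standard-normal inputs, so the ReLU gradient is non-zero for those samples.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
w, b = 0.5, 0.01

z = w * x + b
relu_grad = (z > 0).astype(float)
print(relu_grad.sum() > 0)  # True -> some samples produce gradient
```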
Another way to resolve this problem is to use LeakyReLU, whose gradient is non-zero everywhere.
from keras.layers.advanced_activations import LeakyReLU
net = Dense(2, activation=LeakyReLU())(net_input)
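A quick NumPy sketch of LeakyReLU makes the claim concrete (alpha=0.3 here matches Keras's default for this layer): the slope is alpha on the negative side, so the derivative never vanishes.

```python
import numpy as np

# LeakyReLU and its derivative: slope 1 for positive inputs,
# slope alpha (non-zero) for negative inputs.
def leaky_relu(z, alpha=0.3):
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.3):
    return np.where(z > 0, 1.0, alpha)

z = np.array([-10.0, -1.0, 0.5, 3.0])
print(leaky_relu_grad(z))  # gradient is non-zero everywhere, even at z < 0
```

Because a negative pre-activation still passes back an alpha-scaled gradient, a unit pushed deep into the negative region can drift back out instead of staying dead.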
Over a wide range of initial b values, we can see that performance is more stable.