How Dead Neurons Hurt Deep Learning Training?

Joel Chao (joelthchao) · Sep 16, 2017

When we train a neural network improperly, some neurons can die: they get stuck producing an unchangeable activation and never revive. These are referred to as “dead neurons”.

The main reason for “dead neurons” is that a neuron falls into a state where it always produces the same value and receives zero gradient. This situation mostly arises with ReLU.

Gradient and activation of ReLU

From the plot above, we can see there is an input interval that falls into this situation: negative input values, where both the activation and the gradient of ReLU are zero.
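For reference, ReLU and its gradient can be written as:

\mathrm{ReLU}(x) = \max(0, x), \qquad
\frac{d}{dx}\,\mathrm{ReLU}(x) =
\begin{cases}
1 & x > 0 \\
0 & x \le 0
\end{cases}

Once a unit's pre-activation stays in the negative region, no gradient flows back through it, so its weights stop updating.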

Assume we have a classical Dense layer with ReLU activation, y = ReLU(wx + b). Here is a “simple” scenario which makes the layer fall into this situation.

Create naive toy data for a classification task.
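The original data-generation snippet is not shown here; a minimal sketch of what such toy data could look like, assuming two Gaussian blobs for a binary classification task, is:

import numpy as np

# Hypothetical toy data: two Gaussian blobs, one per class
np.random.seed(0)
n = 500
x0 = np.random.randn(n, 2) + np.array([1.0, 1.0])    # class 0
x1 = np.random.randn(n, 2) + np.array([-1.0, -1.0])  # class 1
X = np.concatenate([x0, x1], axis=0)
y = np.concatenate([np.zeros(n), np.ones(n)])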

Build a classification model with one hidden layer. Initialize b with an extremely negative value, to simulate a situation which might be caused by a huge learning rate or improper weight initialization.
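A minimal Keras sketch of this setup might look as follows; the hidden-layer width, the optimizer, and the value -10.0 for the bias initializer are illustrative assumptions, not the exact values used in the experiment:

from keras.models import Model
from keras.layers import Input, Dense
from keras.initializers import Constant

net_input = Input(shape=(2,))
# Extremely negative bias pushes the ReLU units into their zero-gradient region
net = Dense(2, activation='relu', bias_initializer=Constant(-10.0))(net_input)
output = Dense(1, activation='sigmoid')(net)

model = Model(inputs=net_input, outputs=output)
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=20, batch_size=32, verbose=0)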

We can observe that a negative initial value of b results in a higher “failure rate” and lower average accuracy.

Unless wx has a large enough value to pull the gradient away from zero, a negative b keeps the model from being updated. We can also find that a slightly positive b helps the model train better. That's why some networks initialize b with 0.01 rather than zero.
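In Keras, that convention would look roughly like the following sketch, reusing net_input from above; 0.01 is the conventional small positive constant mentioned here:

from keras.layers import Dense
from keras.initializers import Constant

# Small positive bias keeps ReLU units active at the start of training
net = Dense(2, activation='relu', bias_initializer=Constant(0.01))(net_input)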

Another way to resolve this problem is to use LeakyReLU, since its gradient is non-zero everywhere.

from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU
# LeakyReLU has a non-zero gradient for negative inputs, so units cannot die
net = Dense(2, activation=LeakyReLU())(net_input)
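Note that in Keras, LeakyReLU is a layer rather than a string activation; its alpha argument (0.3 by default) controls the slope for negative inputs, and an alternative idiom is to apply it as a separate layer after a Dense layer with no activation.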

Across a wide range of initial values of b, we can see that the performance is more stable.

