Some Intuitions of ReLU Activation Function

Ting-Hao Chen
Machine Learning Notes
1 min read · Nov 10, 2017

The ReLU activation function is fast to compute and does not saturate.

ReLU (Rectified Linear Unit) can be written as f(x) = max(0, x), and this is what the function looks like:

[Figure: the ReLU function, f(x) = max(0, x)]

First, ReLU is fast because the function is very simple to evaluate, so each step of stochastic gradient descent is computationally inexpensive.
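As a rough sketch (using NumPy, which the original post does not show), the forward pass is just an elementwise max, with no exponentials or divisions involved:

```python
import numpy as np

def relu(x):
    # Elementwise max(0, x): one comparison per element, nothing more.
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # [0.  0.  0.  1.5 3. ]
```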

Second, the gate remains closed when x < 0 in backpropagation, because the derivative is zero there. This means ReLU filters out negative values, so those neurons are deactivated, or in other words, the neurons are dying. Since a dying neuron will not contribute in future iterations, you should be careful with your learning rate.
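A minimal sketch of the backward pass illustrates this gating (the function and variable names here are illustrative, not from the original post): the incoming gradient is simply zeroed wherever the input was negative, so those units receive no update.

```python
import numpy as np

def relu_backward(grad_output, x):
    # The gate is closed for x < 0: the derivative is 0 there,
    # so no gradient flows back to those units.
    # (At x = 0 the derivative is undefined; it is commonly taken as 0.)
    return grad_output * (x > 0)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
grad_output = np.ones_like(x)
print(relu_backward(grad_output, x))  # [0. 0. 0. 1. 1.]
```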

Third, the gate remains open when x > 0 in backpropagation, because the derivative is a constant 1. Since the derivative stays constant, ReLU does not saturate the way the sigmoid function does.
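To see the contrast with the sigmoid, here is a small sketch (not from the original post) comparing the two derivatives at large positive inputs: the ReLU gradient stays at 1, while the sigmoid gradient shrinks toward 0.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([1.0, 5.0, 10.0])
relu_grad = (x > 0).astype(float)             # constant 1 for x > 0
sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))  # shrinks toward 0 as x grows

print(relu_grad)     # [1. 1. 1.]
print(sigmoid_grad)  # [1.97e-01 6.65e-03 4.54e-05]
```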
