Why is ReLU non-linear?

Maxim Lopin
2 min read · Oct 22, 2019

The Rectified Linear Unit (ReLU) is an activation function used in nearly all modern neural network architectures. It is defined as ReLU(x) = max(0, x).
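As a quick illustration, here is a minimal NumPy sketch of that definition (the name relu is just a label chosen for this example):

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))   # [0.  0.  0.  1.5 3. ]
```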

At first glance the function might not look like much of a nonlinearity, but I will show that you can approximate any function with it.

First, there is an important property of ReLU:

ReLU(x) = 0 for x ≤ 0

Therefore:

ReLU(x - c) = 0 for x ≤ c

i.e. if we add ReLU(x - c) to any function f(x), the outputs of f(x) will not be affected for x ≤ c.
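A small numerical check of this property; the base function f(x) = 0.5*x, the scale 3.0 and the shift c = 5 are arbitrary choices made only for illustration:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

c = 5.0
x = np.linspace(-2, 10, 13)        # sample points on both sides of c
f = 0.5 * x                        # any base function; a simple line here
g = f + 3.0 * relu(x - c)          # add a shifted, scaled ReLU term

# For x <= c the added term is zero, so f and g agree there.
print(np.allclose(f[x <= c], g[x <= c]))   # True
print((g - f)[x > c])                      # [ 3.  6.  9. 12. 15.]: nonzero only past x = c
```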
For example, consider the function g(x) = ReLU(x) + ReLU(x - c).

The angle at x = 5 (taking c = 5) can be tweaked by multiplying the ReLU(x - c) term by a constant, while the outputs of g(x) for x ≤ c will not be affected by that. That means we can add many more ReLU terms, each of them shifted and multiplied by its own constant, and get any shape of curve that we want.

Example with more terms:

ReLU(x) + (-2)*ReLU(x - 2) + 2*ReLU(x - 4) + (-2)*ReLU(x - 6)
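To see the shape this sum produces, here is a small sketch that evaluates it at integer points (the name zigzag is just a label for this sum); the slope flips sign at each kink, at x = 2, 4 and 6:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def zigzag(x):
    return relu(x) - 2 * relu(x - 2) + 2 * relu(x - 4) - 2 * relu(x - 6)

for x in np.arange(0.0, 9.0):
    print(x, zigzag(x))
# x = 0..8 gives 0, 1, 2, 1, 0, 1, 2, 1, 0: a zigzag whose slope flips at each kink
```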

Generalized formula for approximating f(x) for 0 ≤ x ≤ n:

f(x) ≈ b + c₀*ReLU(x) + c₁*ReLU(x - 1) + … + cₙ₋₁*ReLU(x - (n - 1))

(b is a constant term so the function can be non-zero at x = 0)

In theory, the approximation can be made arbitrarily precise by adding a ReLU term at every possible value of x.
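One way to choose the constants, sketched under the assumption that we place a kink at every integer and make the approximation match f exactly at those points: b = f(0), and each coefficient cᵢ is the change in slope needed at x = i. The helper relu_approx below is hypothetical, written only to illustrate the formula above:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_approx(f, n):
    """Build g(x) = b + sum_i c_i * ReLU(x - i) that matches f at x = 0, 1, ..., n."""
    knots = np.arange(n + 1)
    values = f(knots.astype(float))
    slopes = np.diff(values)                # slope of each linear piece [i, i + 1]
    coeffs = np.diff(slopes, prepend=0.0)   # c_i = change in slope at knot i
    b = values[0]                           # constant term: the value at x = 0
    def g(x):
        x = np.asarray(x, dtype=float)
        return b + sum(c * relu(x - i) for i, c in zip(knots[:-1], coeffs))
    return g

# Example: approximate f(x) = x**2 + 3 on [0, 5]; note b = f(0) = 3.
f = lambda x: x ** 2 + 3
g = relu_approx(f, 5)
print(g(np.arange(6.0)))   # [ 3.  4.  7. 12. 19. 28.]  (exact at the knots)
print(g(2.5), f(2.5))      # 9.5 vs 9.25 (linear interpolation between knots)
```

Between the knots the sketch interpolates linearly, which is exactly why adding more ReLU terms (a finer grid of kinks) improves the precision.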

Conclusion

ReLU is a non-linear function: there is no way to get such shapes on the graph using only linear terms, because any combination of linear functions simplifies to the form y = a*x + b, which is a straight line.
