Why is ReLU non-linear?
The Rectified Linear Unit (ReLU) is an activation function used in nearly all modern neural network architectures. It is defined as max(0, x).
At first glance it might not look like much of a nonlinearity, but I will show that you can approximate any function with it.
First, there is an important property of ReLU: for any constant c, ReLU(x - c) is 0 for every x ≤ c and equals x - c for every x > c. In other words, a shifted ReLU term contributes nothing to the left of its shift point.
Therefore, we can combine a plain ReLU with a shifted one, for example g(x) = ReLU(x) + ReLU(x - c) with c = 5. For x ≤ 5 the shifted term is zero and g(x) behaves exactly like ReLU(x); for x > 5 both terms grow, so the slope increases and the graph gets an angle at x = 5.
The angle at x = 5 can be tweaked by multiplying the ReLU(x - c) term by a constant, while the outputs of g(x) for x ≤ c are not affected by that constant at all. That means we can add many more ReLU terms, each shifted and multiplied by its own constant, and bend the curve into any shape we want.
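To see this concretely, here is a tiny Python check. The multipliers and sample points below are arbitrary values picked for illustration: scaling the shifted term changes the slope after x = 5 but leaves every output for x ≤ 5 untouched.

```python
def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)

def g(x, a):
    """ReLU(x) plus a shifted ReLU term scaled by a constant a."""
    return relu(x) + a * relu(x - 5)

# Different constants only change the curve to the right of x = 5.
for x in [2, 4, 5, 6, 8]:
    print(x, [g(x, a) for a in (0.5, 2.0, -1.0)])
```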
Example with more terms:
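Here is a minimal Python sketch of such a sum; the shift points and coefficients are arbitrary illustrative choices, not taken from the original example. Each term changes the slope only to the right of its own shift point, so the curve gets a new angle at x = 2, 4 and 6.

```python
def relu(x):
    return max(0.0, x)

def h(x):
    """A piecewise-linear curve built from shifted, scaled ReLU terms."""
    return (
        relu(x)
        + 2.0 * relu(x - 2)   # slope increases by 2 after x = 2
        - 4.0 * relu(x - 4)   # slope decreases by 4 after x = 4
        + 3.0 * relu(x - 6)   # slope increases by 3 after x = 6
    )

for x in range(9):
    print(x, h(x))   # the printed values go up, then down, then up again
```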
Generalized formula for approximating f(x) for 0 ≤ x ≤ n: put a shifted ReLU term at every integer point c from 0 to n - 1 and give each one its own constant a_c:

f_approx(x) = f(0) + a_0 · ReLU(x - 0) + a_1 · ReLU(x - 1) + ... + a_(n-1) · ReLU(x - (n - 1)),

where each a_c controls how much the slope changes at x = c.
In theory, the approximation can be made arbitrarily precise: the more densely you place the shifted ReLU terms, the more closely the resulting piecewise-linear curve follows f(x).
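To make the generalized formula concrete, here is a small Python sketch with a helper (relu_approx, a name chosen just for this example) that approximates a function on 0 ≤ x ≤ n with one ReLU term per integer shift point. The rule used for picking the constants, matching the slope of f on each unit segment, is one straightforward choice assumed here for illustration, and sin(x) is just an example target.

```python
import math

def relu(x):
    return max(0.0, x)

def relu_approx(f, n):
    """Approximate f on [0, n] with one ReLU term per integer shift point c.

    Each constant a_c is chosen so that the slope on [c, c + 1] matches
    the slope of f between f(c) and f(c + 1) (an assumed, simple choice).
    """
    coeffs = []
    prev_slope = 0.0
    for c in range(n):
        slope = f(c + 1) - f(c)            # desired slope on [c, c + 1]
        coeffs.append(slope - prev_slope)  # slope change introduced at x = c
        prev_slope = slope

    def approx(x):
        return f(0) + sum(a * relu(x - c) for c, a in enumerate(coeffs))

    return approx

# Approximate sin(x) on [0, 6] with 6 ReLU terms.
f_hat = relu_approx(math.sin, 6)
for x in [0, 0.5, 1, 2, 3.5, 5, 6]:
    print(f"x = {x}: sin(x) = {math.sin(x):.3f}, approx = {f_hat(x):.3f}")
```

At every integer shift point the approximation matches f exactly; in between it follows a straight segment, so placing the shift points more densely makes the curve track f arbitrarily closely.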
Conclusion
ReLU is a non-linear function: there is no way you could get such bends in the graph using only linear terms, because any combination of linear functions can be simplified to the form y = ax + b, which is a straight line.
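As a quick check of that claim, stacking two linear functions y1 = a1 · x + b1 and y2 = a2 · y1 + b2 gives y2 = a2 · (a1 · x + b1) + b2 = (a2 · a1) · x + (a2 · b1 + b2), which is again just a straight line with a new slope and intercept. No amount of such stacking ever produces an angle; the max(0, x) in ReLU is what makes the angles possible.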