Unraveling the Mystery of ReLU: Why it’s Continuous but Not Differentiable

Kashifrazagill
3 min read · Feb 10, 2023

The Rectified Linear Unit (ReLU) is one of the most popular and widely used activation functions in machine learning today. Despite its widespread use, there is a persistent misconception that ReLU is both continuous and differentiable. Only half of that claim is true. In this article, we will explore the continuity and differentiability of ReLU and explain why this function can still be used as an activation function in machine learning.

To start with, let’s define ReLU: f(x) = max(0, x). In simple terms, if x is less than or equal to 0 the function returns 0; otherwise it returns x. If we plot this function, we will observe that there are no discontinuities: the two pieces meet at x = 0, where the function approaches 0 from both sides and f(0) = 0. This means that ReLU is continuous, but continuity alone is not enough to answer the question of differentiability.
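To make this concrete, here is a minimal sketch in Python with NumPy (an illustration for this article, not code from any library). Evaluating ReLU at points on either side of 0 shows the outputs shrinking toward relu(0) = 0 from both directions, which is what continuity at x = 0 looks like numerically.

```python
import numpy as np

def relu(x):
    # Elementwise ReLU: x where x > 0, and 0 otherwise.
    return np.maximum(0.0, x)

# Points approaching 0 from both sides map to values approaching relu(0) = 0.
xs = np.array([-0.1, -0.01, -0.001, 0.0, 0.001, 0.01, 0.1])
print(relu(xs))  # left of 0: all zeros; right of 0: the inputs themselves
```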

ReLU Activation Function Plot

For a function to be differentiable, its derivative must exist at every point, which in particular requires the function to be continuous. The derivative is defined as the limit of the difference quotient:

f'(x) = lim (h → 0) [f(x + h) − f(x)] / h

Looking at ReLU’s plot, the interesting point is x = 0. This is where the slope changes abruptly, so if the derivative is going to fail to exist anywhere, it will be here. For ReLU to be differentiable, its derivative must exist at x = 0, which means the left-hand and right-hand limits of the difference quotient must both exist and be equal at x = 0.

To find the left-hand limit, we let h approach 0 from the left. For h < 0 we have f(0 + h) = max(0, h) = 0, so the difference quotient is (0 − 0) / h = 0. The left-hand limit is therefore 0.

The right-hand limit is found similarly, with h approaching 0 from the right. For h > 0 we have f(0 + h) = h, so the difference quotient is (h − 0) / h = 1. The right-hand limit is therefore 1.

The left-hand limit is 0 while the right-hand limit is 1. For the derivative to exist at x = 0, the two one-sided limits must be equal, and for ReLU they are not. As a result, the derivative of ReLU does not exist at x = 0: the function is differentiable everywhere else, but not at that point.
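The same argument can be checked numerically. Here is a short sketch (plain Python, illustration only) that evaluates the backward and forward difference quotients at x = 0 for shrinking h; the first mirrors the left-hand limit and the second the right-hand limit.

```python
def relu(x):
    return max(0.0, x)

# Backward and forward difference quotients at x = 0. As h shrinks, the
# backward quotient tracks the left-hand limit and the forward quotient
# tracks the right-hand limit of the derivative's defining formula.
for h in [0.1, 0.01, 0.001]:
    left = (relu(0.0) - relu(0.0 - h)) / h    # stays at 0
    right = (relu(0.0 + h) - relu(0.0)) / h   # stays at 1
    print(f"h = {h}: left = {left}, right = {right}")
```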

This raises the question: why is ReLU still so widely used as an activation function when its derivative is not defined at x = 0? The reason is that, in practice, it does not matter. The input to a unit is rarely exactly 0, and when it is, we simply define the derivative there to be 0 (or, in principle, any value between 0 and 1) and move on with our computations. This is why ReLU can still be used in conjunction with Gradient Descent.
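As a sketch of what this convention looks like in code (a generic NumPy illustration, not the implementation of any particular framework), the backward pass of ReLU can treat the derivative as 1 for x > 0 and 0 for x <= 0, which silently covers the undefined point:

```python
import numpy as np

def relu_backward(x, upstream_grad):
    # Derivative convention: 1 where x > 0, and 0 where x <= 0,
    # so the mathematically undefined point x = 0 simply gets 0.
    local_grad = (x > 0).astype(x.dtype)
    return upstream_grad * local_grad

x = np.array([-2.0, 0.0, 3.0])
print(relu_backward(x, np.ones_like(x)))  # [0. 0. 1.]
```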

In conclusion, the ReLU function is continuous everywhere, but it is not differentiable at x = 0. Despite this, ReLU can still be used as an activation function in machine learning by defining its derivative to be 0 (or another value between 0 and 1) at the point where it is undefined. This simple hack allows us to continue using ReLU together with Gradient Descent and other machine learning algorithms. Understanding these properties of ReLU will help practitioners make informed decisions when choosing activation functions for their models.
