Convolutional Neural Network — Lesson 9: Activation Functions in CNNs

Machine Learning in Plain English
Jun 21, 2023


The Need for Non-Linearity: ReLU, Leaky ReLU, etc.

Activation functions introduce non-linearity into the model, allowing it to learn and perform complex tasks. Without them, no matter how many layers we stack in the network, it would still behave as a single-layer perceptron because the composition of linear functions is a linear function.
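As a quick illustration of that claim, here is a minimal NumPy sketch (with arbitrary, made-up layer sizes): two stacked linear layers with no activation in between collapse into a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two purely linear "layers": y = W2 @ (W1 @ x)
W1 = rng.normal(size=(8, 4))   # first linear layer: 4 -> 8
W2 = rng.normal(size=(3, 8))   # second linear layer: 8 -> 3
x = rng.normal(size=(4,))

two_layer_output = W2 @ (W1 @ x)

# The same mapping expressed as one linear layer with weights W = W2 @ W1
single_layer_output = (W2 @ W1) @ x

print(np.allclose(two_layer_output, single_layer_output))  # True
```

No matter how many such layers we stack, the result is always equivalent to one matrix multiplication, which is why a non-linear activation is needed between them.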

Some common activation functions used in CNNs include:

ReLU (Rectified Linear Unit): This is the most commonly used activation function in CNNs. It returns 0 for any negative input, and for any positive input x it returns x unchanged, so it can be written as f(x) = max(0, x). Although it is piecewise linear, the function as a whole is non-linear: the output is not a linear function of the input. It also helps to alleviate the vanishing gradient problem, since its gradient is 1 for positive inputs.

Leaky ReLU: Leaky ReLU is a variant of ReLU. Instead of outputting 0 when x < 0, a leaky ReLU applies a small, non-zero, constant slope α (normally α = 0.01), so the function can be written as f(x) = max(αx, x) for 0 < α < 1. It mitigates the dying ReLU problem, in which ReLU neurons become inactive and output 0 for every input, so they stop learning. Both functions are sketched in code below.
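As a minimal sketch in plain NumPy (using α = 0.01 as in the text), both activations can be written directly from their definitions:

```python
import numpy as np

def relu(x):
    """ReLU: f(x) = max(0, x), applied elementwise."""
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: f(x) = max(alpha * x, x) for 0 < alpha < 1, elementwise."""
    return np.maximum(alpha * x, x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))        # values: 0, 0, 0, 1, 3
print(leaky_relu(x))  # values: -0.02, -0.005, 0, 1, 3
```

Note that negative inputs are zeroed out by ReLU but merely scaled down by leaky ReLU, which is what keeps a small gradient flowing through "dead" regions.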

Where to Apply Activation Functions in CNNs

In a CNN, activation functions are typically applied after each convolutional layer and fully connected layer. However, they are not applied after pooling layers. The purpose of using activation functions after the convolutional and fully connected layers is to introduce non-linearity into the model after performing linear operations (convolution and matrix multiplication).
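For concreteness, here is a minimal sketch of that layout in PyTorch (the layer sizes and input shape are arbitrary, chosen just for illustration): ReLU sits immediately after the convolution, the pooling layer has no activation of its own, and the hidden fully connected layer is followed by another ReLU.

```python
import torch
import torch.nn as nn

# Conv -> ReLU -> Pool -> Flatten -> Linear -> ReLU -> Linear
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),                    # non-linearity right after the convolution
    nn.MaxPool2d(kernel_size=2),  # pooling: no activation applied afterwards
    nn.Flatten(),
    nn.Linear(16 * 14 * 14, 64),  # assumes 28x28 inputs (MNIST-sized images)
    nn.ReLU(),                    # non-linearity after the fully connected layer
    nn.Linear(64, 10),            # output layer producing raw class scores
)

x = torch.randn(8, 1, 28, 28)     # a dummy batch of 8 single-channel images
print(model(x).shape)             # torch.Size([8, 10])
```

The final linear layer is left without a ReLU here because it produces the raw class scores that a loss function (or softmax) consumes.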

In essence, an activation function serves as the "switch" in an artificial neuron that decides whether the neuron should be activated, based on the weighted sum of its inputs. This loosely mirrors how neurons in the human brain work: they either fire, or they don't. The biological analogy helps to conceptualize the role of activation functions in a CNN.
