Activation functions in Neural Networks

Santhosh Kannan · Published in featurepreneur · Jan 3, 2023

What is an Activation Function?

An activation function decides whether a neuron's output is important to the network's prediction. Its main job is to transform the summed input of a node into an output value that is fed to the next layer.
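As a rough sketch (using NumPy, with made-up example weights), a neuron computes a weighted sum of its inputs and then passes it through whatever activation function is chosen:

```python
import numpy as np

def neuron_output(x, w, b, activation):
    """Weighted sum of the inputs followed by an activation function."""
    z = np.dot(w, x) + b          # summed (pre-activation) input
    return activation(z)          # value passed on to the next layer

# Example: a 3-input neuron with a ReLU activation
x = np.array([0.5, -1.2, 2.0])
w = np.array([0.4, 0.3, -0.6])
b = 0.1
print(neuron_output(x, w, b, lambda z: np.maximum(0.0, z)))
```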


Why is an Activation Function needed?

An activation function adds non-linearity to the neural network. Without one, a neuron performs only a linear transformation of its inputs using the weights and bias. The model would then be nothing more than a linear regression model and would not be able to solve complex problems.

What are the different activation functions?

Linear activation function

The linear activation function, also called the identity function or "no activation", multiplies the weighted sum of the inputs by 1. It therefore does not transform the input at all: the output is the same as the input.


Limitations
1. Backpropagation cannot be used effectively, because the derivative of the function is a constant and carries no information about the input.
2. All layers of the neural network collapse into one. Even with 100 layers, the last layer is still a linear function of the first, so the network behaves like a single linear layer (see the sketch below).
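The collapse described in limitation 2 can be illustrated with a small NumPy sketch using arbitrary random weights: composing two identity-activated layers gives exactly one linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with identity (linear) activation
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers (no non-linearity anywhere)
h = W1 @ x + b1
y = W2 @ h + b2

# The same mapping collapses into a single linear layer
W = W2 @ W1
b = W2 @ b1 + b2
y_single = W @ x + b

print(np.allclose(y, y_single))   # True: two linear layers == one linear layer
```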

Binary step function

The binary step function decides whether a neuron outputs 0 or 1 based on a threshold value: if the weighted sum is greater than the threshold, it outputs 1; otherwise it outputs 0.


Limitations
1. Cannot be used for multi-class classification problems, since it produces only two output values.
2. Hinders the backpropagation process, as the gradient is zero everywhere (and undefined at the threshold).
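A minimal NumPy sketch of the step function (the threshold and inputs are chosen purely for illustration):

```python
import numpy as np

def binary_step(z, threshold=0.0):
    """Outputs 1 when the input exceeds the threshold, 0 otherwise."""
    return np.where(z > threshold, 1, 0)

z = np.array([-2.0, -0.1, 0.0, 0.3, 5.0])
print(binary_step(z))   # [0 0 0 1 1]
```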

Logistic Activation function

The logistic activation function (sigmoid) transforms the input into a value between 0 and 1. The larger (more positive) the input, the closer the output is to 1; the smaller (more negative) the input, the closer the output is to 0.


Advantages
1. Well suited to models that output a probability, since its range is (0, 1).
2. Prevents jumps in output values since the function has a smooth gradient.

Limitations
1. Suffers from the vanishing gradient problem: for large positive or negative inputs the gradient becomes extremely small, so the network struggles to backpropagate useful information.
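A minimal NumPy sketch of the logistic function and its gradient (example inputs chosen for illustration); the shrinking gradient at large |z| is what causes the vanishing gradient problem:

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) activation: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
print(sigmoid(z))        # approaches 0 for negative inputs, 1 for positive inputs

# The gradient sigmoid(z) * (1 - sigmoid(z)) peaks at 0.25 at z = 0 and
# shrinks towards 0 for large |z|, which is the vanishing gradient issue.
s = sigmoid(z)
print(s * (1 - s))
```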

Tanh Function

The tanh function, or hyperbolic tangent function, is similar to the logistic function, the main difference being that its output lies between -1 and 1. The larger (more positive) the input, the closer the output is to 1; the smaller (more negative) the input, the closer the output is to -1.


Advantages
1. Output can be interpreted as strongly negative, neutral or strongly positive.
2. Has a maximum gradient four times greater than that of the logistic function, giving rise to bigger learning steps during training.
3. Output is symmetric around 0, which tends to lead to faster convergence.

Limitations
1. Like the logistic function, it suffers from the vanishing gradient problem: for large positive or negative inputs the gradient becomes extremely small, so the network struggles to backpropagate useful information.
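A small NumPy sketch comparing tanh with the logistic function (example inputs only): the maximum gradient of tanh is 1.0, four times the 0.25 maximum of the logistic function, which is the basis of advantage 2 above.

```python
import numpy as np

z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

tanh = np.tanh(z)                 # output lies in (-1, 1) and is centred at 0
tanh_grad = 1.0 - np.tanh(z) ** 2 # derivative of tanh, peaks at 1.0 at z = 0

sigmoid = 1.0 / (1.0 + np.exp(-z))
sigmoid_grad = sigmoid * (1 - sigmoid)  # peaks at 0.25 at z = 0

print(tanh)
print(tanh_grad)
print(sigmoid_grad)
```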

Rectified Linear Unit(ReLU)

The Rectified Linear Unit outputs 0 if the input is negative and returns the input itself if it is positive. Although it looks like a linear function, ReLU is piecewise linear, has a usable derivative, and allows complex relationships in the data to be learned.


Advantages
1. Does not activate neurons with negative inputs, which makes the network sparse and computationally efficient.
2. Tends to show better convergence.
3. Faster to compute than some other activation functions, such as the logistic function.

Limitations
1. Activations can blow up, since there is no upper bound on the output for positive inputs.
2. Dying ReLU problem: if too many pre-activations fall below zero, most of the neurons in a layer output zero, creating "dead" neurons whose weights and biases are no longer updated.
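A minimal NumPy sketch of ReLU and its gradient (example inputs only), showing why neurons with negative pre-activations receive no gradient:

```python
import numpy as np

def relu(z):
    """ReLU: passes positive inputs through unchanged, zeroes out the rest."""
    return np.maximum(0.0, z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))                     # [0.  0.  0.  0.5 3. ]

# Gradient is 0 for negative inputs and 1 for positive inputs.
# Neurons stuck in the negative region receive no gradient ("dying ReLU").
print(np.where(z > 0, 1.0, 0.0))
```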

Leaky ReLU function

Leaky ReLU is an improved version of the ReLU function that addresses the dying ReLU problem. Instead of mapping negative values to 0 like ReLU, Leaky ReLU multiplies them by a small, non-zero constant a (typically 0.01).


Advantages
1. Prevents the dying ReLU problem by allowing a small gradient for negative inputs.
2. Faster to compute than some other activation functions, such as the logistic function.

Limitations
1. Sensitive to the parameter a: a value that is too small may result in slow convergence, while a value that is too large may result in unstable behaviour.
2. Predictions for negative input values may not be consistent.
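A minimal NumPy sketch of Leaky ReLU (using the common a = 0.01, example inputs only), showing that negative inputs now keep a small, non-zero gradient:

```python
import numpy as np

def leaky_relu(z, a=0.01):
    """Leaky ReLU: scales negative inputs by a small constant a instead of zeroing them."""
    return np.where(z > 0, z, a * z)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(z))               # [-0.03  -0.005  0.     0.5    3.   ]

# The gradient for negative inputs is a (here 0.01) rather than 0,
# so those neurons keep receiving small updates.
print(np.where(z > 0, 1.0, 0.01))
```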

How to choose the right activation function?

Choosing the right activation function is an important design decision for a neural network, as it can significantly impact performance. Some general guidelines are:

1. The characteristics of the data and the requirements of the task: the logistic function may be more suitable for binary classification, while the ReLU function may be more suitable for tasks involving large, positive input values.
2. The computational complexity: activation functions with higher computational complexity require more time and resources to compute, which can affect the overall performance of the network.
3. The type of layer: the ReLU activation function is mostly used in hidden layers, whereas the logistic and tanh functions are mostly used in output layers.
4. Trial and error: it is often a good idea to try out different activation functions and compare their performance on the specific task at hand. This can help identify the best activation function for the task (a minimal example follows below).
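A minimal sketch, assuming TensorFlow/Keras and placeholder data (X_train, y_train, X_val, y_val are hypothetical names, not defined here), showing guideline 3 (ReLU in hidden layers, a logistic output for binary classification) and guideline 4 (trying several hidden-layer activations):

```python
import tensorflow as tf

def build_model(hidden_activation="relu"):
    """Two hidden layers with a configurable activation and a sigmoid output."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(20,)),
        tf.keras.layers.Dense(64, activation=hidden_activation),
        tf.keras.layers.Dense(64, activation=hidden_activation),
        tf.keras.layers.Dense(1, activation="sigmoid"),   # probability output
    ])

# Try several hidden activations and compare validation performance.
for act in ("relu", "tanh", "sigmoid"):
    model = build_model(act)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=10)
    # (X_train, y_train, X_val, y_val are placeholders for your own data)
```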
