Activation Functions in Neural Networks

Sumeet Agrawal
6 min read · Sep 1, 2021


Introduction

We use a variety of activation functions in deep learning. Activation functions are one of the most important aspects of any neural network. Complex tasks such as image classification, language translation, and object recognition require neural networks with activation functions to solve; without them, these tasks are extremely difficult to handle.

In general, activation functions are among the most important components of deep learning. They determine the output of a deep learning model, the efficiency of the training process, and the accuracy of even a massive neural network.

Activation functions also have a significant impact on whether, and how quickly, a neural network converges. Let's continue reading this blog to understand the activation function, its types, their importance, and their limitations.

What is an Activation Function?

The activation function is a non-linear transformation applied to each neuron in a hidden layer to produce an output in a normalized form (or within a certain range or scale).

To produce the desired outcome, it simply determines whether to activate or deactivate a neuron. It also applies a non-linear adjustment to the input, which improves the performance of a sophisticated neural network.

The activation function also helps normalize the output for any input into a range such as -1 to 1. Because a neural network is often trained on millions of data points, the activation function must be efficient and should reduce computation time.

In any neural network, the activation function basically determines whether a given input or piece of received information is meaningful or irrelevant. To further understand what a neuron is and how the activation function restricts the output value, consider the following example.

Image Source — https://machinelearningknowledge.ai

Why do we use the Activation Function?

  • To normalize data, we employ the activation function.
  • To obtain non-linearity between data points in a feed-forward network (FFN), we apply a non-linear transformation to the linear data.

They essentially decide whether a neuron should be activated or not, i.e. whether the information the neuron receives is relevant or should be ignored.

Without an activation function, the weights and biases would only perform a linear transformation, and a neural network would be nothing more than a linear regression model. A linear equation is a polynomial of degree one that is simple to solve but limited in its ability to model complex problems or higher-degree polynomials.
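As a quick illustration of this point, here is a minimal NumPy sketch (the layer sizes and random weights are arbitrary placeholders, not from any real network) showing that stacking two layers with no activation function in between collapses into a single linear transformation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function in between (sizes are arbitrary).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Passing x through both layers...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...is identical to a single linear layer with combined weights and bias.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```

No matter how many such layers we stack, the result is still one linear model, which is why a non-linear activation is needed between them.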

Activation functions can be classified into two categories:

  1. Linear Activation Function
  2. Non-linear Activation Functions

1. Linear or Identity Activation Function

As you can see, the function is simply a line. As a result, the function's output is not constrained to any range.

Equation : f(x) = x

Range : (-infinity to infinity)

Image Source — https://www.geeksforgeeks.org/understanding-activation-functions-in-depth/

It doesn't help with the complexity of the many parameters in the data that is normally fed to neural networks.

2. Non-linear Activation Function

The most commonly used activation functions are non-linear. The non-linearity is what gives this graph its curved shape.

Image Source — https://www.geeksforgeeks.org/understanding-activation-functions-in-depth/

It allows the model to generalise or adapt to a wide range of data while also distinguishing between the outputs.

The following are the most important nonlinear function terms to know:

Derivative or differential: the change along the y-axis with respect to the change along the x-axis. It is also known as the slope.

Monotonic function: A function which is either completely non-increasing or completely non-decreasing.

Different Activation Functions are -

  1. Binary Step
  2. Linear
  3. Sigmoid
  4. Tanh
  5. ReLU
  6. LeakyReLU

1. Binary Step

Image source — https://arshren.medium.com/neural-networks-activation-functions

This is a pretty basic activation function that comes to mind whenever we try to bound the output. It is essentially a threshold-based classifier: we choose a threshold value to determine whether a neuron should be activated or deactivated. Here we set the threshold to 0. It is simple and useful for binary classification problems.
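A minimal sketch of the binary step with a threshold of 0, as described above (NumPy is used here purely for illustration):

```python
import numpy as np

def binary_step(x, threshold=0.0):
    """Output 1 if the input reaches the threshold, otherwise 0."""
    return np.where(x >= threshold, 1, 0)

print(binary_step(np.array([-2.0, -0.5, 0.0, 0.7, 3.0])))
# [0 0 1 1 1]
```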

2. Linear Activation Function

It is a straightforward activation function in which the output is directly proportional to the weighted sum of the neuron's inputs. A positively sloped line can boost the firing rate as the input grows, and a linear activation function gives a larger range of activations than a binary step.

In the binary step, a neuron is either firing or not firing. If you are familiar with gradient descent in deep learning, you will notice that the derivative of this function is constant.

Image Source — https://www.i2tutorials.com/activation-functions-in-neural-networks/

Equation : f(x) = x

Range : (-infinity to infinity)
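A sketch of the identity/linear activation and its constant derivative:

```python
import numpy as np

def linear(x):
    """Identity activation: f(x) = x."""
    return x

def linear_derivative(x):
    """The slope is constant (1), so gradient descent gains nothing from it."""
    return np.ones_like(x)

x = np.array([-3.0, 0.0, 2.5])
print(linear(x))             # [-3.   0.   2.5]
print(linear_derivative(x))  # [1. 1. 1.]
```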

3. Sigmoid Activation Function

The sigmoid function's curve looks like an S-shape.

Sigmoid Function

Output range is between 0 and 1, i.e. (0, 1)

Derivative of Sigmoid Function

Output range is between 0 and 0.25

Image Source — https://saugatbhattarai.com.np/what-is-activation-functions-in-neural-network-nn/

The sigmoid function is used because its output lies between 0 and 1. As a result, it is particularly useful in models where a probability must be predicted as the output. Since probabilities only take values between 0 and 1, sigmoid is the best option.

The function is differentiable, which means we can calculate the slope of the sigmoid curve at any point. The function is monotonic, but its derivative is not.

Problems of the sigmoid function are:

  • Vanishing gradient
  • Computationally expensive
  • The output is not zero-centered

The softmax function is a more generalized form of the logistic (sigmoid) activation function, used for multiclass classification.
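A short NumPy sketch of the sigmoid, its derivative, and the softmax generalization mentioned above:

```python
import numpy as np

def sigmoid(x):
    """f(x) = 1 / (1 + e^(-x)); output lies in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    """f'(x) = f(x) * (1 - f(x)); its maximum value is 0.25 at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)

def softmax(x):
    """Multiclass generalization: exponentiate and normalize (shifted for numerical stability)."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([-4.0, 0.0, 4.0])
print(sigmoid(x))             # approx. [0.018 0.5   0.982]
print(sigmoid_derivative(x))  # peaks at 0.25 for x = 0
print(softmax(x))             # probabilities summing to 1
```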

4. TanH (hyperbolic tangent) Activation Function

TanH is similar to the logistic sigmoid, but better. The range of the tanh function is (-1, 1).

TanH Function
Derivative of TanH
Image Source — https://dothanhblog.wordpress.com/2020/02/19/machine-learning

Advantages of TanH function

  • The function is differentiable.
  • It is zero-centered.
  • Optimization is easier than with sigmoid.

Disadvantages of TanH function

  • It is a computationally intensive function, so the computation takes a long time.
  • Vanishing gradients
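A sketch of tanh and its derivative, assuming NumPy:

```python
import numpy as np

def tanh(x):
    """Zero-centered S-shaped curve with output in (-1, 1)."""
    return np.tanh(x)

def tanh_derivative(x):
    """f'(x) = 1 - tanh(x)^2; close to 0 for large |x|, which causes vanishing gradients."""
    return 1.0 - np.tanh(x) ** 2

x = np.array([-3.0, 0.0, 3.0])
print(tanh(x))             # approx. [-0.995  0.     0.995]
print(tanh_derivative(x))  # approx. [0.010  1.     0.010]
```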

5. ReLU Activation Function

Right now, ReLU is the most widely used activation function in the world. It is used in practically all convolutional neural networks and deep learning algorithms.

ReLU Function
Image Source — activation functions you should know in Deep Learning

Range: [ 0 to infinity)

Advantages —

  • The function is differentiable (except at zero).
  • It helps solve the vanishing gradient problem.
  • Because there is no exponential to compute, it is faster than sigmoid or tanh.

Disadvantages —

  • It is not a zero-centered function.
  • It is completely inactive for negative inputs.
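A sketch of ReLU and its gradient:

```python
import numpy as np

def relu(x):
    """f(x) = max(0, x); no exponentials, so it is cheap to compute."""
    return np.maximum(0.0, x)

def relu_derivative(x):
    """Gradient is 1 for positive inputs and 0 for negative inputs
    (at x = 0 it is undefined; 0 is used here by convention)."""
    return np.where(x > 0, 1.0, 0.0)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))             # [0. 0. 3.]
print(relu_derivative(x))  # [0. 0. 1.]
```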

6. Leaky ReLU Activation Function

The Leaky ReLU function improves on the ReLU activation function. It has all of the features of ReLU and never suffers from the dying ReLU problem. Leaky ReLU is defined as follows:

Equation of Leaky ReLU Function
Image Source — https://paperswithcode.com/method/leaky-relu

Dying ReLU

Some neurons effectively die during training, which means they stop producing anything other than zero. In some circumstances, especially if you utilized a high learning rate, you may discover that half of your network’s neurons are dead. When the weights of a neuron are changed to the point where the weighted sum of its inputs is negative for all instances in the training set, the neuron dies. When this happens, it simply continues to output 0s, and gradient descent has no effect because the ReLU function’s gradient is 0 when its input is negative.
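A sketch of Leaky ReLU with a small negative slope (alpha = 0.01 is a common choice, used here only as an illustration) that keeps the gradient non-zero for negative inputs and so avoids the dying ReLU problem:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """f(x) = x if x > 0, else alpha * x."""
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    """Gradient is 1 for positive inputs and alpha (not 0) for negative inputs,
    so a neuron never goes completely silent."""
    return np.where(x > 0, 1.0, alpha)

x = np.array([-5.0, -1.0, 2.0])
print(leaky_relu(x))             # [-0.05 -0.01  2.  ]
print(leaky_relu_derivative(x))  # [0.01  0.01  1.  ]
```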

This is all about activation functions. Hopefully, you have now understood them. Also, I have written a blog on “Introduction to Deep Learning”.
