Why do we need Activation Functions in Neural Networks?

Torsa Talukdar
4 min read · Jan 30, 2020


An Activation function is a function that produces an output given an input or a set of inputs. Before we dive into why a neural network needs an activation function, let us understand what we actually mean by Neural Networks.

What do you mean by Neural Networks?

We can think of a Neural Network as a system that takes one or more inputs and processes them to produce one or more outputs. Neural networks consist of multiple units known as neurons, grouped into several layers, where the neurons of one layer are linked to the neurons of the next layer through weighted connections. A neuron takes the value of each connected neuron and multiplies it by the corresponding connection weight. The sum of these weighted values, plus a bias term, is then transformed mathematically by the Activation function f to give the neuron's output:

output = f(w₁x₁ + w₂x₂ + … + wₙxₙ + b)

The result so obtained can then be passed on to the next neuron, and these weighted values are propagated through the entire network. But the real challenge with neural networks is finding the right weights to compute the right results, which is done using a wide range of techniques.
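As a rough sketch of this computation in Python (using NumPy; the input, weight, and bias values below are made up purely for illustration):

import numpy as np

def neuron_output(inputs, weights, bias, activation):
    # Weighted sum of the incoming values plus the bias: z = w1*x1 + ... + wn*xn + b
    z = np.dot(weights, inputs) + bias
    # The activation function f transforms z into the neuron's output
    return activation(z)

# Arbitrary illustrative values
x = np.array([0.5, -1.2, 3.0])   # values from connected neurons
w = np.array([0.4, 0.1, -0.6])   # connection weights
b = 0.2                          # bias
print(neuron_output(x, w, b, np.tanh))  # output passed on to the next layer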

Process of Classification in Neural Network

How does an Activation Function work?

Let's consider the Activation function 'Sigmoid':

f(x) = 1 / (1 + e^(-x))

Sigmoid takes an input and, if the input is a large negative number, maps it to a value very close to 0. Similarly, if the input is a large positive number, it maps it to a value very close to 1. If the input is close to 0, it is mapped to a value near 0.5, the middle of the range. Hence, for Sigmoid, 0 happens to be the lower limit whereas 1 happens to be the upper limit.
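A minimal sketch of this behaviour in Python (using NumPy; the sample inputs are arbitrary):

import numpy as np

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(-10.0))  # ~0.000045 (large negative input: output close to 0)
print(sigmoid(0.0))    # 0.5 (input at 0: output in the middle of the range)
print(sigmoid(10.0))   # ~0.999955 (large positive input: output close to 1)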

BUT WHAT’S THE INTUITION BEHIND IT?

An Activation function is biologically inspired by activity inside our brain, where different neurons get activated by different stimuli. For instance, if we smell something delicious, certain neurons get fired or activated, which helps us to sense the smell or taste. On the contrary, if we smell something awful or unpleasant, those same neurons do not get activated; instead, another set of neurons does. So, briefly, "getting activated" can be represented as 1 and "not getting activated" can be represented as 0. From the previous discussion we conclude that the closer the value is to 0, the lower its activation tendency, and vice versa.

On the other hand, 'ReLU' or 'Rectified Linear Unit' transforms each input to the maximum of 0 and the input itself:

f(x) = max(0, x)

Thus, if the input is less than 0, ReLU will transform it to 0, and if it's greater than 0, it will simply output the given input.
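In Python, this rule is a one-liner (a sketch, not tied to any particular library):

def relu(x):
    # Returns 0 for negative inputs and the input itself otherwise
    return max(0, x)

print(relu(-3.5))  # 0 (negative input is clipped to 0)
print(relu(2.7))   # 2.7 (positive input passes through unchanged)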

Now, going back to where we started:

Why do we need non-linearity? Activation functions are what impart non-linearity to the network, and they are one of the factors that affect the results and accuracy of our model. For a Neural Network to produce outstanding results, we need to take non-linear activations into account. If we use a Neural Network with several hidden layers but a linear activation function, it fails to provide any satisfactory output: it simply composes a series of affine transformations (an affine transformation is any transformation that preserves collinearity, i.e., all points lying on a line initially still lie on a line after the transformation), so we might as well not have any hidden layers at all. Such a model is no more expressive than a standard Logistic Regression model. Unless we impart non-linearity, we are not computing interesting models, no matter how deep the network goes, as the sketch below illustrates.
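A small demonstration of this collapse in Python (using NumPy; the layer sizes and random values are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)  # an arbitrary input vector

# Two stacked layers with NO activation function between them
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
two_layers = W2 @ (W1 @ x + b1) + b2

# The identical mapping expressed as a single affine layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True: the hidden layer added nothing

Only a non-linear f applied between the layers breaks this equivalence and lets depth add expressive power.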

Architecture of complete model using non-linear neural network

In Conclusion,

Non-linear functions are those of degree greater than one; they generate curves when plotted. Activation functions are used so that neural networks can not only represent such functions but also perform well on more complicated, highly versatile, large datasets whose models require many hidden layers. In other words, without activation functions, a neural network will only produce a linear regression model instead of interesting trained outputs.
