Neural Networks and Activation Functions

John Kaller
AI³ | Theory, Practice, Business
3 min read · Sep 2, 2019
A neural network and its layers

Activation functions are an extremely important feature of artificial neural networks. They decide whether a neuron should be activated or not. What, however, does it mean for a neuron to be activated, and what role does it play in the neural network?

Neural networks are made up of neurons, each of which combines its inputs using a weight and a bias and passes the result through its activation function. During training, we update the weights and biases of the neurons on the basis of the error at the output.

Moreover, a neural network consists of three different types of layers. The Input Layer accepts the input features: it provides information from the outside world to the network, performs no computation, and simply passes the features on to the next layer, the hidden layer. The Hidden Layer receives the features passed on by the input layer. Its nodes are not exposed to the outer world; they perform all sorts of computation on the features and transfer the result to the output layer. The Output Layer brings the information learned by the network out to the outer world.
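As a rough illustration (a minimal sketch, not code from this article; the layer sizes, the random weights, and the use of NumPy and a sigmoid activation are all assumptions), a forward pass through these three layers might look like this:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 3 input features, 4 hidden neurons, 1 output neuron
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # hidden -> output

x = np.array([0.5, -1.2, 3.0])                  # input features (no computation at the input layer)
h = sigmoid(W1 @ x + b1)                        # hidden layer: weighted sum + activation
y = sigmoid(W2 @ h + b2)                        # output layer: result exposed to the outside world
print(y)
```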

Activation functions make back-propagation possible: their gradients are used, together with the error, to update the weights and biases. An activation function is also known as a Transfer Function.
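To see why the activation's gradient matters, here is a minimal sketch of a single weight update for one sigmoid neuron with a squared error (the input, target, learning rate, and initial weights are purely illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # derivative of the activation function

x, target = 0.7, 1.0              # one input and its desired output
w, b, lr = 0.3, 0.0, 0.1          # weight, bias, learning rate

z = w * x + b                     # pre-activation
y = sigmoid(z)                    # neuron output
error = y - target                # error at the output

# Back-propagation: the error is multiplied by the activation's gradient
dw = error * sigmoid_grad(z) * x
db = error * sigmoid_grad(z)
w, b = w - lr * dw, b - lr * db
```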

Activation functions can be divided into two types:

1. Linear Activation Function

2. Non-linear Activation Functions

A Linear Activation Function simply scales its input by a constant factor, implying a linear relationship between the inputs and the output. The issue with a linear activation function is that the network behaves just like a linear regression model. It doesn’t matter how many hidden layers we attach in a neural net, all layers will behave the same way, because the composition of two linear functions is itself a linear function. A neuron cannot learn anything beyond a linear mapping with just a linear function attached to it, as the small check below illustrates.
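A quick numerical check of this point (the matrices here are illustrative only): stacking two purely linear layers is equivalent to a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)                 # two linear layers, no activation in between
one_layer  = (W2 @ W1) @ x                 # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True: the stack collapses to one linear map
```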

Therefore, and not surprisingly, non-linear Activation Functions are the most used activation functions.

There are three especially prominent non-linear Activation Functions:

1. Sigmoid Activation Function.

The Sigmoid Function curve has an S-shape. It is especially used for models where we have to predict a probability as the output.
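As a small reference sketch, the sigmoid squashes any real input into the range (0, 1), which is why its output reads naturally as a probability:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # maps any real number into (0, 1)

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # approx [0.007, 0.5, 0.993]
```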

2. Tanh Activation Function.

The Tanh Activation Function is actually a scaled and shifted version of the sigmoid function. The two are closely related and can be derived from each other.
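Concretely, tanh(x) = 2·sigmoid(2x) − 1, which a quick numerical check confirms (sketch only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True: tanh is a rescaled sigmoid
```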

3. ReLU (Rectified Linear Unit) Activation Function.

The ReLU is the most used activation function in the world right now, since it appears in almost all convolutional neural networks and deep learning models. The issue with ReLU is that it turns every negative value into zero immediately. Negative inputs are therefore not mapped appropriately, and because the function is flat there, the affected neurons contribute nothing further, which decreases the ability of the model to fit or train from the data properly.
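A minimal sketch of ReLU and the zeroing of negative inputs described above (the sample values are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)          # negative inputs are clipped to zero

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))                       # [0.  0.  0.  1.5 3. ]
# The gradient is also 0 for negative inputs, so those neurons stop updating
```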

When it comes to choosing the right activation function, the basic rule of thumb is: if you really don’t know which one to use, simply use ReLU, as it is a good general-purpose activation function. If your output is for binary classification, then the sigmoid function is a very natural choice for the output layer.
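Putting that rule of thumb together, a tiny binary classifier might use ReLU in the hidden layer and sigmoid at the output (a hedged sketch; sizes and weights are invented for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical binary classifier: ReLU hidden layer, sigmoid output layer
rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(8, 3)), np.zeros(8)
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)

x = np.array([0.2, -0.7, 1.1])
p = sigmoid(W2 @ relu(W1 @ x + b1) + b2)  # probability of the positive class
print(p)
```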
