Types of Activation Functions Used in Neural Networks and How to Choose One

Prafull Nayan · Analytics Vidhya · Aug 17, 2020

Note

In this article, I discuss the various types of activation functions and the kinds of problems you might encounter while using each of them.

I would suggest beginning with the ReLU function and exploring other functions as you move further. You can also design your own activation functions to give a non-linearity component to your network.

Recall that the inputs x0, x1, x2, …, xn are multiplied by the weights w0, w1, w2, …, wn and summed together with a bias term to form the neuron's input.

Clearly, w indicates how much weight or strength we want to give the incoming input, and we can think of b as an offset value: x*w has to reach this offset before having an effect.

So far we have seen the inputs; now, what is an activation function?

An activation function is used to set the boundaries for the overall output value. For example, let z = x*w + b be the output of the previous layer; it is then sent to the activation function to limit its value, for instance between 0 and 1 in a binary classification problem.

Finally, the output from the activation function moves to the next hidden layer and the same process is repeated. This forward movement of information is known as the forward propagation.
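
As a minimal sketch of a single forward step (assuming NumPy and a sigmoid activation; the input, weight, and bias values below are made-up placeholders, not from the article):

import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# made-up inputs, weights, and bias for a single neuron
x = np.array([0.5, -1.2, 3.0])   # inputs x0, x1, x2
w = np.array([0.4, 0.7, -0.2])   # weights w0, w1, w2
b = 0.1                          # bias term

z = np.dot(x, w) + b             # z = x*w + b, the weighted sum plus bias
a = sigmoid(z)                   # activation output, passed on to the next layer
print(z, a)                      # -1.14, ~0.242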

What if the output generated is far away from the actual value? Using the output from the forward propagation, error is calculated. Based on this error value, the weights and biases of the neurons are updated. This process is known as back-propagation.
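
As a rough illustration of one such update (a single sigmoid neuron with a squared-error loss; the learning rate and target value are assumptions for this sketch, not the article's own training setup):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])       # inputs
w = np.array([0.4, 0.7, -0.2])       # weights
b = 0.1                              # bias
y_true = 1.0                         # actual value
lr = 0.1                             # learning rate (assumed)

a = sigmoid(np.dot(x, w) + b)        # forward propagation
grad_z = (a - y_true) * a * (1 - a)  # gradient of 0.5*(a - y_true)**2 w.r.t. z
w = w - lr * grad_z * x              # update weights based on the error
b = b - lr * grad_z                  # update bias based on the error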

A neural network without an activation function is essentially just a linear regression model.

Some Activation Functions

1. Step Function

f(z) = { 0,  z < 0
         1,  z >= 0 }

  • This sort of function is useful for classification; however, it is rarely used because it is a very hard function: small changes in z are not reflected in the output (a small sketch follows).
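
A minimal sketch of the step function (the convention of outputting 1 at z >= 0 is assumed here):

def step_function(z):
    # hard threshold: 1 once z reaches 0, otherwise 0
    return 1 if z >= 0 else 0

print(step_function(-0.5), step_function(2.0))  # 0 1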

2. Sigmoid Function

  • The next activation function that we are going to look at is the sigmoid function. It is one of the most widely used non-linear activation functions. Sigmoid transforms values into the range between 0 and 1.
  • A noteworthy point here is that, unlike the binary step and linear functions, sigmoid is a non-linear function. This essentially means that when multiple neurons use sigmoid as their activation function, the output is non-linear as well (a small sketch follows).
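
A minimal sketch of the sigmoid, sigmoid(z) = 1 / (1 + e^(-z)), assuming NumPy:

import numpy as np

def sigmoid_function(z):
    # maps any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid_function(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067 0.5 0.9933]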

3. Hyperbolic Tangent (tanh(z))

  • The tanh function is very similar to the sigmoid function. The only difference is that it is symmetric around the origin. The range of values in this case is from -1 to 1. Thus the inputs to the next layers will not always be of the same sign.
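
A minimal sketch, assuming NumPy (tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))):

import numpy as np

def tanh_function(z):
    # zero-centred squashing into the range (-1, 1)
    return np.tanh(z)

print(tanh_function(np.array([-2.0, 0.0, 2.0])))  # ~[-0.964 0. 0.964]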

4. Rectified Linear Unit (ReLU)

  • This is actually a relatively simple function: max(0, z).
def relu_function(x):
    # returns 0 for negative inputs and passes positive inputs through unchanged
    if x < 0:
        return 0
    else:
        return x
  • ReLU has been found to give very good performance, especially when dealing with the issue of vanishing gradients (a small comparison is sketched below).
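
The sketch below compares the local gradients of sigmoid and ReLU at a large pre-activation value (the value 10.0 is an assumption for illustration): the sigmoid gradient shrinks towards zero, while the ReLU gradient stays at 1 for positive inputs.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 10.0                                      # a large pre-activation value
sigmoid_grad = sigmoid(z) * (1 - sigmoid(z))  # derivative of sigmoid at z
relu_grad = 1.0 if z > 0 else 0.0             # derivative of ReLU at z

print(sigmoid_grad)  # ~4.5e-05, almost no gradient flows back
print(relu_grad)     # 1.0, the gradient passes through unchanged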

5. Leaky Rectified Linear Unit

The Leaky ReLU function is nothing but an improved version of the ReLU function. As we saw, for the ReLU function the gradient is 0 for x < 0, which deactivates the neurons in that region.

Leaky ReLU is defined to address this problem. Instead of defining the ReLU function as 0 for negative values of x, we define it as an extremely small linear component of x. Here is the mathematical expression:

f(x) = { 0.01x,  x < 0
         x,      x >= 0 }
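
A minimal sketch, assuming NumPy and the 0.01 slope used above:

import numpy as np

def leaky_relu_function(x):
    # small non-zero slope for negative inputs keeps the gradient from dying
    return np.where(x < 0, 0.01 * x, x)

print(leaky_relu_function(np.array([-10.0, 0.0, 5.0])))  # [-0.1  0.  5.]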

6. Softmax Function

Softmax function is often described as a combination of multiple sigmoids. We know that sigmoid returns values between 0 and 1, which can be treated as probabilities of a data point belonging to a particular class. Thus sigmoid is widely used for binary classification problems.

softmax(z_i) = exp(z_i) / (exp(z_1) + exp(z_2) + … + exp(z_k)), for i = 1, 2, …, k (k = number of categories)
  • The softmax function calculates the probability distribution of an event over the k different categories.
  • So this means the function calculates the probability of each target class over all possible target classes.
import numpy as np

def softmax_function(x):
    # exponentiate each score, then normalise so the outputs sum to 1
    z = np.exp(x)
    return z / z.sum()

Choosing the right Activation Function

Now that we have seen so many activation functions, we need some logic or heuristics to know which activation function should be used in which situation. Good or bad, there is no universal rule of thumb.

However, depending upon the properties of the problem, we might be able to make a better choice for easier and quicker convergence of the network.

  • Sigmoid functions and their combinations generally work better in the case of classifiers
  • Sigmoids and tanh functions are sometimes avoided due to the vanishing gradient problem
  • ReLU function is a general activation function and is used in most cases these days
  • If we encounter a case of dead neurons in our network, the leaky ReLU function is the best choice
  • Always keep in mind that the ReLU function should only be used in the hidden layers
  • As a rule of thumb, you can begin with the ReLU function and then move on to other activation functions if ReLU doesn't provide optimum results.
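
As a minimal sketch of wiring these heuristics into a network (assuming TensorFlow/Keras is available; the layer sizes, input shape, and binary-classification output are placeholders, not part of the article):

import tensorflow as tf

# ReLU in the hidden layers, sigmoid only at the output (binary classification)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])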
