Activation Functions in NN
Activation functions are used to fit the output into a certain range. Different functions map it into different ranges, such as [0, 1] or [-1, 1]. In simple terms, an activation function "activates" the weighted sum of the inputs and biases, mapping it to an output value within that range.
There are linear and non-linear activation functions:
-> Linear Activation function (Identity function)
Also known as "No Activation". The activation is proportional to the input: the function does nothing to the weighted sum of the input, it simply passes the value through unchanged.
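A minimal sketch of the identity function (the function name is my own for illustration):

```python
def identity(x):
    # linear / "no activation": returns the weighted sum unchanged
    return x

print(identity(3.7))   # 3.7
```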
-> Non Linear Activation Functions :
1) Sigmoid function
This function looks like an S-shaped curve. Its range is (0, 1).
If the network or the model has to predict a binary outcome, i.e. the output is a Yes or No (1 or 0), we use the sigmoid activation function. The sigmoid is also differentiable everywhere, which means we can compute the slope of the curve at any point (this is what gradient-based training relies on).
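The formula behind the curve is sigmoid(x) = 1 / (1 + e^(-x)); a direct sketch in Python:

```python
import math

def sigmoid(x):
    # squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))     # 0.5, the midpoint of the curve
print(sigmoid(10))    # close to 1
print(sigmoid(-10))   # close to 0
```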
2) Tanh activation function ( hyperbolic tangent)
This is also an S-shaped function, very similar to the sigmoid, but its range is (-1, 1).
It is used in cases where the output should distinguish three kinds of response: negative, neutral, and positive (values near -1, 0, and 1).
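The similarity to the sigmoid is exact: tanh is just a rescaled sigmoid, tanh(x) = 2·sigmoid(2x) − 1, which shifts the range from (0, 1) to (-1, 1). A sketch (function names are my own):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    # tanh(x) = 2 * sigmoid(2x) - 1: same S-curve, recentered at 0
    return 2.0 * sigmoid(2.0 * x) - 1.0

print(tanh_via_sigmoid(0.0))   # 0.0, the neutral midpoint
```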
3) ReLU
The rectified linear unit activation is pretty straightforward: f(x) = x for every x > 0, and f(x) = 0 for every x ≤ 0. The derivative f′(x) is 1 for x > 0 and 0 for x < 0; the function is differentiable at every point except x = 0.
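The piecewise definition and its derivative can be sketched as (taking the common convention f′(0) = 0 at the undefined point):

```python
def relu(x):
    # f(x) = x for x > 0, else 0
    return x if x > 0 else 0.0

def relu_grad(x):
    # f'(x) = 1 for x > 0, 0 for x < 0; undefined at x = 0,
    # where 0 is returned by convention
    return 1.0 if x > 0 else 0.0

print(relu(3.0), relu(-2.0))        # 3.0 0.0
print(relu_grad(3.0), relu_grad(-2.0))   # 1.0 0.0
```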
4) Leaky ReLU
It is an improved version of ReLU.
In ReLU, f(x) = 0 for all x < 0.
In leaky ReLU, f(x) = α · x for x < 0, where α is a small constant such as 0.1 or 0.01.
This is done so that inputs below zero still produce a really small non-zero output (and gradient) rather than just 0, addressing the disadvantage of normal ReLU.
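A sketch with the slope α as a parameter (defaulting here to 0.01, a common choice):

```python
def leaky_relu(x, alpha=0.01):
    # like ReLU for x > 0, but negative inputs keep a small slope
    # instead of being flattened to 0
    return x if x > 0 else alpha * x

print(leaky_relu(5.0))     # 5.0, unchanged for positive input
print(leaky_relu(-10.0))   # small negative value instead of 0
```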
5) Softmax activation function
It is related to the logistic (sigmoid) function; in effect it generalizes the sigmoid from one output to many. It is usually used in the final layer.
It is used for multiclass classification, as it turns the layer's raw outputs into a probability distribution over the classes: non-negative values that sum to 1.
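A minimal sketch: each raw score is exponentiated and divided by the sum of all the exponentials, so the outputs are positive and sum to 1 (subtracting the maximum first is a standard trick to avoid overflow, not part of the mathematical definition):

```python
import math

def softmax(scores):
    # subtract the max score for numerical stability (does not
    # change the result, since it cancels in the ratio)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # largest score gets the largest probability
print(sum(probs))   # sums to 1 (up to floating-point error)
```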