Deep Learning Activation Functions & their mathematical implementation.

Rishabh Sharma · Published in Nerd For Tech · May 21, 2021

Activation functions, also known as transfer functions, are critical in designing neural networks. They determine the output of each node by mapping the resulting values into a range such as 0 to 1 or -1 to 1, depending on the function. When the output range of an activation function is bounded, it is also called a squashing function. Activation functions are applied to each node of the neural network and determine whether the neuron should be 'fired'/'activated' or not.

Why a careful choice of activation function is important.

The choice of activation function is critical for both the hidden and output layers. A model's accuracy and loss depend heavily on it, and it must be chosen based on the task you expect the model to perform. For instance, in a binary classification problem the sigmoid function is a natural choice for the output layer.
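As a minimal sketch (an addition to the article, using the Keras imports listed below; the 20-feature input size is hypothetical), the output activation is simply passed to the last Dense layer:

# Hypothetical binary classifier: the single sigmoid output can be read as a probability.
model = keras.Sequential([
    keras.Input(shape=(20,)),               # 20 input features (made up for illustration)
    layers.Dense(32, activation="relu"),    # hidden layer
    layers.Dense(1, activation="sigmoid"),  # output layer for binary classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])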

Types of Activation Function.

Activation Functions can be broadly divided into two categories:

  1. Linear activation functions.
  2. Non-Linear activation functions.
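To make the distinction concrete, here is a tiny sketch (an addition to the article): a linear activation just rescales its input, while a non-linear one such as ReLU bends it, which is what lets stacked layers model non-linear relationships.

def linear(x):
    # Linear activation: output is proportional to the input,
    # so a stack of purely linear layers collapses into one linear map.
    return 2.0 * x

def relu_example(x):
    # Non-linear activation (ReLU): the kink at zero breaks linearity.
    return max(0.0, x)

print(linear(-1.5), relu_example(-1.5))  # -3.0 0.0
print(linear(2.0), relu_example(2.0))    # 4.0 2.0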

Libraries to import:

import math as m
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

1. Sigmoid function

The sigmoid activation function is also known as the logistic function. It is very popular in binary classification problems (and in logistic regression, from which it takes its name). The sigmoid function outputs values in the range 0 to 1.

Sigmoid function: sigmoid(x) = 1 / (1 + e^(-x))

Code to implement:

def sigmoid(x):
    return 1 / (1 + m.exp(-x))

values_of_sigmoid = []
values_of_x = []
for i in range(-500, 500, 1):
    i = i * 0.01                       # scale the integer grid to x values in [-5, 5)
    values_of_x.append(i)
    values_of_sigmoid.append(sigmoid(i))

plt.plot(values_of_x, values_of_sigmoid)
plt.xlabel("values of x")
plt.ylabel("value of sigmoid")
plt.show()
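One caveat: m.exp(-x) overflows for large negative x (for example x = -1000). As a small addition to the article, a vectorized, numerically safer variant can be written with NumPy:

def stable_sigmoid(x):
    # Use 1 / (1 + exp(-x)) only where x >= 0, and the equivalent
    # form exp(x) / (1 + exp(x)) where x < 0, so exp never overflows.
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1 / (1 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1 + ex)
    return out

print(stable_sigmoid([-1000.0, 0.0, 1000.0]))  # [0.  0.5 1. ]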

2. Tanh function

This function is very similar to the sigmoid activation function. The function takes any real value as input and outputs values in the range -1 to 1. The larger the input (more positive), the closer the output value will be to 1.0, whereas the smaller the input (more negative), the closer the output will be to -1.0. The Tanh activation function is calculated as follows.

Tanh function: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Code to implement:

def tanh(x):
    return (m.exp(x) - m.exp(-x)) / (m.exp(x) + m.exp(-x))

values_of_tanh = []
values_of_x = []
for i in range(-500, 500, 1):
    i = i * 0.01                       # scale to x in [-5, 5), matching the sigmoid plot
    values_of_x.append(i)
    values_of_tanh.append(tanh(i))

plt.plot(values_of_x, values_of_tanh)
plt.xlabel("values of x")
plt.ylabel("value of tanh")
plt.show()
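As a quick sanity check (an addition to the article), the hand-rolled version can be compared against NumPy's built-in np.tanh:

# The manual tanh should agree with NumPy's reference implementation.
xs = np.linspace(-5, 5, 11)
manual = np.array([tanh(v) for v in xs])
print(np.allclose(manual, np.tanh(xs)))  # True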

3. Softmax function

The softmax activation function outputs a vector of values that sum to 1.0 and can be interpreted as probabilities of class membership. It is a "softer" version of the argmax function: instead of a hard winner-take-all choice, it produces a probability-like output. Unlike the previous functions, softmax operates on a whole vector of scores rather than on each value independently.

Softmax function: softmax(x_i) = e^(x_i) / Σ_j e^(x_j)

Code to implement:

def softmax(x):
    x = np.asarray(x, dtype=float)
    # Subtract the max for numerical stability; softmax is unchanged by constant shifts.
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

values_of_x = [i * 0.01 for i in range(-500, 500)]
plt.plot(values_of_x, softmax(values_of_x))
plt.xlabel("values of x")
plt.ylabel("value of softmax")
plt.show()
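As a small usage example (an addition to the article; the class scores are made up), softmax turns raw scores into probabilities that sum to 1 and matches TensorFlow's built-in tf.nn.softmax:

scores = [2.0, 1.0, 0.1]              # hypothetical raw class scores (logits)
probs = softmax(scores)
print(probs)                          # approx. [0.659, 0.242, 0.099]
print(probs.sum())                    # 1.0
print(tf.nn.softmax(scores).numpy())  # same values from TensorFlow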

4. Rectified Linear Unit Function

The ReLU or Rectified Linear Activation Function is perhaps the most common function used for hidden layers. It is also effective at overcoming the limitations of other previously popular activation functions, such as sigmoid and Tanh. Specifically, it is less susceptible to the vanishing gradient problem that prevents deep models from being trained, although it can suffer from other problems, such as dead ("dying ReLU") units.

ReLU function: ReLU(x) = max(0, x)

Code to implement:

def ReLU(x):
    return max(0, x)

values_of_relu = []
values_of_x = []
for i in range(-500, 500, 1):
    i = i * 0.01                       # scale the integer grid to x values in [-5, 5)
    values_of_x.append(i)
    values_of_relu.append(ReLU(i))

plt.plot(values_of_x, values_of_relu)
plt.show()
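In practice ReLU is applied in a vectorized way or requested by name inside the framework; a brief sketch (an addition to the article):

# Vectorized ReLU over the same range of x values.
xs = np.arange(-5, 5, 0.01)
plt.plot(xs, np.maximum(0, xs))

# In a Keras model, ReLU is usually selected by name in a hidden layer.
hidden = layers.Dense(64, activation="relu")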

5. Leaky ReLU

The issue with the Rectified Linear Unit: any negative input to ReLU becomes zero immediately, so neurons that only receive negative inputs stop contributing, which reduces the model's ability to fit or train from the data properly. In other words, ReLU maps every negative value to zero, so the negative part of the input is not represented in the output at all.

To overcome this problem, Leaky ReLU was introduced: instead of zero, it returns a small fraction of the input for negative values, Leaky ReLU(x) = max(αx, x), with α = 0.1 in the code below.

Code to implement:

def leaky_ReLU(x):
    return max(0.1 * x, x)             # keep a small slope of 0.1 for negative inputs

values_of_L_relu = []
values_of_x = []
for i in range(-500, 500, 1):
    i = i * 0.01                       # scale the integer grid to x values in [-5, 5)
    values_of_x.append(i)
    values_of_L_relu.append(leaky_ReLU(i))

plt.plot(values_of_x, values_of_L_relu)
plt.show()
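As a usage note (an addition to the article, assuming the TensorFlow 2.x imports at the top; newer Keras releases rename the argument to negative_slope), Keras also ships a ready-made layer for this:

# LeakyReLU as a Keras layer; alpha is the slope used for negative inputs.
leaky = layers.LeakyReLU(alpha=0.1)
x = tf.linspace(-3.0, 3.0, 100)
plt.plot(x, leaky(x))
plt.show()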

6. Some other Activation functions and implementations:

6.1 Exponential Linear Unit:

Exponential Linear Unit: ELU(x) = x for x > 0, and α(e^x - 1) for x ≤ 0

Code to implement:

activation_elu = layers.Activation('elu')
x = tf.linspace(-3.0, 3.0, 100)
y = activation_elu(x)  # once created, a layer is callable just like a function
plt.figure(dpi=100)
plt.plot(x, y)
plt.xlim(-3, 3)
plt.xlabel("Input")
plt.ylabel("Output")
plt.show()
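To connect the plot back to the formula, here is a minimal hand-written version (an addition to the article; alpha = 1.0 matches the Keras default for 'elu'):

def elu(x, alpha=1.0):
    # ELU: identity for positive inputs, a smooth exponential ramp towards -alpha for negatives.
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

xs = np.linspace(-3, 3, 100)
plt.plot(xs, elu(xs))
plt.show()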

6.2 Scaled Exponential Linear Unit:

Scaled Exponential Linear Unit: SELU(x) = λx for x > 0, and λα(e^x - 1) for x ≤ 0, with λ ≈ 1.0507 and α ≈ 1.6733

Code to implement:

activation_selu = layers.Activation('selu')
x = tf.linspace(-3.0, 3.0, 100)
y = activation_selu(x)  # once created, a layer is callable just like a function
plt.figure(dpi=100)
plt.plot(x, y)
plt.xlim(-3, 3)
plt.xlabel("Input")
plt.ylabel("Output")
plt.show()
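Again as a hand-written sketch (an addition to the article; the constants are the published SELU values, which Keras also uses):

# SELU is ELU scaled by fixed constants chosen so activations
# roughly self-normalize (zero mean, unit variance) across layers.
SELU_LAMBDA = 1.0507009873554805
SELU_ALPHA = 1.6732632423543772

def selu(x):
    x = np.asarray(x, dtype=float)
    return SELU_LAMBDA * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1))

xs = np.linspace(-3, 3, 100)
plt.plot(xs, selu(xs))
plt.show()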

6.3 Swish:

Swish: swish(x) = x · sigmoid(x)

Code to implement:
activation_swish = layers.Activation('swish')
x = tf.linspace(-3.0, 3.0, 100)
y = activation_swish(x)  # once created, a layer is callable just like a function
plt.figure(dpi=100)
plt.plot(x, y)
plt.xlim(-3, 3)
plt.xlabel("Input")
plt.ylabel("Output")
plt.show()
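And the same curve written out by hand (an addition to the article), directly from swish(x) = x · sigmoid(x):

def swish(x):
    # Swish: the input scaled by its own sigmoid, i.e. x * sigmoid(x).
    x = np.asarray(x, dtype=float)
    return x / (1 + np.exp(-x))

xs = np.linspace(-3, 3, 100)
plt.plot(xs, swish(xs))
plt.show()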

Hidden Layer Activation Functions:

Recurrent neural networks generally use Tanh or sigmoid activation functions, or both. For example, the LSTM commonly uses the sigmoid activation for its gates (the recurrent connections) and the Tanh activation for the cell state and output. Typical defaults per architecture are:

1. Multilayer Perceptron (MLP): ReLU activation function.

2. Convolutional Neural Network (CNN): ReLU activation function.

3. Recurrent Neural Network (RNN): Tanh and/or sigmoid activation function.
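As a brief check of these defaults (an addition to the article), Keras's LSTM layer already uses tanh for the cell/output and sigmoid for the gates, and ReLU is the usual explicit choice for MLP/CNN hidden layers:

# The LSTM arguments below only spell out the defaults; layers.LSTM(64) is equivalent.
lstm = layers.LSTM(64, activation="tanh", recurrent_activation="sigmoid")

# Typical explicit ReLU hidden layers for MLPs and CNNs.
dense_hidden = layers.Dense(128, activation="relu")
conv_hidden = layers.Conv2D(32, kernel_size=3, activation="relu")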

If you are unsure which activation function to use, try out different combinations and keep whichever fits best.

Output Layer Activation Functions:

The output layer activation function must be chosen based on the kind of problem you are solving. For example, for a regression problem a linear activation is appropriate. Here are some common problem types and the output activations typically used (a small Keras sketch follows the list):

  • Binary Classification: One node, sigmoid activation.
  • Multiclass Classification: One node per class, softmax activation.
  • Multilabel Classification: One node per class, sigmoid activation.
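A minimal Keras sketch of these output layers (an addition to the article; the class and label counts are hypothetical):

num_classes = 5   # hypothetical number of mutually exclusive classes
num_labels = 8    # hypothetical number of independent labels

binary_output = layers.Dense(1, activation="sigmoid")                # binary classification
multiclass_output = layers.Dense(num_classes, activation="softmax")  # multiclass classification
multilabel_output = layers.Dense(num_labels, activation="sigmoid")   # multilabel classification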
