The Power of Activation Functions: A Beginner’s Guide to Neural Network Building Blocks

Dedeepya Lekkala
5 min read · Aug 10, 2023


Over the last sixty years, machine learning, a subset of artificial intelligence, has experienced rapid growth. Research in the field has gained momentum, extending its reach into many dimensions of everyday life.

Machine learning is a field of study that uses principles from statistics and computer science to create statistical models that predict and infer outcomes. These models are sets of mathematical relationships between the inputs and outputs of a given system. Learning is the process of estimating the model’s parameters so that the model can perform the specified task. For machines, this learning process gives them the ability to learn without being explicitly programmed.

Components of Artificial Intelligence

At the heart of machine learning lie Artificial Neural Networks (ANNs), computer programs inspired by the workings of the human brain. ANNs are called networks because they are composed of different functions that gather knowledge by detecting relationships and patterns in data, using past experiences known as training examples. These networks are capable of learning from data, making predictions, and solving complex tasks.

Neural Network

Now, let’s explore a fundamental question: why do neural networks need activation functions?

Imagine you’re navigating a maze, but you can only move in straight lines. No turns, no twists — just forward and backward. That’s what neural networks without activation functions would be like: limited to linear movements in the world of data. A neural network without activation functions acts as a linear regression model, with limited performance. But in reality, data is full of twists, turns, and complex relationships. That’s where activation functions step in.

Picture a painter forced to use a straightedge for every curve — the result would be far from reality. Similarly, without activation functions, neural networks can only capture straight-line patterns, leaving the complex details of data unexplored. It’s like trying to paint a rainbow with just one color. The short sketch below makes this concrete.
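To see why depth alone doesn’t help, here is a minimal NumPy sketch with made-up weights: two stacked linear layers, with no activation in between, collapse into a single linear layer.

```python
import numpy as np

# A minimal sketch with made-up weights: two stacked linear layers,
# with no activation in between, collapse into a single linear layer.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # layer 1: 3 -> 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # layer 2: 4 -> 2

def two_linear_layers(x):
    return W2 @ (W1 @ x + b1) + b2

# The exact same mapping as ONE linear layer: W = W2 W1, b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2

x = rng.normal(size=3)
print(np.allclose(two_linear_layers(x), W @ x + b))  # True: depth bought nothing
```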

The Need for Non-Linearity in Neural Networks

Functions with degree greater than one, which show curvature when plotted, are known as non-linear functions. Neural networks are also known as universal function approximators, which means that they can compute and learn any function provided to them. Any imaginable process can be represented as a functional computation in a neural network.

Thus, we need to apply an activation function to make the network dynamic and give it the ability to extract complex, complicated information from data. Hence, by adding non-linearity to the network with the help of non-linear activation functions, we are able to achieve non-linear mappings from inputs to outputs, as the next sketch shows.
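Here is the flip side, using the same kind of made-up weights: inserting a single tanh between the layers breaks the straight-line property. For any purely affine map g, g(a·x) − g(0) equals a·(g(x) − g(0)); the non-linear network below fails that test.

```python
import numpy as np

# A minimal sketch: with a tanh activation between the layers, the
# input-output mapping is no longer a straight line.
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def nonlinear_net(x):
    return W2 @ np.tanh(W1 @ x + b1) + b2  # activation after the hidden layer

x = rng.normal(size=3)
zero = np.zeros(3)
a = 2.0
lhs = nonlinear_net(a * x) - nonlinear_net(zero)
rhs = a * (nonlinear_net(x) - nonlinear_net(zero))
print(np.allclose(lhs, rhs))  # False: the mapping is genuinely non-linear
```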

Mathematical Magic of Activation Functions:

Now that we’ve seen the importance of non-linearity and the role activation functions play in neural networks, it’s time to delve into the mathematical core of these functions.

Imagine you have a dataset with input features represented by the vector X. The linear model computes the output f(x) by combining these features with their respective weights, and then adding a bias term.

Mathematically, this can be represented as f(x) = w^T x + b

  • f(x) is the output of the linear model for a given input x.
  • w is the weight vector, signifying the importance of each input feature.
  • w^T denotes the transpose of the weight vector, allowing for the dot product with the input vector x.
  • x represents the input feature vector.
  • b is the bias term, providing an extra constant to the output.

Geometrically, this can be visualized as a straight line or plane in the input feature space.
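As a quick illustration, here is the linear model computed in NumPy; the weights, bias, and input below are made-up numbers, chosen only for the example.

```python
import numpy as np

# A minimal sketch of the linear model f(x) = w^T x + b (illustrative values).
w = np.array([0.5, -1.2, 2.0])  # weight vector: importance of each feature
b = 0.3                         # bias term: constant offset to the output
x = np.array([1.0, 0.0, 2.0])   # input feature vector

f_x = w.T @ x + b               # dot product of w and x, plus the bias
print(f_x)                      # 4.8
```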

However, linear models are limited to capturing simple relationships between variables. They can’t handle the complex patterns and relationships that often exist in real-world data. That’s where activation functions come into play.

Activation functions are transfer functions applied to the outputs of linear models to convert these linear inputs into non-linear outputs. The non-linear output after applying the activation function is given by y = α(w1 x1 + w2 x2 + … + wn xn + b), where α is the activation function.
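Here is the same computation with a sigmoid as α, one common choice of activation function (the numbers are again illustrative):

```python
import numpy as np

# A minimal sketch: the activation function α (here, sigmoid) squashes
# the linear output z into a non-linear output y = α(z).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.2, 2.0])
b = 0.3
x = np.array([1.0, 0.0, 2.0])

z = w @ x + b   # linear part: w1*x1 + w2*x2 + ... + wn*xn + b
y = sigmoid(z)  # non-linear output
print(z, y)     # 4.8, ~0.992
```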

Typical Biological Neuron

To visualize this transformation, consider a typical deep learning model. The block diagram below shows the three layers that make up a DL-based system, with the positions of the activation functions represented by the dark shaded regions in the respective blocks.

Block Diagram of Deep Learning based System

The input layer accepts the data for training the neural network, which comes in various formats: images, videos, text, speech, sound, or numeric data. The hidden layers detect the local patterns and features in the data from the previous layers. The output layer presents the network’s classifications or predictions, with associated probabilities.

The location of an Activation Function (AF) within a network structure depends on its role. When placed after the hidden layers, the AF converts learned linear mappings into non-linear forms for further propagation. In the output layer, it helps in making predictions.
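Here is a small sketch of that layout, assuming a ReLU after the hidden layer and a softmax at the output; the layer sizes and weights are illustrative, not prescribed.

```python
import numpy as np

# A minimal sketch of where activations sit: ReLU after the hidden layer,
# softmax at the output layer to turn scores into class probabilities.
rng = np.random.default_rng(2)
W_h, b_h = rng.normal(size=(5, 3)), np.zeros(5)  # input (3) -> hidden (5)
W_o, b_o = rng.normal(size=(2, 5)), np.zeros(2)  # hidden (5) -> output (2)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())  # shift by max for numerical stability
    return e / e.sum()

x = rng.normal(size=3)
h = relu(W_h @ x + b_h)      # hidden-layer AF: non-linear features
p = softmax(W_o @ h + b_o)   # output-layer AF: class probabilities
print(p, p.sum())            # probabilities summing to 1
```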

Types of Activation Functions

Some of the most important activation functions are:

  1. Binary Step Function
  2. Linear
  3. Sigmoid
  4. Tanh
  5. ReLU
  6. Leaky ReLU
  7. Parametrized ReLU
  8. Exponential Linear Unit
  9. Swish
  10. SoftMax

Choosing the Right Activation Function

When it comes to choosing the right activation function for a neural network, there is no rule of thumb; the choice is context dependent, i.e., it depends on the task to be accomplished.

Different activation functions have advantages and disadvantages of their own, and the right choice depends on the type of system we are designing.

  • For classification problems, a combination of sigmoid functions gives better results.
  • Sigmoid and tanh functions are sometimes avoided due to the vanishing gradient problem, i.e., gradients shrinking toward zero in deep networks.
  • The ReLU function is the most widely used and performs better than other activation functions in most cases.
  • If there are dead neurons in your network, you can use the Leaky ReLU function.
  • The ReLU function should be used only in the hidden layers, not in the output layer.

Think of neural networks as trying out various brushstrokes to craft a beautiful painting. Explore various activation functions and see how they affect your model’s performance. The key is to find the perfect combination that makes your network shine.
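One hedged way to run that experiment, assuming TensorFlow/Keras is available: train the same tiny model with different hidden activations and compare. The synthetic dataset, layer sizes, and epoch count below are placeholders to swap for your own.

```python
import numpy as np
import tensorflow as tf

# Compare hidden-layer activations on a small synthetic non-linear task.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 2)).astype("float32")
y = (X[:, 0] * X[:, 1] > 0).astype("float32").reshape(-1, 1)  # non-linear target

for act in ["relu", "tanh", "sigmoid", "elu"]:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(16, activation=act),       # hidden-layer AF varies
        tf.keras.layers.Dense(1, activation="sigmoid"),  # output-layer AF fixed
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(X, y, epochs=30, verbose=0)
    print(f"{act:8s} final accuracy: {hist.history['accuracy'][-1]:.2f}")
```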

In my upcoming post, we’ll take a closer look at each of the different activation functions, diving into their mathematical expressions and understanding how they influence neural network behavior.
