MACHINE LEARNING

What is an Activation Function in a Neural Network?

Learn what an activation function is, why you need it, and how to choose the best activation function for your machine learning project

Chris Verdence
Writers’ Blokke


Photo by Uriel SC on Unsplash

Neural network algorithms learn from data. This happens through training, which usually consists of many iterations of a forward pass and a backward pass. The forward pass predicts the output from the given input data, while the backward pass updates the network's weights to minimize an error function so that the algorithm predicts better in the next iteration.
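As a rough Python sketch of one such iteration (the single-weight model, the learning rate, and the squared-error loss are illustrative assumptions, not taken from any particular framework):

x, y_true = 2.0, 10.0          # one training example
w = 0.5                        # initial weight
learning_rate = 0.01

for i in range(100):
    y_pred = w * x                    # forward pass: predict the output
    error = (y_pred - y_true) ** 2    # squared error between prediction and target
    grad = 2 * (y_pred - y_true) * x  # backward pass: gradient of the error w.r.t. w
    w -= learning_rate * grad         # update the weight to reduce the error

print(w, error)  # w approaches 5.0, the value that maps x to y_true

Each pass through the loop is one iteration of the forward-then-backward cycle described above.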

An illustration of a simple neural network

In the above illustration, the value of the uppermost node in the hidden layer is calculated during the forward pass by multiplying the upper and lower nodes in the input layer by their respective weights, summing the results, and applying an activation function to that sum. Applied to each node in the network, the activation function determines whether, and how strongly, a node should be activated based on its relevance to the model's prediction.
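A small Python sketch of that calculation, assuming two input nodes, illustrative weight values, and the sigmoid as the chosen activation function:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, x2 = 0.7, 0.3        # values of the two input nodes
w1, w2 = 0.4, -1.2       # weights on the connections to the hidden node

z = x1 * w1 + x2 * w2    # weighted sum of the inputs
h = sigmoid(z)           # activation function gives the hidden node's output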

An activation function is non-linear, while the product of a node value and its corresponding weight is linear. Stacking multiple layers of purely linear transformations still yields a single linear transformation, and many problems cannot be solved by such functions. Without an activation function, we would effectively limit the complexity of the functions the neural network can represent. By introducing a non-linear activation function, more complex functions can be approximated, and the algorithm's ability to predict accurately is enhanced.
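The following NumPy sketch (with arbitrary, illustrative layer sizes) shows why: two linear layers applied in a row, with no activation function in between, collapse into one linear transformation.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # input vector
W1 = rng.normal(size=(4, 3))  # first linear layer
W2 = rng.normal(size=(2, 4))  # second linear layer

two_layers = W2 @ (W1 @ x)    # two linear transformations in a row
one_layer = (W2 @ W1) @ x     # a single equivalent linear transformation
print(np.allclose(two_layers, one_layer))  # True: no extra expressive power

Inserting a non-linear activation between the two layers breaks this equivalence, which is exactly what lets the network model more complex functions.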

Two of the most widely used activation functions are the sigmoid and the rectified linear unit (ReLU). The sigmoid function has a range between 0 and 1, with its steepest increase around x = 0. Its smooth gradient avoids jumps in the predicted output, but unfortunately the gradient vanishes for very low and very high x values, which makes training rather slow. ReLU has become more popular as machine learning practitioners have started to train on larger and larger amounts of data. This is because ReLU is very computationally cheap: its output is simply max(0, x), and its gradient is either 0 or 1.
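A short NumPy sketch of both functions and their gradients (the sample inputs are illustrative) makes the contrast concrete:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # largest near z = 0, vanishes for large |z|

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)  # 1 for positive inputs, 0 otherwise

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid_grad(z))  # nearly 0 at the extremes: the vanishing gradient
print(relu_grad(z))     # cheap to compute: just a threshold on the input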

