Activation Functions: ReLU & Softmax
If you’ve spent some time implementing Deep Learning models, you’ve most likely noticed some common denominators across any given set of Deep Learning experiments. One of these recurring motifs is the Activation Function! Depending on the problem you are trying to solve, you’ll be tasked with selecting the best-suited Activation Function for your neural network’s architecture. By the end of this article, you should be able to do the following:
- Explain what Activation Functions are in layman’s terms and describe their role in Artificial Neural Networks.
- Understand how to implement both Rectified Linear Unit (ReLU) & Softmax Activation Functions in Python.
Activation Functions:
From a biological perspective, the activation function is an abstract representation of the action-potential rate in a neuron. In the world of Deep Learning and Artificial Neural Networks, Activation Functions can be viewed as a set of rules that determine whether a neuron activates / “fires” or not, given an input or set of inputs. This is better understood with a tangible example, so let’s look at one of the most popular activation functions: the Rectified Linear Unit (ReLU). A neuron with a ReLU Activation Function takes in any real values as its input(s), but only activates when these input(s) are greater than 0. In other words, ReLU(x) = max(0, x); its graph is flat at 0 for negative inputs and follows the line y = x for positive inputs.
Let’s assume a list named inputs contains all the inputs to our neuron.
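A minimal sketch of that rule in plain Python is shown below; the inputs list and its values are hypothetical placeholders for whatever actually feeds into the neuron.

```python
# Hypothetical raw inputs flowing into the neuron.
inputs = [0.9, -1.2, 3.4, 0.0, -0.5, 2.1]

outputs = []
for i in inputs:
    # The neuron "fires" only when the input is greater than 0;
    # otherwise its output is clamped to 0.
    if i > 0:
        outputs.append(i)
    else:
        outputs.append(0)

print(outputs)  # [0.9, 0, 3.4, 0, 0, 2.1]
```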
It’s that simple. Out of all our input values, the neuron only activates when the input is greater than 0. You may have noticed that the ReLU function resembles the function y = x, and it is technically the same function… well, kind of.
We can say that ReLU is the “positive part” of the function y = x: it matches y = x for positive inputs and is 0 everywhere else. Lastly, the code above can be cleaned up a bit more for readability, so let’s do that.
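One way to do that, reusing the same hypothetical inputs list, is to lean on Python’s built-in max():

```python
# The same hypothetical inputs as before.
inputs = [0.9, -1.2, 3.4, 0.0, -0.5, 2.1]

# max(0, i) returns i when i is positive and 0 otherwise,
# which is exactly the ReLU rule from the loop above.
outputs = [max(0, i) for i in inputs]

print(outputs)  # [0.9, 0, 3.4, 0, 0, 2.1]
```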
Voila! There you have it. You’ve written your first activation function from scratch.
The Softmax Activation Function maps non-normalized inputs into a set of exponentiated and normalized probabilities. Concretely, each input is exponentiated and then divided by the sum of all the exponentiated inputs, so the outputs are positive and sum to 1. In the context of Machine Learning, the Softmax activation function is used in multi-class classification problems to generalize logistic regression to settings with more than two outcome classes. Let’s take a look at how to implement this in Python in a few easy steps.
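Here is a minimal sketch of a naive Softmax; the layer_outputs list and its values are hypothetical stand-ins for the raw scores produced by a network’s final layer.

```python
import math

# Hypothetical raw outputs ("logits") from the final layer.
layer_outputs = [4.8, 1.21, 2.385]

# Step 1: exponentiate every output.
exp_values = [math.exp(output) for output in layer_outputs]

# Step 2: normalize so all values sum to 1.
norm_base = sum(exp_values)
probabilities = [value / norm_base for value in exp_values]

print(probabilities)       # roughly [0.895, 0.025, 0.080]
print(sum(probabilities))  # 1.0
```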
We need to make a minor adjustment to our implementation due to some problems that are common in Neural Networks: dead neurons and exploding values. Exponentiation grows extremely fast, so a large raw output can easily overflow into an unusable number. To mitigate this, we can simply subtract the maximum value in the output list from every output before exponentiation. The catch is that once the maximum output is subtracted from every output, no output is greater than 0. Exponentiating 0 gives 1, and exponentiating an ever more negative number gives a value that approaches 0, so every exponentiated output now falls neatly between 0 and 1 before we normalize. Also, note that subtracting the same value from every output doesn’t change the resulting probabilities; I’ll leave it to you to experiment with different subtracted values and confirm that the probability distribution stays the same. So let’s piece this all together in Python code.
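Below is one way the full, adjusted Softmax might look, again using the hypothetical layer_outputs values from the sketch above.

```python
import math

# The same hypothetical raw outputs as before.
layer_outputs = [4.8, 1.21, 2.385]

# Subtract the largest output from every output first, so the
# largest exponent we ever compute is exp(0) = 1 and overflow
# from very large inputs is avoided.
max_output = max(layer_outputs)
exp_values = [math.exp(output - max_output) for output in layer_outputs]

# Normalize the exponentiated values into probabilities.
norm_base = sum(exp_values)
probabilities = [value / norm_base for value in exp_values]

print(probabilities)  # same as the naive version: roughly [0.895, 0.025, 0.080]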
Done! As you can see, the probabilities don’t change when the maximum value is subtracted from each output.
You should now have a good understanding of what Activation Functions are, how they fit in the context of Artificial Neural Networks, and how to implement some of the most commonly used activation functions in Python. If you made it this far, thank you for taking the time to read this article, and I hope you were able to learn something new!