Activation Functions Used in Neural Networks

Ankita Choudhury
Jan 15 · 4 min read

An activation function plays a vital role in a neural network. It is also called a transfer function. Its aim is to introduce a non-linear transformation so the network can learn the complex underlying patterns in the data. A good activation function should be differentiable and computationally inexpensive. Ideally, its output should also be zero-centered; otherwise the gradients flowing to the weights all share the same sign, which makes the updates zig-zag and slows learning.

The activation function is represented as f(x), where x = (input × weights) + bias. Now, let's look at the commonly used activation functions.
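
To make this concrete, here is a minimal sketch (with made-up input, weight, and bias values) of how the pre-activation x is computed before an activation function f is applied:

```python
import numpy as np

# Illustrative (made-up) inputs, weights, and bias for a single neuron
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.4, 0.7, -0.2])
bias = 0.1

x = np.dot(inputs, weights) + bias  # x = (input * weights) + bias
f = np.tanh                         # stand-in for any activation function below
print(f(x))                         # ≈ -0.81
```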

Sigmoid Function

The sigmoid function can be defined as σ(x) = 1 / (1 + e^(-x)) (a short code sketch follows the list below).

  1. It scales values into the range 0 to 1.
  2. It has an S-shaped curve.
  3. Its output is centered on 0.5, so it is not zero-centered.
  4. It is differentiable and monotonic.
  5. It is also known as the logistic function.
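
A minimal NumPy sketch of the sigmoid; the sample inputs are just illustrative values:

```python
import numpy as np

def sigmoid(x):
    """Logistic function: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # ~[0.119, 0.5, 0.881]
```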

Tanh Function

The tanh function can be defined as tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x)) (see the sketch after the list).

  1. It scales values between −1 and +1.
  2. It also has an S-shaped curve.
  3. Its output is centered on 0, i.e. it is zero-centered.
  4. It is differentiable and monotonic.
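
A similar NumPy sketch for tanh, again with illustrative inputs:

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent: squashes inputs into (-1, 1) and is zero-centered."""
    return np.tanh(x)

print(tanh(np.array([-2.0, 0.0, 2.0])))  # ~[-0.964, 0.0, 0.964]
```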

Rectified Linear Unit Function

The ReLU function is expressed as ReLU(x) = max(0, x) (sketched in code after the list).

  1. It is a piecewise linear function.
  2. It returns zero when x is less than zero and returns x itself otherwise.
  3. Because it outputs zero for all negative inputs, the affected neurons can stop learning entirely, a problem known as the dying ReLU.
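
A minimal NumPy sketch of ReLU, with illustrative inputs:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity for positive ones."""
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.0, 0.0, 2.5]
```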

Leaky ReLU Function

This function is expressed as LeakyReLU(x) = x for x > 0 and αx otherwise (a sketch follows the list).

  1. It is a variant of the ReLU function.
  2. Instead of being flat at zero, it has a small slope α for negative values.
  3. α is typically set to 0.01.
  4. Parametric ReLU (PReLU): α is treated as a parameter of the network, and the network learns its optimal value during training.
  5. Randomized ReLU (RReLU): α is set to a randomly sampled value.
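
A minimal NumPy sketch of Leaky ReLU with the usual default α = 0.01; for PReLU the same formula applies but α would be a learned parameter, and for RReLU it would be drawn at random:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: keeps a small slope alpha for negative inputs instead of zero."""
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-3.0, 0.0, 2.5])))  # [-0.03, 0.0, 2.5]
```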

Exponential Linear Unit Function

This function can be expressed as ELU(x) = x for x > 0 and α(e^x − 1) otherwise (see the sketch below the list).

  1. It is similar to the Leaky ReLU function.
  2. For negative values it follows a smooth exponential curve that saturates at −α, rather than a straight line.
  3. Its outputs are closer to being zero-centered than ReLU's.
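
A minimal NumPy sketch of ELU with α = 1 (a common default, assumed here for illustration):

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, a smooth exponential curve for negative ones."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

print(elu(np.array([-3.0, 0.0, 2.5])))  # ~[-0.95, 0.0, 2.5]
```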

Swish Function

The swish function is expressed as f(x) = x · σ(x), where σ is the sigmoid function.

  1. It was introduced by researchers at Google.
  2. It has been reported to perform better than ReLU, especially on deeper models.
  3. It is non-monotonic.
  4. σ(x) here denotes the sigmoid function.
  5. It can be reparametrized with a trainable parameter β as f(x) = x · σ(βx); β = 1 recovers the basic form (see the sketch below).
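
A minimal NumPy sketch of swish in its reparametrized form x · σ(βx); β = 1 recovers the basic version:

```python
import numpy as np

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x); beta=1 gives the basic (non-parametric) form."""
    return x / (1.0 + np.exp(-beta * x))

print(swish(np.array([-3.0, 0.0, 2.5])))  # ~[-0.142, 0.0, 2.31]
```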

Softmax Function

This function can be defined as softmax(x_i) = e^(x_i) / Σ_j e^(x_j).

  1. It is a generalization of the sigmoid (logistic) function to multiple classes.
  2. It is mostly applied to the final layer of the network in multi-class classification tasks.
  3. The sum of the softmax values is always one.
  4. It converts its inputs into probabilities, as shown in the sketch below.
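
A minimal NumPy sketch of softmax; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(x):
    """Softmax: exponentiates the inputs and normalizes them so they sum to 1."""
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))        # ~[0.659, 0.242, 0.099]
print(softmax(scores).sum())  # 1.0
```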
