Unlocking Neural Networks: The Secret Sauce of Deep Learning

Gorule Vishal Vilas
4 min read · Jun 5, 2024


Activation functions are a fundamental building block of neural networks in deep learning. They introduce non-linearity into the network, enabling it to learn complex patterns and make accurate predictions. In this blog, we will focus on three popular activation functions: Sigmoid, Tanh, and ReLU. We will explore their mathematical formulations, graphical behaviour, practical applications, and their respective advantages and disadvantages.

Introduction to Activation Functions

An activation function determines a neuron's output: the neuron first computes the weighted sum of its inputs plus a bias, and the activation function is then applied to that sum. The result becomes the input for the next layer in the network. Because the activation function is non-linear, stacked layers can model complex non-linear relationships in the data.
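As a minimal sketch of this idea (the input values, weights, and bias below are made up purely for illustration), a single neuron in Python/NumPy looks like this:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # illustrative inputs from the previous layer
w = np.array([0.4, 0.7, -0.2])   # illustrative weights
b = 0.1                          # illustrative bias

z = np.dot(w, x) + b             # weighted sum of inputs plus bias
a = sigmoid(z)                   # activation function applied to that sum
print(f"pre-activation z = {z:.3f}, neuron output a = {a:.3f}")
```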

Sigmoid Function

The sigmoid function is one of the earliest activation functions used in neural networks. It maps any input value to an output range between 0 and 1.

Formula: σ(x) = 1 / (1 + e^(-x))

Graph: Sigmoid function and its derivative

Advantages:

  • Smooth Gradient: The sigmoid function has a smooth gradient, which helps in gradient-based optimization.
  • Probability Interpretation: Outputs values between 0 and 1, making it suitable for binary classification problems.

Disadvantages:

  • Vanishing Gradient Problem: For very high or low input values, the gradient of the sigmoid function becomes very small, slowing down the training process.
  • Non-Zero-Centered Output: The outputs are not zero-centered, which can lead to inefficient gradient updates during training.
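To make the vanishing-gradient point concrete, here is a small NumPy sketch (the sample input values are arbitrary) showing how the sigmoid's derivative shrinks toward zero as the input moves away from 0:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # derivative of the sigmoid: sigma(x) * (1 - sigma(x))

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x = {x:6.1f}  sigmoid = {sigmoid(x):.5f}  gradient = {sigmoid_derivative(x):.5f}")

# The gradient peaks at 0.25 when x = 0 and is nearly 0 for large |x|,
# which is exactly the vanishing-gradient behaviour described above.
```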

Hyperbolic Tangent (Tanh) Function

The tanh function is similar to the sigmoid function but maps input values to an output range between -1 and 1.

Formula: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Graph: Hyperbolic Tangent (Tanh) Function

Advantages:

  • Zero-Centered Output: Unlike the sigmoid function, tanh outputs are zero-centered, which helps with efficient gradient updates.
  • Steeper Gradient: The gradients are steeper than those of the sigmoid function, which can lead to faster convergence.

Disadvantages:

  • Vanishing Gradient Problem: Similar to the sigmoid function, tanh can also suffer from the vanishing gradient problem for very high or low input values.
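A quick NumPy comparison (with arbitrary sample inputs) illustrates both advantages listed above: tanh outputs are centred around zero, and its gradient at the origin is 1.0, compared with 0.25 for the sigmoid:

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])   # arbitrary sample inputs

tanh_out = np.tanh(x)                # zero-centred outputs in (-1, 1)
tanh_grad = 1.0 - np.tanh(x) ** 2    # derivative of tanh: 1 - tanh(x)^2

print("outputs:  ", tanh_out)        # symmetric around 0
print("gradients:", tanh_grad)       # 1.0 at x = 0, but close to 0 for large |x|
```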

Rectified Linear Unit (ReLU) Function

The ReLU function is currently the most widely used activation function in deep learning. It outputs the input directly if it is positive; otherwise, it outputs zero.

Formula: ReLU(x) = max(0, x)

Graph: ReLU function

Advantages:

  • Efficient Computation: ReLU is computationally efficient and easy to implement, leading to faster training times.
  • Mitigates Vanishing Gradient Problem: Provides a constant gradient for positive inputs, which helps mitigate the vanishing gradient problem.

Disadvantages:

  • Dying ReLU Problem: If a neuron’s weighted input is negative for every training example, its output and its gradient are both zero, so its weights stop updating and the neuron effectively stops learning.
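The following NumPy sketch (sample inputs chosen for illustration) shows both behaviours: positive inputs pass through with a constant gradient of 1, while negative inputs are clipped to zero and receive no gradient at all:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)        # ReLU(x) = max(0, x)

def relu_gradient(x):
    return (x > 0).astype(float)     # gradient is 1 for x > 0, 0 otherwise

x = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])   # illustrative inputs
print("outputs:  ", relu(x))           # negative inputs are clipped to 0
print("gradients:", relu_gradient(x))  # zero gradient for x <= 0 is what "kills" a neuron
```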

Practical Example: Image Classification

Consider a deep learning model designed to classify images of handwritten digits from the MNIST dataset. The model consists of several layers, each utilizing different activation functions.

  1. Input Layer: Raw pixel values of the images.
  2. Hidden Layers: Use ReLU to introduce non-linearity and learn complex features.
  3. Output Layer: Uses Softmax to output a probability for each of the ten digit classes (0–9); Sigmoid plays the analogous role in binary classification.

Code Example (Using TensorFlow/Keras):
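Here is a minimal sketch of such a model as a small fully connected network; the layer sizes (128 and 64 units), the Adam optimizer, and the epoch count are illustrative choices rather than values prescribed above:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load and normalize MNIST (28x28 grayscale digits, labels 0-9)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Feed-forward classifier: ReLU in the hidden layers,
# Softmax in the output layer to produce a probability per digit class.
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
```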

Output: Keras prints the loss and accuracy for each training epoch, followed by the final evaluation metrics on the test set.

Conclusion

Activation functions are pivotal in deep learning, enabling neural networks to capture non-linear patterns and make accurate predictions. Each activation function — Sigmoid, Tanh, and ReLU — has its strengths and weaknesses, making them suitable for different types of problems and network architectures. Understanding these functions and their applications can significantly enhance the performance of your deep learning models.
