Deep Learning Course — Lesson 10.6: Gradient Clipping

Machine Learning in Plain English
2 min read · Jun 13, 2023

--

Gradient clipping is a technique used to prevent the so-called “exploding gradients” problem in neural networks, which can occur when large error gradients accumulate and result in very large updates to neural network model weights during training.

At an extreme, the weight values can become so large that they overflow and turn into NaN values, the signature of an "exploded" network. More commonly, the network simply becomes unstable: the training loss oscillates from step to step and the training process fails to converge.

This is particularly problematic for recurrent neural networks (RNNs), like LSTMs and GRUs, when trained on long sequences, as the accumulation of gradients through many time steps can be very large.

The concept behind gradient clipping is pretty simple: by limiting (or “clipping”) the maximum value of the gradient, we can prevent the problems caused by large gradients without significantly impacting the learning process.

Gradient clipping comes in two main types:

  1. Value Clipping: Each gradient component is clipped element-wise to the range [-threshold, threshold] whenever its absolute value exceeds the threshold. Because components are altered independently, the direction of the gradient vector may change.
  2. Norm Clipping: If the L2 norm of the whole gradient vector exceeds a threshold, the vector is rescaled so that its norm equals the threshold. This preserves the gradient's direction and is generally the preferred method for gradient clipping (see the sketch after this list).

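The sketch below (a minimal PyTorch example on a toy gradient vector; the threshold of 2.0 is an arbitrary illustration) shows how the two variants behave: value clipping can change the direction of the vector, while norm clipping only rescales it.

```python
import torch

# A toy "gradient" vector with one unusually large component.
grad = torch.tensor([0.5, -8.0, 1.5])
threshold = 2.0

# 1. Value clipping: clamp each component to [-threshold, threshold].
#    The large component is cut off, so the vector's direction changes.
value_clipped = grad.clamp(-threshold, threshold)
print(value_clipped)   # tensor([ 0.5000, -2.0000,  1.5000])

# 2. Norm clipping: rescale the whole vector if its L2 norm exceeds the threshold.
#    Only the magnitude shrinks; the direction is preserved.
norm = grad.norm(p=2)
norm_clipped = grad * (threshold / norm) if norm > threshold else grad
print(norm_clipped)    # approximately tensor([ 0.1226, -1.9621,  0.3679])
```

In practice you rarely write this by hand: PyTorch ships torch.nn.utils.clip_grad_value_ and torch.nn.utils.clip_grad_norm_, which apply these same two rules directly to a model's parameter gradients.
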
To implement gradient clipping, we need to choose the threshold value. A common heuristic is to first train the model without clipping, monitor the typical gradient norms, and then set the threshold a bit higher than the values usually observed. Even so, the threshold may need further tuning as a hyperparameter.
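
One way to apply this heuristic, sketched below for a PyTorch model, is to log the total gradient norm after each backward pass for a while and then set the clipping threshold a bit above the values you typically see. Note that total_grad_norm here is a hypothetical helper written for illustration, not a library function.

```python
import torch

def total_grad_norm(model):
    # Hypothetical helper: combined L2 norm of all parameter gradients.
    squares = [p.grad.detach().pow(2).sum()
               for p in model.parameters() if p.grad is not None]
    return torch.sqrt(torch.stack(squares).sum()).item()

# Throwaway example: one backward pass on random data.
model = torch.nn.Linear(10, 1)
loss = model(torch.randn(32, 10)).pow(2).mean()
loss.backward()
print(total_grad_norm(model))  # track this across many steps, then pick a threshold
```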

Gradient clipping can be applied with most gradient descent optimization algorithms. It has been a critical component for training successful deep learning models, such as LSTM networks, and is an important tool in the toolbox of any machine learning practitioner who trains deep learning models.
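
As a concrete illustration, here is a minimal PyTorch training loop on random data (the LSTM sizes, learning rate, and max_norm=1.0 are placeholder choices, not recommendations). The key detail is that the clipping call sits between loss.backward(), which populates the gradients, and optimizer.step(), which applies them.

```python
import torch

# Toy LSTM trained on random sequences, with norm clipping on every step.
model = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    inputs = torch.randn(4, 50, 8)    # batch of 4 sequences, 50 time steps each
    targets = torch.randn(4, 50, 16)
    optimizer.zero_grad()
    outputs, _ = model(inputs)
    loss = torch.nn.functional.mse_loss(outputs, targets)
    loss.backward()                   # gradients are now populated
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip before the update
    optimizer.step()                  # update with the clipped gradients
```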

Please note that while gradient clipping is an important technique to control the exploding gradient problem, it does not help with the vanishing gradient problem, which is a separate issue commonly faced in deep learning models.
