Some Regularization Techniques in Deep Learning
In this blog post, I will explain the mechanics, advantages and disadvantages of the following regularization techniques:
- L1 regularization
- L2 regularization
- Dropout
- Data Augmentation
- Early Stopping
What's Regularization?
Regularization is a process that corrects ill-posed problems and prevents overfitting. It makes small modifications to the learning process so that the model generalizes better.
Regularization penalizes the weight matrices of the nodes.
If the regularization coefficient is very high, the weight matrices are pushed close to zero, so we obtain a nearly linear network that underfits the training data.
Conversely, if the regularization coefficient is too small, the penalty has little effect and the model may overfit the data. It then fails to generalize and performs poorly on other data sets.
The aim is to tune the value of the regularization coefficient in order to obtain a well-fitted model.
L1 regularization
L1 and L2 are the most common types of regularization. Both update the general cost function by adding an extra term known as the regularization term.
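As a minimal sketch of that structure (the symbols J, Loss and λ are my own notation, not something fixed by this post), the regularized cost can be written as:

```latex
\[ J(w) \;=\; \mathrm{Loss}(w) \;+\; \lambda \sum_i |w_i| \qquad \text{(L1 penalty)} \]
\[ J(w) \;=\; \mathrm{Loss}(w) \;+\; \lambda \sum_i w_i^2 \qquad \text{(L2 penalty)} \]
```

Here λ is the regularization coefficient discussed above, and the sums run over all weights of the network.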
The L1 regularization term penalizes the sum of the absolute values of the weights. Because it drives many weights exactly to zero, it is useful when we want to compress our model: it produces a sparse, simple and interpretable model, and it is robust to outlier values.
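As a minimal, hedged sketch (the 0.01 strength, the layer sizes and the 20-feature input are arbitrary illustrative choices, not values from this post), an L1 penalty can be attached to Keras layers like this:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Each Dense layer's weights contribute lambda * sum(|w|) to the loss;
# 0.01 is an illustrative regularization coefficient, not a recommendation.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l1(0.01)),
    layers.Dense(1, activation="sigmoid",
                 kernel_regularizer=regularizers.l1(0.01)),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```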
L2 regularization
The L2 regularization term penalizes the sum of the squared values of the weights.
The values of the weight matrices decay towards zero (without usually becoming exactly zero, unlike with L1), which reduces overfitting and leads to simpler models. It is able to learn complex data sets, but it is not robust to outliers.
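The corresponding sketch for L2 only changes the regularizer (again, the values are placeholders):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Same toy network as before, but the penalty is now lambda * sum(w^2).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),
    layers.Dense(1, activation="sigmoid",
                 kernel_regularizer=regularizers.l2(0.01)),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```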
Dropout
At every training iteration, dropout randomly selects some nodes and temporarily removes them, together with their incoming and outgoing connections.
It is useful when we have a large neural network structure, because the extra randomness prevents the network from relying too heavily on any single node.
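A minimal sketch in Keras (the 0.5 drop rate and the layer sizes are assumptions for illustration; Keras only applies dropout during training and disables it at inference time):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Each Dropout layer randomly zeroes 50% of the preceding layer's
# activations on every training step.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```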
Data Augmentation
To reduce overfitting, we can also increase the size of the training data. This technique is especially useful when we work with images, because we just have to apply some rotation, scaling, shifting, etc., and this usually improves the accuracy of the model. However, it cannot be used on every data set, because collecting or generating extra data is sometimes too costly.
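A minimal sketch using Keras preprocessing layers (the rotation, shift and zoom factors below are arbitrary illustrative values, and `images` is a hypothetical batch of image tensors):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A small augmentation pipeline: each training image is randomly
# rotated, shifted and zoomed on the fly.
data_augmentation = tf.keras.Sequential([
    layers.RandomRotation(0.05),         # rotate by up to ~18 degrees
    layers.RandomTranslation(0.1, 0.1),  # shift by up to 10% each way
    layers.RandomZoom(0.1),              # zoom (scale) by up to 10%
])

# Typically placed at the start of an image model, or applied directly:
# augmented = data_augmentation(images, training=True)
```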
Early Stopping
In this technique, we monitor the cost function during training and stop when the model starts overfitting the data, i.e. when the error on the validation set stops decreasing and starts getting worse. The hyperparameter patience tells the model after how many iterations without seeing an improvement it should stop.
After the dotted line, the validation error keeps increasing, so the model will stop training after patience more iterations.
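A minimal Keras sketch (a patience of 5 epochs and the `val_loss` monitor are illustrative choices; `model`, `x_train` and `y_train` are assumed to exist as in the earlier sketches):

```python
import tensorflow as tf

# Stop training once the validation loss has not improved for
# `patience` consecutive epochs, and roll back to the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,
)

# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stop])
```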