Regularization Methods

eLtronics villa · 4 min read · Jun 1, 2019

Regularization helps to overcome the problem of overfitting a model. Overfitting is closely tied to the trade-off between bias and variance: an overfit model has low bias but high variance, so its accuracy on unseen data is poor. When our model tries to learn too many properties from the training data, it also picks up noise. Here, noise means data points that don't really represent the true properties of your data.

Regularization works by shrinking the coefficient estimates towards zero.


Different Regularization Techniques:

  1. L1 and L2 Regularization
  2. Dropout
  3. Data Augmentation
  4. Early stopping

Regularization penalizes the coefficients. In deep learning, it actually penalizes the weight matrices of the nodes.

L1 and L2 Regularization:

L1 and L2 are the most common types of regularization. In regression models, L1 regularization is called Lasso Regression and L2 regularization is called Ridge Regression.

Both update the general cost function by adding a regularization term.

Cost function = Loss (e.g., cross-entropy) + regularization term
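
For example, for a deep network with an L2 penalty on the weights, this can be written in the standard form below, where λ is the regularization strength and W^[l] is the weight matrix of layer l:

    J(W) = \mathrm{CrossEntropy}(y, \hat{y}) + \frac{\lambda}{2} \sum_{l} \lVert W^{[l]} \rVert_2^2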

In Machine Learning:

In machine learning, this fitting process uses a loss function such as the RSS (residual sum of squares).

Lasso (L1 Regularization)

Ridge (L2 Regularization)

Here, ‘y’ represents the learned relation, ‘β’ represents the coefficient estimates for the different variables or predictors (x), and λ is the tuning parameter that decides how much we want to penalize the flexibility of our model.
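
Written out in the usual notation for a linear model with n observations and p predictors, the two penalized objectives are:

    \text{Lasso: } \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|

    \text{Ridge: } \min_{\beta} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2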

The difference between the two is the penalty term. Ridge adds the squared magnitude of the coefficients as a penalty term to the loss function, while Lasso (Least Absolute Shrinkage and Selection Operator) adds the absolute value of the magnitude of the coefficients.

When a data set has a huge number of features, Lasso is useful for feature selection because it shrinks the coefficients of the less important features all the way to zero.
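
As a minimal sketch of how this looks in practice with scikit-learn (using synthetic data; alpha plays the role of the tuning parameter λ):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge

    # Toy data: 100 samples, 20 features, only 5 of them actually informative
    X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                           noise=10.0, random_state=42)

    # L1 (Lasso): many coefficients are shrunk to exactly zero (feature selection)
    lasso = Lasso(alpha=1.0).fit(X, y)
    print("Lasso coefficients set to zero:", (lasso.coef_ == 0).sum())

    # L2 (Ridge): coefficients are shrunk towards zero but rarely become exactly zero
    ridge = Ridge(alpha=1.0).fit(X, y)
    print("Ridge coefficients set to zero:", (ridge.coef_ == 0).sum())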

Dropout

Dropout is one of the most frequently used regularization techniques in the field of deep learning. At every training iteration, dropout randomly selects some nodes and drops them, along with all of their incoming and outgoing connections. So each iteration trains a different set of nodes and produces a different output. In machine learning this is similar to an ensemble, which tends to perform better because it captures more randomness.
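
A minimal sketch of dropout in Keras (assuming a simple feed-forward classifier on flattened 28x28 inputs; the layer sizes and dropout rates are placeholders):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dropout(0.5),   # randomly drops 50% of the units at each training step
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.3),   # a lower rate for the smaller layer
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()

Note that Keras automatically disables dropout at inference time, so no extra work is needed when making predictions.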

Data Augmentation

The simplest way to reduce overfitting is to increase the size of the training data. In classical machine learning we are often not able to increase the size of the training data, because labeled data is too costly.

But now let’s consider that we are dealing with images. There are a few ways of increasing the size of the training data: rotating the image, flipping, scaling, shifting, etc.

This technique is known as data augmentation. It usually provides a big leap in the accuracy of the model, and it can be considered an almost mandatory trick for improving predictions on image data.
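
A minimal sketch with Keras’ ImageDataGenerator (the dummy arrays stand in for a real image data set):

    import numpy as np
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Each epoch sees randomly rotated, shifted, zoomed, and flipped versions of the images
    datagen = ImageDataGenerator(
        rotation_range=20,       # rotate by up to 20 degrees
        width_shift_range=0.1,   # shift horizontally by up to 10% of the width
        height_shift_range=0.1,  # shift vertically by up to 10% of the height
        zoom_range=0.1,          # zoom in or out by up to 10%
        horizontal_flip=True,    # random left-right flips
    )

    # Dummy batch of 8 RGB images of size 32x32, with 10 possible labels
    x_train = np.random.rand(8, 32, 32, 3)
    y_train = np.random.randint(0, 10, size=(8,))

    # flow() yields an endless stream of augmented batches
    for batch_x, batch_y in datagen.flow(x_train, y_train, batch_size=4):
        print("Augmented batch shape:", batch_x.shape)
        break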

Early stopping

Early stopping is a kind of validation strategy where we keep one part of the training set aside as a validation set. When we see that the performance on the validation set is getting worse, we immediately stop training the model.

In a typical plot of training and validation error against training epochs, we would stop training at the point where the validation error starts to rise, since after that our model will start overfitting on the training data.
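
A minimal sketch of early stopping with the Keras EarlyStopping callback (the random arrays are placeholders for real data):

    import numpy as np
    import tensorflow as tf

    x = np.random.rand(1000, 20)
    y = np.random.randint(0, 2, size=(1000,))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Stop when the validation loss has not improved for 5 consecutive epochs,
    # and roll back to the best weights seen so far
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  patience=5,
                                                  restore_best_weights=True)

    model.fit(x, y, validation_split=0.2, epochs=50,
              callbacks=[early_stop], verbose=0)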

Below is a minimal Python sketch that puts all of the above methods together in one small Keras model (assuming a recent TensorFlow/Keras version; the data, layer sizes, and hyperparameters are placeholders).
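
    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers, regularizers

    # Placeholder image data: 1000 RGB images of size 32x32, 10 classes
    x_train = np.random.rand(1000, 32, 32, 3).astype("float32")
    y_train = np.random.randint(0, 10, size=(1000,))

    # Data augmentation: random flips and rotations applied during training only
    augmentation = tf.keras.Sequential([
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
    ])

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        augmentation,
        layers.Conv2D(32, 3, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty on the weights
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.l1(1e-5)),   # L1 penalty on the weights
        layers.Dropout(0.5),                                      # dropout regularization
        layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Early stopping on the validation loss
    early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                                  patience=3,
                                                  restore_best_weights=True)

    model.fit(x_train, y_train, validation_split=0.2,
              epochs=20, callbacks=[early_stop], verbose=0)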

The code above is only an example implementation; you can change the variable names to match your data set, modify it to your preference, or implement your own regularization method.

Happy Learning…
