Regularization in Machine Learning

One of the key problems every machine learning model faces is over-fitting. So what is over-fitting, and how do we minimize it? What is regularization? By the end of this article, you should have a clear understanding of these concepts.

To understand these concepts, we will answer the following questions:

  1. What is over-fitting?
  2. What is Regularization?
  3. Types of Regularization

1. What is over-fitting?

Over-fitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.

In simpler terms, over-fitting happens when a model fits the training data too closely, taking almost every feature into account. When this happens, the model is likely to “memorize” the training data instead of learning patterns that generalize. Over-fitting is associated with high variance. The green line in the image below represents over-fitting.

Overfitting Example
Image by Wikipedia
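
To see the same effect in numbers, here is a small sketch in plain Python. The data, the alternating +/-0.3 "noise", and the helper names (fit_line, fit_interpolant, mse) are made up for illustration and do not come from any library. It compares a straight line with a degree-9 polynomial that passes through all 10 training points.

```python
# Training data: the true relationship is y = 2x, plus a fixed alternating
# "noise" term of +/-0.3 (a deterministic stand-in for random noise).
x_train = [i / 9 for i in range(10)]
y_train = [2 * x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(x_train)]

# Unseen test points halfway between the training points, without noise.
x_test = [(i + 0.5) / 9 for i in range(9)]
y_test = [2 * x for x in x_test]


def fit_line(xs, ys):
    """Least-squares straight line: a simple model with two parameters."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Degree-9 polynomial through every training point (Lagrange form).

    With 10 points and 10 coefficients it reproduces the training data
    exactly, noise included.
    """
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(x_train, y_train)
wiggly = fit_interpolant(x_train, y_train)

for name, model in [("line", line), ("degree-9 polynomial", wiggly)]:
    print(f"{name}: train MSE = {mse(model, x_train, y_train):.4f}, "
          f"test MSE = {mse(model, x_test, y_test):.4f}")
```

The polynomial has zero error on the training points but a much larger error than the straight line on the unseen points, which is the signature of over-fitting.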

One of the solutions to over-fitting is Regularization.

2. What is Regularization?

The formal definition of regularization is as follows:

Regularization is a form of regression that constrains, regularizes, or shrinks the coefficient estimates towards zero.

In simple terms, regularization is a technique that takes all the features into account but limits the effect of those features on the model’s output. Let us understand this through an example.

Let us take the example of predicting housing prices using Linear Regression. This problem might have a lot of features to consider. Let’s say it has about 100 features. The model will try to use all of these features to produce an output, which can eventually lead to it “memorizing” the training dataset. As a result, the model will perform well on the training set but very poorly on the testing set, since that data is new to it.

This is where regularization plays a vital role. It still lets the model take all the features into account, but it uses a hyperparameter, the “regularization constant” lambda, to limit the effect of those features on the output and so prevent the model from over-fitting.
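
As a rough sketch of how lambda enters the picture, the function below adds lambda times the sum of squared weights to the mean squared error. The squared penalty is only one common choice, and the function name, data, and weights are hypothetical values picked for illustration.

```python
def penalized_loss(weights, features, targets, lam):
    """Mean squared error plus lam times the sum of squared weights.

    features: list of rows, one list of feature values per example.
    lam:      the regularization constant (lambda); 0 disables the penalty.
    """
    predictions = [sum(w * x for w, x in zip(weights, row)) for row in features]
    data_loss = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    penalty = sum(w ** 2 for w in weights)
    return data_loss + lam * penalty


features = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
targets = [5.0, 4.0, 9.0]

exact = [1.0, 2.0]    # fits the three examples perfectly
shrunk = [0.9, 1.8]   # smaller weights, slightly worse fit

for lam in (0.0, 1.0):
    print(f"lambda = {lam}: exact = {penalized_loss(exact, features, targets, lam):.4f}, "
          f"shrunk = {penalized_loss(shrunk, features, targets, lam):.4f}")
```

With lambda set to 0 the perfectly fitting weights have the lower loss, but with lambda set to 1 the smaller weights win. This is how the penalty limits the effect of the features on the output.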

3. Types of Regularization

Three commonly used types of regularization are listed below. We will cover the first two in this article:

  1. l1 regularization
  2. l2 regularization
  3. dropout regularization

A model that uses l1 regularization is called Lasso regression. Lasso regression (Least Absolute Shrinkage and Selection Operator) adds the sum of the absolute values of the coefficients, scaled by lambda, as a penalty term to the loss function.

The value of lambda must be balanced. A very small value leads back to OLS (Ordinary Least Squares), while a very large value drives the coefficients to zero, so the model will under-fit.

Image from StackOverflow

A model that uses l2 regularization is called Ridge regression. It is one of the more widely used techniques. This technique adds the sum of the squared coefficients, scaled by lambda, as the penalty to the loss function. Here the value of lambda should be chosen appropriately, just as with l1 regularization. A small value of lambda leads back to OLS, and a large value leads to under-fitting.

Image from StackOverflow
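
The effect of lambda on Ridge regression can be sketched with a single feature and no intercept, where the minimizer has a simple closed form. This is a simplified illustration under those assumptions, and the function name and data are hypothetical.

```python
def ridge_weight(xs, ys, lam):
    """Weight w that minimizes sum((y - w*x)^2) + lam * w^2 (no intercept).

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x

for lam in (0.0, 1.0, 10.0, 1000.0):
    print(f"lambda = {lam:>6}: w = {ridge_weight(xs, ys, lam):.4f}")
```

With lambda set to 0 the result is the ordinary least-squares weight. As lambda grows, the weight shrinks towards zero without ever reaching it exactly, and the model starts to under-fit.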

Note: The main difference between these two techniques is the penalty term.
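
Written out, the two cost functions commonly take the following form. The notation is an assumption introduced here, not taken from the article: n training examples, p features, x_ij the value of feature j in example i, and beta_j the coefficient of feature j.

```latex
\begin{align*}
J_{\text{lasso}}(\beta) &= \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2} + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \\
J_{\text{ridge}}(\beta) &= \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2}
\end{align*}
```

The first term is the same least-squares error in both; only the term multiplied by lambda differs.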

One key point to note about l1 regularization (Lasso regression) is that it can shrink the coefficients of less important features to exactly zero, which makes it useful for feature selection.
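
The sketch below illustrates this difference on a tiny made-up dataset. It assumes the feature columns are mutually orthogonal, which is rarely true in practice but lets each coefficient be solved in closed form. The function names and data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Move value towards zero by threshold, stopping at exactly zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_orthogonal(columns, targets, lam, penalty):
    """Closed-form weights when the feature columns are mutually orthogonal.

    Minimizes sum((y - X w)^2) + lam * penalty(w). Orthogonality lets each
    weight be solved independently of the others.
    """
    weights = []
    for column in columns:
        c = sum(x * y for x, y in zip(column, targets))  # inner product with target
        d = sum(x * x for x in column)                   # squared norm of the feature
        if penalty == "l1":
            weights.append(soft_threshold(c, lam / 2) / d)
        elif penalty == "l2":
            weights.append(c / (d + lam))
        else:
            raise ValueError("penalty must be 'l1' or 'l2'")
    return weights


# Three orthogonal feature columns; the target depends strongly on the
# first feature and only weakly on the other two.
columns = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, -1.0, 1.0],
]
targets = [3.25, -2.85, 2.75, -3.15]

print("OLS:  ", fit_orthogonal(columns, targets, 0.0, "l2"))
print("lasso:", fit_orthogonal(columns, targets, 2.0, "l1"))
print("ridge:", fit_orthogonal(columns, targets, 2.0, "l2"))
```

With lambda set to 2, the l1 penalty sets the two weak coefficients to exactly zero and keeps the strong one, while the l2 penalty shrinks all three but leaves them non-zero.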

To conclude, there are several other methods to address over-fitting, but the techniques discussed above work well for large datasets.



