Understanding Regularization Algorithms

Anuj Vyas · Published in Analytics Vidhya · Sep 9, 2020

Understanding the use of Regularization algorithms like LASSO, Ridge, and Elastic-Net regression.

Pre-requisite

Before jumping into this article, make sure you know the math behind the Linear Regression algorithm. If you don't, read up on Linear Regression first!

Table of Contents

  • What is Regularization?
  • What are the different Regularization algorithms?
  • Working of Ridge, LASSO, and Elastic-Net Regression
  • What does Regularization achieve?

What is Regularization?

Regularization is a technique used in regression to reduce the complexity of the model and to shrink the coefficients of the independent features.

“Everything should be made as simple as possible, but no simpler.” -Albert Einstein

In simple words, this technique converts a complex model into a simpler one: it shrinks the coefficients to avoid the risk of overfitting and to reduce the computational cost.
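Here is a minimal sketch of that shrinkage in action (a sketch only: the article doesn't prescribe a library, so scikit-learn is an assumed choice here, and Ridge regression, introduced below, stands in for the regularized model):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Noisy data where only a few of the 10 features actually matter
X, y = make_regression(n_samples=50, n_features=10,
                       n_informative=3, noise=25.0, random_state=42)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha plays the role of λ

# The regularized model ends up with smaller coefficients overall
print(f"Least-squares coefficient norm: {np.linalg.norm(ols.coef_):.2f}")
print(f"Ridge coefficient norm:         {np.linalg.norm(ridge.coef_):.2f}")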

What are the different Regularization algorithms?

  • Ridge Regression
  • LASSO (Least Absolute Shrinkage and Selection Operator) Regression
  • Elastic-Net Regression

Working of Ridge, LASSO, and Elastic-Net Regression

The working of all these algorithms is quite similar to that of Linear Regression; it's just the loss function that changes!

Loss Function for Linear Regression:

$$L = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Ridge Regression

Ridge regression is a method for analyzing data that suffer from multicollinearity.

Loss Function for Ridge Regression:

$$L = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

Ridge regression adds a penalty (the L2 penalty) to the loss function that is equivalent to the sum of the squares of the coefficients.

The regularization parameter (λ) controls the strength of the penalty: if the coefficients take large values, the loss function is penalized more heavily.

  • As λ → 0, the penalty term has no effect, and the estimates produced by ridge regression equal the least-squares estimates, i.e. the loss function reduces to that of the Linear Regression algorithm. Hence, a lower value of λ gives a model close to the Linear Regression model.
  • As λ → ∞, the impact of the shrinkage penalty grows, and the ridge regression coefficient estimates approach zero (the coefficients get close to zero, but never exactly zero).

Note: Ridge regression is also known as L2 Regularization.

To sum up, Ridge regression shrinks the coefficients, which helps reduce model complexity and the impact of multicollinearity.

Ridge Regression: coefficient values for λ = 0.5, 5, and 10 respectively
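The shrinking behaviour described above is easy to reproduce. A minimal sketch (scikit-learn is an assumed choice; its alpha parameter plays the role of λ):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=5,
                       n_informative=5, noise=20.0, random_state=0)

for lam in (0.5, 5, 10):
    model = Ridge(alpha=lam).fit(X, y)
    print(f"λ = {lam}: coefficients = {np.round(model.coef_, 2)}")

# The coefficients get smaller as λ grows, but never reach exactly zero.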

LASSO Regression

LASSO is a regression analysis method that performs both feature selection and regularization in order to enhance the prediction accuracy of the model.

Loss Function for LASSO Regression:

$$L = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

LASSO regression adds a penalty (the L1 penalty) to the loss function that is equivalent to the sum of the absolute values (magnitudes) of the coefficients.

In LASSO regression, the penalty has the effect of forcing some of the coefficient estimates to be exactly equal to zero when the regularization parameter λ is sufficiently large.

Note: LASSO regression is also known as the L1 Regularization (L1 penalty).

To sum up, LASSO regression converts the coefficients of less important features to exactly zero, which helps with feature selection, and it shrinks the coefficients of the remaining features to reduce model complexity, hence avoiding overfitting.

LASSO Regression: coefficient values for λ = 0.05 and 0.5 respectively
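The feature-selection effect is easy to see in code. A minimal sketch (again with scikit-learn as an assumed choice; alpha is λ):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 10 features, but only 3 of them actually influence the target
X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=3, noise=15.0, random_state=0)

model = Lasso(alpha=1.0).fit(X, y)
print("Coefficients:", np.round(model.coef_, 2))
print("Features kept:", int(np.sum(model.coef_ != 0)))

# Coefficients of unimportant features are driven to exactly zero,
# which is LASSO's built-in feature selection.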

Elastic-Net Regression

Elastic-Net is a regularized regression method that linearly combines the L1 and L2 penalties of the LASSO and Ridge methods respectively.

Loss Function for Elastic-Net Regression:

$$L = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2$$
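A minimal sketch of Elastic-Net (scikit-learn is an assumed choice; it folds the two penalties into a single alpha plus an l1_ratio that controls the L1/L2 mix):

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=3, noise=15.0, random_state=0)

# l1_ratio=0.5 weights the L1 and L2 penalties equally
# (l1_ratio=1.0 is pure LASSO, l1_ratio=0.0 is pure Ridge)
model = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print("Coefficients:", model.coef_.round(2))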

What does Regularization achieve?

A standard least-squares model tends to have some variance in it, i.e. the model won't generalize well to a data set different from its training data. Regularization significantly reduces the variance of the model without a substantial increase in its bias.

So the regularization parameter λ, used in the techniques described above, controls the impact on bias and variance. As the value of λ rises, it reduces the value of the coefficients and thus reduces the variance. Up to a point, this increase in λ is beneficial, as it is only reducing the variance (hence avoiding overfitting) without losing any important properties in the data. But after a certain value, the model starts losing important properties, giving rise to bias in the model, and it thus underfits the data. Therefore, the value of λ should be carefully selected.
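In practice, that careful selection is usually done with cross-validation. A minimal sketch (scikit-learn's RidgeCV is an assumed choice here; LassoCV and ElasticNetCV work the same way for the other two methods):

from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=4, noise=20.0, random_state=0)

# Try a range of λ values and keep the one with the best cross-validated score
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0, 100.0]).fit(X, y)
print("Selected λ:", model.alpha_)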

These are all the basics you need to get started with Regularization. It is a useful technique that can help improve the accuracy of your regression models.


If you learned something from this blog, make sure you give it a 👏🏼
Will meet you in some other blog, till then Peace ✌🏼
