Lasso Regression — In ‘Simple’ English

Yash Gupta
Data Science Simplified
7 min read · Sep 29, 2022

Lasso Regression might be one of the biggest underdogs on the list of ML algorithms. It is not as well known as simple linear regression or logistic regression, but it is very useful and good to know. If you want to know what it is all about without any background knowledge of the math involved, let’s get to it in less than 10 minutes!

Note: This article will give you an understanding of Lasso Regression without heavy code or the full mathematics involved, beyond a few small sketches and formulas to show where each concept fits. If you are looking for resources that help you write out the entire code for Lasso Regression or understand it mathematically, there are links at the end of the article that will help you with the same.

Prerequisites

What are some other types of Regression you should see before you learn Lasso Regression?

Before you dive deeper into Lasso Regression, you should take some time to brush up on the following types of regression, or at least skim through them, so that this article makes more sense to you.

  • Linear Regression
  • Logistic Regression
  • Ridge Regression

What is Lasso Regression?

The LASSO in Lasso Regression is an acronym that stands for:

L: Least
A: Absolute
S: Shrinkage and
S: Selection
O: Operator

LASSO Regression is a slight modification of Ridge Regression.

RIDGE REGRESSION RECAP!

Psst!

Ridge Regression is a modified version of linear regression for cases where you don’t want the regression equation to overfit the training data: instead of fitting the line with the least squares cost alone, you penalize the equation with the squared value of your parameters, scaled by a penalization value called lambda (λ) that can take any value from 0 to infinity.

The penalization, therefore, makes your model a little less accurate on the training data, but it helps the model generalize a lot better to the testing data in the long run.
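To make the recap concrete, here is a tiny sketch of what that penalized cost looks like for a single slope and intercept. The numbers are made up purely for illustration, and the function mirrors the idea rather than any particular library’s implementation.

```python
import numpy as np

# Toy data: four points that roughly follow y = x (made up for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.9])

def ridge_cost(slope, intercept, lam):
    """Sum of squared residuals plus lambda times the squared slope."""
    residuals = y - (slope * x + intercept)
    return np.sum(residuals ** 2) + lam * slope ** 2

# lambda = 0 is plain least squares; larger lambda punishes big slopes more.
for lam in [0.0, 1.0, 10.0]:
    print(f"lambda = {lam:>4}: cost = {ridge_cost(1.0, 0.1, lam):.2f}")
```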

When should you use Lasso Regression?

Here are the top five conditions to check when choosing between regular linear regression and LASSO Regression (a quick way to test these in practice is sketched just after the list):

  • If you are overfitting your training data by a lot
  • If you have a lot of variables in your data that may or may not be useful to the model
  • If your regular regression model is failing to give you a good score on the testing data no matter what you do
  • If you find that your data could do better with a little less overfitting to improve performance in the long run
  • If you don’t completely understand the variables and using all of them in the regression is the only way you see to proceed.
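The sketch below is one rough way to run this checklist: compare a plain linear model against Lasso on held-out data. The synthetic dataset is an assumption purely for illustration (scikit-learn calls the penalty strength alpha rather than lambda); swap in your own X and y.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.model_selection import train_test_split

# Synthetic data: 30 features, but only 5 actually drive the target.
X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ols = LinearRegression().fit(X_train, y_train)
lasso = LassoCV(cv=5).fit(X_train, y_train)  # picks the penalty by cross-validation

print("OLS   test R^2:", round(ols.score(X_test, y_test), 3))
print("Lasso test R^2:", round(lasso.score(X_test, y_test), 3))
print("Features Lasso kept:", int(np.sum(lasso.coef_ != 0)), "out of", X.shape[1])
```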

The Intuition behind Lasso Regression

We’ll break down this section into three segments to eventually get to what LASSO Regression really is on the backend.

We’ll touch briefly upon Linear Regression and Ridge Regression to finally get to Lasso Regression and to understand why there was a need for this model and how it can outperform the other two in a given scenario.

1. Simple Linear Regression:

Simple Linear Regression runs on the intuition that you can define a linear relationship between two variables: using the Least Squares method, you want the line to fit the data points so that the distance between the line and the points is as small as possible.

If you’ve been in touch with Linear Regression, you know it is a line, that has a slope, an intercept, an independent variable, and a dependent variable (in simple linear regression).

The cost function in the OLS (Ordinary Least Squares) method is defined on the residual distance between the points and the line. Minimizing it gives you a model that best fits the data and, all else remaining the same, can predict the value of the dependent variable for a given value of the independent variable.

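To keep things concrete, here is a minimal sketch of a least-squares fit; the data and the prediction point are made-up assumptions purely for illustration.

```python
import numpy as np

# Made-up data: x is the independent variable, y the dependent variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# polyfit with degree 1 returns the slope and intercept that minimize
# the sum of squared residuals (the OLS cost).
slope, intercept = np.polyfit(x, y, 1)
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))

# Predict the dependent variable for a new value of the independent variable.
print("prediction at x = 6:", round(slope * 6 + intercept, 3))
```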

2. Ridge Regression:

If you notice that your data is being overfitted by your linear regression line, what would you do?

You could either add more training data that is not in line with your pre-existing data, or you could try to make the model a little less sensitive to the training data. The best way to do this without adding more data is to reduce the model’s sensitivity to the training data by penalizing the parameters it overfits on.

This happens by increasing the cost slightly: we add in a little something that is referred to in mathematical terms as lambda, or λ.

In Ridge Regression, the penalty is calculated on the squared value of the slope for each parameter. The higher the lambda, the greater the penalty and the less sensitive the model is to the training data.

We’ll dive deeper into this in another article, but if you want to learn it from scratch, Josh Starmer’s StatQuest videos on Ridge Regression are a great place to start.
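In the meantime, here is a small sketch of the effect described above, using scikit-learn’s Ridge (where lambda is called alpha); the synthetic data is an assumption for illustration. Notice that larger alphas shrink the coefficients towards zero, but never exactly to zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Made-up data with three features.
X, y = make_regression(n_samples=50, n_features=3, noise=5.0, random_state=1)

for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha = {alpha:>6}: coefficients = {np.round(model.coef_, 2)}")
```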

3. Lasso Regression:

Okay, let’s now get to the elephant in the room.

Consider the same Ridge Regression idea of penalties when you’re overfitting your data, but let’s consider that you have too many variables.

Because the penalty in Ridge Regression takes the squared value of the slope (times lambda), it never shrinks the slope of a useless column all the way down to 0. For example, the number of pushups a person can do is not necessarily relevant in a model predicting that person’s salary (unless they are a fitness coach or similar).

In this case, it would be ideal to eliminate such a variable from your model and not consider it in your penalty calculation.

Here’s where Lasso Regression shines. The difference is that the Lasso Regression penalty does not take the squared value of the slope but its absolute value. The absolute-value penalty can shrink a slope all the way to 0, effectively eliminating that variable from the model. This makes the model less sensitive to the training data and lets it handle a high number of variables by using only those that are actually helpful.

For anyone who wants the math: Ridge Regression minimizes the sum of squared residuals plus λ times the sum of the squared slopes, while Lasso Regression minimizes the sum of squared residuals plus λ times the sum of the absolute values of the slopes.

The rest works the same as in Ridge Regression: the higher the lambda, the higher the penalty. This combination of shrinkage and built-in variable selection is what makes Lasso Regression one of the biggest underdogs in the world of Machine Learning algorithms.
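Here is a hypothetical version of that pushups-and-salary example in scikit-learn. The data generation, the coefficients, and the alpha value are all assumptions made up for illustration; the point is just that the irrelevant feature’s coefficient lands at exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: salary depends on experience and education, not pushups.
experience = rng.uniform(0, 20, n)
education = rng.uniform(10, 20, n)
pushups = rng.uniform(0, 50, n)  # irrelevant to salary
salary = 3.0 * experience + 2.0 * education + rng.normal(0, 1.0, n)

# Standardize the features so one penalty value treats them all fairly.
X = StandardScaler().fit_transform(np.column_stack([experience, education, pushups]))

model = Lasso(alpha=0.5).fit(X, salary)
print("coefficients [experience, education, pushups]:", np.round(model.coef_, 3))
```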

In a nutshell, what is the big difference between Lasso and Ridge Regression? (A tiny side-by-side sketch follows the list.)

  • Lasso Regression computes the penalty not on the squared value of each parameter but on its absolute value.
  • Essentially, your data can have 20 variables, but the ones that don’t impact your regression equation can be dropped from the model, because the absolute-value penalty can shrink their coefficients to exactly 0, which is not possible in Ridge Regression.
  • Lasso Regression is a good fit when you want to penalize your model so it does not overfit the training data and you also want to exclude some unnecessary variables. (If you only want to avoid overfitting and your model already has only the required variables, Ridge Regression will do the job.)
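Here is that side-by-side on synthetic data (made up for illustration) where only 5 of 20 features matter: Ridge shrinks everything but keeps every coefficient non-zero, while Lasso drops most of the useless ones to exactly zero.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# Made-up data: 20 features, only 5 of which actually influence y.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=42)

ridge = Ridge(alpha=2.0).fit(X, y)
lasso = Lasso(alpha=2.0).fit(X, y)

print("Ridge coefficients set exactly to zero:", int(np.sum(ridge.coef_ == 0)), "of 20")
print("Lasso coefficients set exactly to zero:", int(np.sum(lasso.coef_ == 0)), "of 20")
```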

Conclusion:

Learning how to perform different kinds of regression, depending on what suits your data best, will help you build better models for your work.

With considerations like too many variables, the kind of problem at hand, and the risks of underfitting and overfitting, it is important to know when you need to take your linear regression model up a notch and move to polynomial, multivariate, Ridge, Lasso, or other kinds of regression models.

Try using Lasso Regression in your use cases and see if your model improves its performance. (Best way to learn is to practice!)

Let me know in the comments below if you have any other pointers or charts that everyone should look into. Leave a clap and follow to stay in touch with any new articles and to support the blog!

For more such articles, stay tuned with us as we chart out paths on understanding data and coding and demystify other concepts related to Data Science. Please leave a review down in the comments.


Do connect with me on LinkedIn at — Yash Gupta — if you want to discuss it further! Leave a clap and comment below to support the blog! Follow for more.

P.S. Thanks a lot to Josh Starmer for helping us understand how Statistics works and in the simplest of ways!


Yash Gupta
Data Science Simplified

Lead Analyst at Lognormal Analytics and self-taught Data Scientist! Connect with me at - https://www.linkedin.com/in/yash-gupta-dss