Ridge and Lasso Regression

Intro

Chris Fiorentine
4 min read · Sep 24, 2020

Recently my class has been covering regression and classification. We can now use the data we have to make predictions, analyze it more deeply, and draw meaningful conclusions. We do this by building models that predict a target variable and reveal which features influence it most. There are many ways to build these models, depending on the data you are given and what you are trying to learn from it. For example, if you are trying to predict a continuous variable, you will want to use a linear regression model; if you are trying to predict a categorical variable, you will want something like a decision tree or a logistic regression model. There are many other models that I won't get into in this post. Here, I will be going over two key tools that help your model predict data it hasn't seen before. These are called Ridge and Lasso.

Regularization

Regularization is a technique that helps a model perform on data it has not seen before. This is especially important when the model is overfit to the data it was trained on. Regularization also helps manage the bias-variance tradeoff, which is essential to a good model. It can be particularly useful when you have fewer data points than you would like: regularizing the model helps it generalize rather than overfit to the data you have. The two most common ways to fix these problems and regularize your model are Ridge and Lasso.

An example of under-fitting and over-fitting a model
An example of getting the best Bias-Variance tradeoff

Ridge Regression

Now we will get into Ridge regression. In Ridge regression, we fit a new line to our data to help regularize an overfit model. This may cause the training error of your model to increase, but it will help the model perform on unseen data. By doing this, we introduce a little bias into our model for a better tradeoff: a small increase in bias can produce a big drop in variance in an overfit model, which helps us in the long run. Now for the math behind changing the model. Ridge regression minimizes the sum of the squared residuals + lambda * the slope². Lambda determines how large you want your penalty to be, so the higher the lambda, the more regularized the model will be. We can use cross-validation to find the best lambda.

Ridge Formula
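As a quick sketch of that cost, here is the ridge penalty computed by hand for a one-feature line. The data, candidate slope, and lambda below are made-up illustrative values, not from the post:

```python
import numpy as np

# Made-up toy data for a one-feature linear model (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.8])

slope, intercept = 1.0, 0.1  # a candidate fitted line
lam = 0.5                    # lambda: how large we want the penalty to be

residuals = y - (slope * x + intercept)
ssr = np.sum(residuals ** 2)         # sum of squared residuals
ridge_cost = ssr + lam * slope ** 2  # ridge adds lambda * slope^2
```

With a larger lambda, the slope term dominates the cost and the best-fitting line is pulled toward a flatter slope, which is exactly the regularizing effect described above.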

Fitting a ridge regression in its simplest form is shown below, where alpha is the lambda we can change.

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=1)
ridge.fit(X_train, y_train)

Ridge regression will also help you identify the best features of the model, because it shrinks the coefficients of features that do not have a large effect on the target variable, so they contribute less to the final model. As you can see, Ridge regression can be very helpful for regularizing an overfit model.
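Since cross-validation is how we find the best lambda, here is a minimal sketch using scikit-learn's RidgeCV. The synthetic data and the list of candidate alphas are my own assumptions, standing in for the post's X_train and y_train:

```python
from sklearn.linear_model import RidgeCV
from sklearn.datasets import make_regression

# Synthetic regression data standing in for X_train / y_train.
X, y = make_regression(n_samples=100, n_features=10, noise=10.0,
                       random_state=0)

# RidgeCV fits the model for each candidate alpha using cross-validation
# and keeps the one that scores best on held-out folds.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)

best_lambda = ridge_cv.alpha_  # the penalty strength that performed best
```

The chosen `alpha_` can then be used to fit a final `Ridge` model on the full training set.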

Lasso Regression

Similar to Ridge regression, Lasso regression also helps regularize a model and can be very helpful when predicting on unseen data. However, Lasso regression is slightly different from Ridge regression. Let's start with the formula. Instead of a penalty that squares the slope, Lasso takes the absolute value of the slope rather than squaring it: it minimizes the sum of the squared residuals + lambda * the absolute value of the slope. And where Ridge only shrinks the less important features, Lasso may reduce their coefficients all the way to zero, removing them from the final model entirely. This leaves a much simpler model than we started with, which can be very helpful in interpreting the model.
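To see Lasso zeroing out coefficients, here is a small sketch on synthetic data where only a few features truly matter. The dataset and alpha value are illustrative assumptions, not from the post:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

# Synthetic data: 10 features, but only 3 actually drive the target.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)

# Unlike Ridge, Lasso can set coefficients exactly to zero,
# effectively dropping those features from the model.
n_zero = int(np.sum(lasso.coef_ == 0))
```

Inspecting `lasso.coef_` shows which features survived, which is what makes the final model easier to interpret.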

Conclusion

Ridge and Lasso regression are very helpful when trying to regularize a model. The differences between them are important to note. Ridge regression is the better choice when there are a lot of features that are important to the model, as it will penalize them without dropping the less important ones. Lasso is better when you only want to keep the most important features in your model. Both techniques will greatly help your understanding of the model and will help it perform better on unseen data. Thanks for reading!


Chris Fiorentine

Graduate of Flatiron School Data Science Immersive Bootcamp