Regularization for Machine Learning in Python

Conor Moloney
3 min read · Jul 23, 2020


The aim of this post is to provide a beginner friendly introduction to Regularization in Python and to show straight-forward examples of how the technique can be implemented. The code for this project is available here: https://github.com/ConorMoloney96/BostonHousingPricesPrediction

Regularization is an important technique for any machine learning or data science professional to understand. To appreciate why it's so important we first have to understand a central concept in ML: the bias-variance tradeoff (visualized below). High bias (also called underfitting) means that the model has not been able to identify the signal in the data needed to make accurate predictions. The most popular solution to this is to train the model more. However, lowering the bias by training the model more extensively on the training data can lead to an increase in variance (also called overfitting). This means that the model will perform well on the training data but fail to generalize to previously unseen test data, which makes it essentially useless for practical applications. As a general rule of thumb, more complex models are more prone to overfitting while less complex models are more prone to underfitting.

Overfitting vs. Underfitting
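To make the tradeoff concrete, here's a minimal sketch of my own (synthetic data, not code from the linked repo) that fits polynomial regressions of increasing degree: the low-degree model underfits, the high-degree model overfits, and the gap between train and test scores shows which is which.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic 1-D regression problem: y = sin(x) plus noise
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 60)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# degree=1 tends to underfit (high bias); degree=15 tends to overfit (high variance)
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(f"degree={degree:2d}  train R^2={model.score(X_train, y_train):.2f}  "
          f"test R^2={model.score(X_test, y_test):.2f}")
```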

Regularization is a valuable technique for preventing overfitting. It works by penalizing overly complex models during training, encouraging the learning algorithm to produce a simpler model. Three common types of regularization are Ridge (L2), Lasso (L1) and Elastic Net. Ridge regression shrinks/constrains the coefficients of a model towards zero, discouraging the algorithm from producing more complex models.

Ridge Regression Formula
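In the notation used by ISL, ridge regression minimizes the residual sum of squares plus an L2 penalty on the coefficients, with λ ≥ 0 controlling the penalty strength:

```latex
\min_{\beta} \; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 \;+\; \lambda \sum_{j=1}^{p} \beta_j^2
```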

Lasso Regression is similar to Ridge, but where Ridge shrinks coefficients towards zero without ever setting them exactly to zero, Lasso can shrink some coefficients all the way to zero. This is useful for preventing overfitting for the same reasons Ridge Regression is, but has the added benefit of essentially removing unimportant features from consideration (a form of feature selection), which makes the model far easier to interpret and explain to stakeholders (an important point in almost any data science or analytics project).

Lasso Regression Formula
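Lasso replaces the squared (L2) penalty with an absolute-value (L1) penalty, which is what allows coefficients to reach exactly zero:

```latex
\min_{\beta} \; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 \;+\; \lambda \sum_{j=1}^{p} |\beta_j|
```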

Elastic Net Regression is a combination of Lasso and Ridge Regression, with a parameter r used to control the mix ratio between the two penalties. All three methods can be easily implemented in Python using the sklearn.linear_model module. The code extract below shows how this can be done:

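A minimal sketch of fitting all three estimators (note: the linked repo used the Boston housing dataset, which recent scikit-learn releases no longer include, so California housing stands in here; alpha is the penalty strength λ and l1_ratio corresponds to the mix parameter r):

```python
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split

# Any regression dataset works here; California housing stands in for Boston prices
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Ridge": Ridge(alpha=1.0),                         # L2 penalty
    "Lasso": Lasso(alpha=0.1),                         # L1 penalty
    "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5)  # mix of L1 and L2 (r = 0.5)
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name:10s} test R^2 = {model.score(X_test, y_test):.3f}")
```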

Much of my knowledge about regularization came from the book “An Introduction to Statistical Learning: With Applications in R” by James et al. (although I obviously prefer Python to R). I’d recommend taking a look if you’d like to cover the topic in more depth.


Conor Moloney

MSc Business Analytics student at UCD with an interest in Machine Learning, Big Data systems and Graph theory.