Combating overfitting with L1 and L2 regularization

Santhosh Kannan · Published in featurepreneur · May 8, 2023

What is overfitting?

Overfitting is a very common problem in machine learning where a model performs impressively on the training data but poorly on new, unseen data. This happens because the model has learned patterns specific to the training data that do not generalize to the wider population.
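A quick way to see overfitting in practice is to fit an overly flexible model on a small, noisy dataset and compare training and test error. The sketch below uses an arbitrary synthetic dataset and a deliberately high polynomial degree, chosen purely for illustration:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# Small noisy dataset: y = sin(x) + noise
rng = np.random.RandomState(0)
X = rng.uniform(0, 6, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# A degree-15 polynomial has enough flexibility to memorize the training points
model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
model.fit(X_train, y_train)

print("Train MSE:", mean_squared_error(y_train, model.predict(X_train)))  # very low
print("Test MSE: ", mean_squared_error(y_test, model.predict(X_test)))    # typically much higher

The gap between the two numbers is the signature of overfitting: the model explains the training points almost perfectly but has not learned the underlying curve.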

How to deal with overfitting?

  • Use simpler models
  • Collect more data
  • Use cross-validation
  • Early stopping
  • Regularization

What is Regularization?

Regularization is a technique in which the model's parameters are constrained, or regularized, to reduce the risk of overfitting. It works by adding a penalty term to the cost function the model is trying to minimize. This penalty pushes the model towards capturing the overall structure of the data rather than fitting the training data as closely as possible. The two most common types of regularization are L1 (Lasso) and L2 (Ridge).
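In code terms, the idea is simply to add a weighted penalty to whatever loss the model already minimizes. A minimal sketch (the weights, base loss, and λ below are made-up numbers used only for illustration):

import numpy as np

def l1_penalty(weights):
    return np.sum(np.abs(weights))   # Lasso penalty: sum of absolute weights

def l2_penalty(weights):
    return np.sum(weights ** 2)      # Ridge penalty: sum of squared weights

def regularized_loss(base_loss, weights, lam, penalty):
    # The model now minimizes the data loss plus lam times the penalty,
    # so large weights are only kept if they reduce the data loss enough.
    return base_loss + lam * penalty(weights)

weights = np.array([3.0, -0.5, 0.0, 2.0])
print(regularized_loss(10.0, weights, lam=0.1, penalty=l1_penalty))  # 10.0 + 0.1 * 5.5
print(regularized_loss(10.0, weights, lam=0.1, penalty=l2_penalty))  # 10.0 + 0.1 * 13.25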

What is L1 Regularization?

L1 Regularization, also known as Lasso regularization, adds a penalty to the cost function that is proportional to the sum of the absolute values of the model weights.

Loss function with L1 regularization: Loss_L1 = Loss + λ · Σᵢ |wᵢ|, where λ controls the strength of the penalty.

This results in a sparse model in which many weights are exactly zero, which is helpful when the number of features is large and only a small subset of them is actually important. L1 regularization is also relatively robust to noise in the data, as it is less sensitive to small, outlying values.

L1 regularization is commonly used with linear models such as linear regression. It is sensitive to the scaling of the data, so it is a good idea to standardize the features before applying it.
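As a minimal sketch of that advice (the diabetes dataset and the alpha value below are arbitrary illustrative choices), a Pipeline can standardize the features and fit a Lasso in one step, and the fitted coefficients show the sparsity described above:

from sklearn.datasets import load_diabetes
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)

# Standardize the features, then fit Lasso; alpha controls the strength of the L1 penalty
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
lasso.fit(X, y)

# With a strong enough penalty, some coefficients are driven exactly to zero,
# i.e. the model effectively selects a subset of the features
print(lasso.named_steps["lasso"].coef_)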

What is L2 Regularization?

L2 Regularization, also known as Ridge regularization, adds a penalty to the cost function that is proportional to the sum of the squares of the model weights.

Loss function with L2 regularization: Loss_L2 = Loss + λ · Σᵢ wᵢ²

This results in a model with small, non-zero weights. L2 regularization is sensitive to outliers, so outliers should be handled before applying it.

L2 regularization is commonly used with linear models such as linear regression. Like L1 regularization, it is sensitive to the scaling of the data.
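As a rough sketch of how the penalty strength affects Ridge (the alpha values and the diabetes dataset below are arbitrary illustrative choices), increasing alpha shrinks the coefficients towards zero without making any of them exactly zero:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)

for alpha in [0.1, 10, 1000]:
    ridge = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    ridge.fit(X, y)
    coef = ridge.named_steps["ridge"].coef_
    # Larger alpha -> smaller coefficient magnitudes, but none become exactly zero
    print(alpha, np.round(coef, 2))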

Implementation of Regularization using Scikit-learn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston
from sklearn.metrics import mean_squared_error

# Note: load_boston was removed in scikit-learn 1.2, so this example needs scikit-learn < 1.2
bos = load_boston()
df = pd.DataFrame(bos.data, columns=bos.feature_names)
df['PRICE'] = bos.target

# Hold out 20% of the data for testing
x_train, x_test, y_train, y_test = train_test_split(df.drop("PRICE", axis=1), df["PRICE"], test_size=0.2)

# Plain linear regression (no regularization)
lreg = LinearRegression()
lreg.fit(x_train, y_train)
lreg_y_pred = lreg.predict(x_test)

# Ridge regression (L2 penalty)
lrid = Ridge()
lrid.fit(x_train, y_train)
lrid_y_pred = lrid.predict(x_test)

# Lasso regression (L1 penalty)
lasso = Lasso()
lasso.fit(x_train, y_train)
lasso_y_pred = lasso.predict(x_test)

# Plot the coefficients of the three models side by side for each feature
plt.style.use('ggplot')
fig, ax = plt.subplots(figsize=(20, 7))
x = np.arange(13)  # the Boston dataset has 13 features
ax.bar(x - 0.2, lreg.coef_, width=0.2, color='b', align='center', label="Linear Regression")
ax.bar(x, lrid.coef_, width=0.2, color='g', align='center', label="Ridge Regression")
ax.bar(x + 0.2, lasso.coef_, width=0.2, color='r', align='center', label="Lasso Regression")
ax.set_xticks(x)
ax.set_xticklabels(x_train.columns)
ax.spines['bottom'].set_position('zero')
plt.legend()
plt.show()
Coefficients for linear, ridge, and lasso regressions
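The script above imports mean_squared_error but never uses it; continuing from the fitted models, a natural follow-up is to compare their test-set error:

# Compare test-set error of the three fitted models
print("Linear regression MSE:", mean_squared_error(y_test, lreg_y_pred))
print("Ridge regression MSE: ", mean_squared_error(y_test, lrid_y_pred))
print("Lasso regression MSE: ", mean_squared_error(y_test, lasso_y_pred))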

What are the disadvantages of Ridge Regularization?

  • It may not perform well when there are a lot of features.
  • Doesn’t perform feature selection — it keeps all the features in the model and only reduces the magnitude of their coefficients.

What are the disadvantages of Lasso Regularization?

  • Eliminates some of the features by setting their coefficients to zero, potentially excluding important features.
  • It may not perform well when there is multicollinearity between features: from a group of highly correlated features, Lasso tends to keep one more or less arbitrarily and zero out the rest (see the sketch below).
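A minimal sketch of that behaviour on synthetic data; the coefficient sizes, noise levels, and alpha below are arbitrary choices for illustration:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)    # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=200)  # target depends on the shared signal

lasso = Lasso(alpha=0.1).fit(X, y)
# Typically one of the two correlated coefficients ends up (near) zero while the other
# carries almost all of the weight -- the choice between them is essentially arbitrary
print(lasso.coef_)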
