What are L1 and L2 Regularization in Machine Learning and How to Apply Them in Python?

Ronak vala
4 min read · Aug 2, 2023


What is Regularization in Machine Learning?

Regularization is a technique used in machine learning to prevent overfitting and improve the generalization ability of a model. It is particularly common in Linear Regression and other Linear models. Overfitting occurs when a model performs extremely well on training data but fails to generalize to new or unseen data.

What is L2 Regularization?

L2 Regularization is also known as Ridge Regularization. It works by adding a penalty term to the loss function during the training process. The penalty is based on the squared magnitude of the model’s coefficients; the larger the coefficients, the larger the penalty, which encourages the model to prefer smaller coefficient values.

$$\text{Loss} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \theta_j^2$$

L2 Regularization mathematical equation

Here, λ is the regularization parameter, a hyperparameter that controls the strength of the regularization: a higher value of λ results in stronger regularization. The θ_j are the model’s coefficients.

By penalizing large values, L2 regularization discourages the model from relying too heavily on individual features and encourages it to use all features more evenly.
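To make the equation concrete, here is a minimal NumPy sketch that computes the ridge loss by hand; all of the numbers (targets, predictions, coefficients, and λ) are invented purely for illustration:

```python
import numpy as np

# Toy values, invented for illustration only
y_true = np.array([3.0, 5.0, 7.0])   # targets
y_pred = np.array([2.5, 5.5, 6.0])   # model predictions
theta = np.array([0.8, -1.2, 0.3])   # model coefficients
lam = 0.5                            # regularization strength (lambda)

sse = np.sum((y_true - y_pred) ** 2)    # sum of squared errors = 1.5
l2_penalty = lam * np.sum(theta ** 2)   # lambda * sum of squared coefficients = 1.085
ridge_loss = sse + l2_penalty
print(ridge_loss)                       # 2.585
```

Larger coefficients inflate the penalty term, so minimizing this loss pulls the coefficients toward smaller values.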

What is L1 Regularization?

L1 Regularization, also known as Lasso Regularization, is another technique in machine learning to prevent overfitting and improve the generalization ability of a model, like L2 Regularization.

L1 Regularization also adds a penalty term to the loss function (SSE) during training. However, the penalty in L1 Regularization is based on the absolute magnitudes of the model’s weights rather than their squared magnitudes.

$$\text{Loss} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \left| \theta_j \right|$$

L1 Regularization mathematical equation

The main difference between L1 and L2 regularization lies in the effect they have on the model’s coefficients.

  1. Sparsity: L1 Regularization tends to push some of the model’s coefficients to exactly zero, effectively creating a sparse model. In other words, it selects and keeps only the most important features, discarding less relevant or redundant ones. This is beneficial for feature selection, as it simplifies the model and can improve interpretability (see the sketch after this list).
  2. Shrinkage without sparsity: L2 Regularization encourages small but non-zero values for all model coefficients. It doesn’t force any of them to be exactly zero, so it keeps all features in the model.
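A small scikit-learn sketch makes this difference visible. The data here is synthetic (generated with make_regression, with only 3 informative features out of 10), so the exact counts may vary, but Lasso typically zeroes out several coefficients while Ridge keeps them all non-zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression data: only 3 of the 10 features carry signal
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10, random_state=42)

lasso = Lasso(alpha=10).fit(X, y)
ridge = Ridge(alpha=10).fit(X, y)

print("Lasso coefficients set to zero:", np.sum(lasso.coef_ == 0))  # usually several
print("Ridge coefficients set to zero:", np.sum(ridge.coef_ == 0))  # usually 0
```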

The combination of both L1 Regularization and L2 Regularization is known as Elastic Net Regularization.
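In scikit-learn this is available as the ElasticNet estimator, where l1_ratio blends the two penalties (1.0 is pure L1, 0.0 is pure L2). A minimal sketch, reusing the synthetic X and y from the previous example:

```python
from sklearn.linear_model import ElasticNet

# l1_ratio=0.5 gives an equal mix of L1 and L2 penalties
enet = ElasticNet(alpha=1.0, l1_ratio=0.5)
enet.fit(X, y)
print(enet.coef_)
```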

Implementation Of L1 and L2 Regularization

Here we are predicting the price of a house in the city of Melbourne using a Melbourne house price dataset. The dataset has 21 features, so we train a model using the machine learning algorithm called Linear Regression. But the model gives good accuracy on the training data and performs very poorly on the test data, which indicates that the linear regression model is overfitting.

Reading the data using pandas
Training the model with Linear Regression

Here we split the data into train and test sets: 70% of the data goes into the training set and 30% into the test set. We then apply a Linear Regression model and can see that on the test set, which is unseen data, it gives a 13% score, which is very poor, while on the training set it gives a 68% score. This gap between training and test performance indicates that the model is overfitting.
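The original code was shown as screenshots, so here is a minimal sketch of the pipeline described above. The file name, the target column "Price", and the preprocessing steps (dropping rows with missing targets, one-hot encoding, filling remaining gaps with zero) are assumptions, not the article’s exact code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Assumed file name for the Melbourne housing dataset
df = pd.read_csv("Melbourne_housing_FULL.csv")

# Assumed minimal preprocessing: keep rows with a target, encode categoricals
df = df.dropna(subset=["Price"])
df = pd.get_dummies(df).fillna(0)

X = df.drop("Price", axis=1)
y = df["Price"]

# 70% train / 30% test split, as described in the article
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=2)

lr = LinearRegression().fit(X_train, y_train)
print("Train score:", lr.score(X_train, y_train))  # ~0.68 in the article
print("Test score:", lr.score(X_test, y_test))     # ~0.13 in the article
```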

Now we implement L1 and L2 Regularization to prevent this overfitting.

Implementation of L1 Regularization

The Lasso or L1 regression is specified with a regularization parameter (alpha) of 50, a maximum number of iterations for optimization (max_iter) set to 100, and a tolerance level (tol) of 0.1. The model is then trained on the training data (X_train) and the corresponding target values (y_train). The training score obtained is approximately 0.67. After that, the model is evaluated on the test data (X_test, y_test), yielding a test score of around 0.664.
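Since the original snippet was an image, here is a sketch using the hyperparameters stated above; it assumes the X_train/X_test split from the earlier sketch:

```python
from sklearn.linear_model import Lasso

# L1 (Lasso) regression with the hyperparameters from the article
lasso = Lasso(alpha=50, max_iter=100, tol=0.1)
lasso.fit(X_train, y_train)
print("Train score:", lasso.score(X_train, y_train))  # ~0.67 per the article
print("Test score:", lasso.score(X_test, y_test))     # ~0.664 per the article
```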

Implementation of L2 Regularization


A Ridge regression model is created using scikit-learn’s Ridge module. The Ridge regression is specified with a regularization parameter (alpha) of 50, a maximum number of iterations for optimization (max_iter) set to 100, and a tolerance level (tol) of 0.1. The model is then trained on the training data (X_train) and the corresponding target values (y_train). The obtained test score is approximately 0.667, indicating a decent fit to the unseen test data, while the training score is approximately 0.662.
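Again as a sketch of the screenshot, with the same assumed split:

```python
from sklearn.linear_model import Ridge

# L2 (Ridge) regression with the hyperparameters from the article
ridge = Ridge(alpha=50, max_iter=100, tol=0.1)
ridge.fit(X_train, y_train)
print("Train score:", ridge.score(X_train, y_train))  # ~0.662 per the article
print("Test score:", ridge.score(X_test, y_test))     # ~0.667 per the article
```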

You can find a detailed analysis and implementation of the models by exploring the Jupyter Notebook and dataset available on GitHub [L1 and L2 Regularization].

