L1 and L2 Regularization Methods, Explained From Scratch

L1 and L2 regularization are two of the most common ways to manage overfitting and perform feature selection when you have a large set of features.

Shubham Koli
4 min read · Jan 13, 2023

In machine learning, regularization is a technique used to prevent overfitting by adding a penalty term to the loss function. There are different types of regularization, but the two most common ones are L1 and L2 regularization.

L1 regularization (Lasso regression):

L1 regularization, also known as Lasso regularization, adds a penalty term to the loss function that is proportional to the absolute value of the weights. This results in some weights being shrunk to zero, effectively performing feature selection. L1 regularization is particularly useful when the number of features is much larger than the number of samples, as it helps to reduce the dimensionality of the problem.
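As a minimal sketch of what that penalty looks like (assuming a plain linear model; `alpha` below is a hypothetical penalty strength, not a value from this article):

```python
import numpy as np

def lasso_loss(X, y, w, alpha):
    """Mean squared error plus the L1 penalty: alpha * sum(|w_i|)."""
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)
    return mse + alpha * np.sum(np.abs(w))
```

The larger `alpha` is, the stronger the pull of each weight toward zero.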

L1 regularization produces sparse solutions. A sparse matrix (or weight vector) is one in which most elements are zero and only a few are non-zero (the sparsity refers to the zeros). In this context, a sparse solution can also mean several close-to-zero values alongside a few larger ones. From the data science point of view this is interesting because it lets us reduce the number of features.

If we train a model and find that many weights end up at (or very close to) zero, it means we don't need the corresponding neurons or inputs: the model has effectively deactivated them, so we can drop those features and end up with a simpler model. For instance, if we have 50 coefficients but only 10 are non-zero, the other 40 are irrelevant to our predictions. This is interesting not only from the efficiency point of view but also from the economic one: gathering data and extracting features can be very expensive in terms of time and money, so reducing the feature set benefits us.
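To make the 50-coefficient example concrete, here is a minimal sketch with scikit-learn's Lasso on synthetic data (the dataset shape and `alpha` value are illustrative, not from this article):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data: 50 features, only 10 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

model = Lasso(alpha=1.0).fit(X, y)
print(np.sum(model.coef_ != 0), "of", model.coef_.size, "coefficients are non-zero")
```

With a sufficiently strong penalty, most of the 50 coefficients end up exactly at zero.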

Because of the absolute value, the L1 penalty introduces a non-differentiable term (the kink at zero is precisely what allows weights to be pushed exactly to zero). L1 regularization is also robust to outliers.

L2 regularization (Ridge regularization):

L2 regularization, also known as Ridge regularization, adds a penalty term to the loss function that is proportional to the square of the weights. This results in all weights being shrunk, but none of them being set to zero. L2 regularization is particularly useful when the number of samples is much larger than the number of features, as it helps to prevent overfitting.
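Analogous to the L1 sketch above (same assumptions, hypothetical `alpha`), the L2-penalized loss adds the sum of squared weights instead of absolute values:

```python
import numpy as np

def ridge_loss(X, y, w, alpha):
    """Mean squared error plus the L2 penalty: alpha * sum(w_i ** 2)."""
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)
    return mse + alpha * np.sum(w ** 2)
```

Because the squared penalty is differentiable everywhere, it shrinks weights smoothly rather than snapping them to zero.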

The differences between L1 and L2 regularization:

In general, L1 regularization is more useful when we want a sparse model, that is, when we want many of the coefficients to be exactly zero. L2 regularization is more useful when we want a model in which all the coefficients are small but not necessarily zero.
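A quick sketch that contrasts the two on the same synthetic data (all values illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

# Fit both models with the same penalty strength and compare sparsity.
for model in (Lasso(alpha=1.0), Ridge(alpha=1.0)):
    coef = model.fit(X, y).coef_
    print(f"{type(model).__name__}: {np.sum(coef == 0)} exact zeros, "
          f"max |w| = {np.max(np.abs(coef)):.2f}")
```

Lasso typically reports many exact zeros here, while Ridge reports none.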

Elastic net regularization:

It's worth noting that L1 and L2 regularization can be combined by adding both penalty terms to the loss function; this is called elastic net regularization.
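A minimal sketch with scikit-learn's ElasticNet, where `l1_ratio` controls the mix between the two penalties (the values below are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=50, n_informative=10,
                       noise=5.0, random_state=0)

# l1_ratio mixes the penalties: 1.0 is pure Lasso, 0.0 is pure Ridge.
model = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print(np.sum(model.coef_ != 0), "non-zero coefficients")
```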

Which technique is commonly preferred to boost the model’s accuracy rate and why?

One cannot really conclude which technique provides a better accuracy rate, as it depends on several factors. Although L2 is mostly used to prevent overfitting, it is less useful for high-dimensional data because it keeps every feature in the model; it is preferred when many features are highly correlated with the target. For problems where the features number in the millions, L1 regularization is the desired technique, as it provides sparse solutions, and sparsity is a great property to have when dealing with an extremely large number of features. Ultimately, it depends on the real-world problem at hand and our model's objective.

Conclusion:

In my experiments comparing L1 and L2 performance, L2 scores were overall better than L1, although L1 has the interesting property of generating sparse solutions.

Regularization is a powerful technique that can be used to improve the performance of machine learning models. It is important to understand the difference between L1 and L2 regularization and when to use them.

End Notes:

If you liked this post, share it with your interest group, friends, and colleagues. Comment your thoughts, opinions, and feedback below; I would love to hear from you. Do follow me for more such articles, and motivate me 😀.

It doesn’t cost you anything to clap. 👏

