LASSO Regression In Detail (L1 Regularization)

Aarthi Kasirajan
3 min read · Jun 18, 2020

--

LASSO stands for Least Absolute Shrinkage and Selection Operator. It is a regularization method for building models in the presence of a large number of features, where "large" implies:

1. Large enough to cause computational challenges.

2. Large enough to enhance the tendency to overfit.

Lasso and Ridge are very similar to one another, but at the same time they have a key difference.

To have a detailed view on Ridge Regression, see here — https://medium.com/@minions.k/ridge-regression-l1-regularization-method-31b6bc03cbf

In Lasso regression, the cost function is the Sum of Squared Residuals plus an L1 penalty on the coefficients:

=> ∑ [y — (b + a1*x1 + a2*x2 + a3*x3 + …)]^2 + lambda * ∑ |ai|

Or in other words, for a single feature y = ax + b, the penalty added is lambda * |slope| (Ridge, by contrast, adds lambda * (slope)^2).

When the lambda value equals zero, the cost function reduces to the Ordinary Least Squares objective. As we increase the value of lambda, the parameters shrink substantially, and some are reduced exactly to zero.
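This behavior is easy to see with scikit-learn. The sketch below (on made-up toy data, so the exact numbers are illustrative) fits ordinary least squares and then Lasso at increasing alpha values (scikit-learn's name for lambda); the weak coefficient shrinks and eventually hits exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Toy data (assumed for illustration): y depends strongly on x1, weakly on x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# lambda = 0 corresponds to plain Ordinary Least Squares.
ols = LinearRegression().fit(X, y)
print("OLS coefficients:", ols.coef_)

# Increasing alpha (lambda) shrinks the coefficients; the weak one reaches zero.
for alpha in [0.01, 0.1, 1.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    print(f"Lasso (alpha={alpha}):", lasso.coef_)
```

At a large enough alpha, the coefficient on the weak feature is exactly 0.0, not merely small, which is the "selection" part of LASSO.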

Ridge VS LASSO Regression Visualized:

In Ridge Regression — —

Plot the cost of the entire equation against different slope values, drawing one curve per lambda value. Initially, when lambda is zero, the curve reaches its minimum at some slope, say 0.45.

When we increase the lambda value further, the minimum of the curve moves toward zero, giving a smaller optimal slope. However, even for very large lambda values, the optimal slope remains greater than zero.

In Lasso Regression — —

Plotting similarly for the Lasso penalty term, we take the absolute value of the slope. Initially, when lambda = 0, we again get the ordinary least squares curve.

As the value of lambda increases, the optimal slope moves toward zero very quickly, and the curve has a sharp kink at slope = 0 where the minimum can land. This feature allows the model to diminish certain unwanted features exactly to zero, thus helping reduce the complexity of the model.

The figure below explains in detail what happens in both regularization methods.

In Lasso Regression (L1), slope values can coincide exactly with x = 0, whereas in Ridge Regression (L2) the slope only approaches close to zero.
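The contrast can be checked directly by counting exact zeros in the fitted coefficients. This sketch uses synthetic data (my own illustrative setup, with 10 features of which only 2 matter); the alpha values are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (illustrative): only features 0 and 1 carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Ridge shrinks all coefficients but leaves them nonzero;
# Lasso sets the irrelevant ones exactly to zero.
print("Ridge exact zeros:", np.sum(ridge.coef_ == 0))
print("Lasso exact zeros:", np.sum(lasso.coef_ == 0))
```

Ridge typically reports 0 exact zeros, while Lasso zeroes out most of the 8 noise features, matching the L1-vs-L2 picture above.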

Which one to use, and when?

To answer this, we need to know the size of the data that we want to process and predict on. If the data is small, i.e. has fewer features, we can go with Ridge Regression.

But when the data contains many features, say a few million, the complexity of the model increases; in that case, using Lasso Regression removes the undesired features and reduces the complexity of the model for better performance.
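As a small-scale sketch of that high-dimensional use case (the sizes and alpha here are assumptions for illustration, far from millions of features), Lasso can recover a handful of informative features out of hundreds:

```python
import numpy as np
from sklearn.linear_model import Lasso

# High-dimensional sketch (assumed setup): 500 features, only 5 informative.
rng = np.random.default_rng(7)
n, p = 300, 500
X = rng.normal(size=(n, p))
true_coef = np.zeros(p)
true_coef[:5] = [5.0, -4.0, 3.0, -2.0, 2.0]
y = X @ true_coef + rng.normal(scale=0.5, size=n)

# Lasso keeps only features with nonzero coefficients, acting as a selector.
lasso = Lasso(alpha=0.2).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print("Number of features kept:", len(selected))
print("Features kept include:", selected[:10])
```

The fitted model keeps the 5 truly informative features (and discards the vast majority of the noise features), which is exactly the complexity reduction described above. In practice, alpha would be chosen by cross-validation (e.g. `LassoCV`) rather than fixed by hand.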
