Regularization path using Lasso regression
What is a regularization path and why you should make one!
First of all … What is a Linear Regression?
A Linear Regression is a linear method to represent the relationship between a target and explanatory variables. A prediction can be calculated like this:
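Written out (ŷ is the prediction, x₁,…,xₙ the feature values, θ₀,…,θₙ the model parameters):

$$\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$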
This is the weighted sum of the variables plus the intercept. Okay, but how do we obtain the θ values? To find the best thetas we need a cost function to minimize. Hello MSE (Mean Squared Error)!
The MSE is the mean of the squared errors; it is the most commonly used cost function for linear regression.
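For m training samples, with the usual convention x₀ = 1 so that θᵀx includes the intercept:

$$\mathrm{MSE}(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\theta^{T}x^{(i)} - y^{(i)}\right)^{2}$$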
From Linear Regression to Lasso Regression
A Lasso regression (Least Absolute Shrinkage and Selection Operator) is a regularized linear regression: a regularization term is added to the cost function (MSE) of the linear regression, which becomes this:
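With α controlling the regularization strength (by convention the intercept θ₀ is not penalized):

$$J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} |\theta_i|$$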
As for the linear regression, θ = (θ₁,…,θₙ) is the weight vector of the features and the MSE (Mean Squared Error) is still the mean of the squared errors. But what is this last term? It's the regularization term!
By adding this regularization term, corresponding to the L1 norm of the weights, Lasso forces the weights of the less important features to zero. In other words, the Lasso Regression performs feature selection: the higher the value of alpha, the fewer features are selected.
Let's create our own regularization path!
Now that we know what a Lasso Regression is, we can create our regularization path. A regularization path is a plot of all the coefficient values against the values of alpha. It's the best way to see the behaviour of the Lasso Regression.
To begin, we need to import some libraries as well as the Boston house-prices dataset from the sklearn library.
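Something like this (a minimal sketch; note that load_boston was removed in scikit-learn 1.2, so on a recent install you would substitute another regression dataset, e.g. fetch_california_housing):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Boston house-prices: 506 samples, 13 numerical features.
boston = load_boston()
X, y = boston.data, boston.target
feature_names = boston.feature_names
```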
Now that we have imported our libraries and data, we can create our regularization path.
Then we standardise our data, fit a Lasso for each value of alpha, and plot our regularization path!
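A minimal sketch of that step (the alpha grid np.logspace(-3, 1, 100) and the plot styling are my own choices, not necessarily the original ones):

```python
# Standardise the features so the L1 penalty treats them on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Fit one Lasso per alpha; keep the coefficients and the R² score.
alphas = np.logspace(-3, 1, 100)
coefs, scores = [], []
for alpha in alphas:
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(X_scaled, y)
    coefs.append(lasso.coef_)
    scores.append(lasso.score(X_scaled, y))

# The regularization path: one curve per feature, alpha on a log scale.
plt.figure(figsize=(10, 6))
plt.plot(alphas, np.array(coefs))
plt.xscale("log")
plt.xlabel("alpha")
plt.ylabel("coefficient value")
plt.title("Lasso regularization path (Boston house-prices)")
plt.legend(feature_names, fontsize="small", loc="upper right")
plt.show()
```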
Amazing! Depending on the value of alpha, Lasso has selected only some of the features!
As we can see, when the alpha value is too high the Lasso Regression cannot assign any weight to the features, which is why we get such a low score. Conversely, going too low with alpha is useless: the cost function has already converged to its lower limit.
Tradeoff between performance and interpretability
If we take a look at our regularization path, we can see that when alpha is around 0.1, Lasso has selected 4 additional features for a slight improvement of the score.
Are those features a real improvement? Not really: they can be considered as noise, and they will drastically decrease interpretability for a slight increase in the score.
Those features can be the result of overfitting; they cancel each other out to give a slightly better score. We definitely don't want that.
Limits
Lasso regression can behave erratically when the number of features is larger than the number of samples, or when features are strongly correlated! Do not skip feature engineering!
Conclusion
A regularization path is an amazing tool to see the behaviour of our Lasso regression: it gives us an idea of the feature importance and of the score we can expect! But everything comes at a cost: fitting a lot of regressions can be computationally expensive.
Bonus
You can do exactly the same for a Ridge or Elastic Net regression ;)
If you don't want to make your own amazing and interactive regularization path as I did ;) you can use the sklearn.linear_model.lasso_path function from the sklearn library.
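For example, reusing X_scaled and y from above:

```python
from sklearn.linear_model import lasso_path

# Computes the whole path in one call via coordinate descent.
alphas, coefs, _ = lasso_path(X_scaled, y, n_alphas=100)

# coefs has shape (n_features, n_alphas): one row per feature.
plt.plot(alphas, coefs.T)
plt.xscale("log")
plt.show()
```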
Thanks for reading!
If you found it interesting, feel free to give it a like or reach me on LinkedIn!