L1 Regularization | one minute summary
Have you wrestled with the concept of LASSO Regression?
2 min read · Jul 30, 2021
L1 Regularization (also called LASSO regression) is used less often than L2 Regularization, but it has key advantages in certain situations. The 80–20 Rule (a.k.a. the Pareto Principle), "80% of the consequences come from 20% of the causes," comes to mind: often a small subset of features drives most of a model's predictive power.
Prerequisite Info: Regularization, L2 Regularization
- Why? For any given model, some weights matter more than others, but random noise during training gives some of the less important weights nonzero influence. One way to prevent a model from overfitting to noise (and to make it easier to see which features actually matter) is to remove those less important weights.
- What? L1 Regularization is a technique that reduces model complexity by zeroing out less important weights (i.e. it encourages sparsity), which also makes the model more interpretable.
- How? L1 Regularization adds the sum of the absolute values of the weights to the loss function as a penalty term (multiplied by a lambda hyperparameter). During gradient descent, this penalty pushes every weight toward zero by a constant amount at each step, so only the weights that are genuinely important (i.e. that training examples repeatedly push away from zero, not just random noise) survive, and the rest go to 0.
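The sparsity described above is easy to see in practice. Here is a minimal sketch using scikit-learn's `Lasso` (which implements L1-regularized linear regression) on synthetic data; the dataset shape, `alpha` value, and random seed are illustrative assumptions, not values from the article:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: 10 features, but only 3 are actually informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# Ordinary least squares: no penalty, so every weight stays nonzero.
ols = LinearRegression().fit(X, y)

# LASSO: alpha plays the role of the lambda hyperparameter above.
lasso = Lasso(alpha=1.0).fit(X, y)

print("OLS nonzero weights:  ", int(np.sum(ols.coef_ != 0)))
print("LASSO nonzero weights:", int(np.sum(lasso.coef_ != 0)))
```

Because LASSO's coordinate-descent solver applies soft-thresholding, the uninformative weights become exactly 0 (not merely small), which is what makes it useful for feature selection.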