Regularization Technique

Vignesh CS · Published in The Startup · Aug 11, 2020

What is Regularization Technique?
It’s a technique used mainly to overcome over-fitting during model fitting. This is done by adding a penalty that grows as the model’s complexity increases. The regularization parameter λ penalizes all the regression coefficients except the intercept, so that the model generalizes from the data instead of over-fitting it (i.e. it helps keep the parameters regular, or normal). This makes the fit generalize better to unseen data.

Over-fitting means that while training on the training data, the model memorizes every observation and becomes too complex. As a result, when the same model is validated on the testing data, the fit becomes much worse.

What does the Regularization Technique do?
The basic idea is that we don’t want huge weights on the regression coefficients. The simple regression equation is y = β0 + β1x, where y is the response variable (also called the dependent or target variable), x is the feature variable (independent variable), and the β’s are the regression coefficients (the unknown parameters).
When the weights are large, a small change in a feature produces a large change in the target variable. Regularization therefore ensures that not too much weight is given to any single feature, and that the least significant features can be given zero weight.

Working of Regularization
Regularization adds a penalty on the higher-order terms, which decreases the importance given to those terms and pushes the model towards being less complex.
Regularization equation:

min over β of: Σ_i ( y_i − Σ_j β_j·x_ij )² + λ · Σ_j |β_j|^p

where p = 1, 2, …, i = 1, …, n indexes the observations, and j indexes the features. The most popular choices of p are 1 or 2. Feature selection is thus carried out through regularization.
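As a rough illustration, the penalized objective above can be written as a small NumPy function. This is only a sketch: the function name, the data shapes, and the assumption that the intercept is handled separately (and therefore not penalized) are mine, not from the original article.

```python
import numpy as np

def penalized_loss(X, y, beta, lam, p):
    """Squared-error loss plus an L_p penalty on the coefficients.

    X: (n, d) feature matrix, y: (n,) targets, beta: (d,) coefficients.
    The intercept is assumed to be excluded from beta, so it is not penalized.
    """
    residuals = y - X @ beta                    # y_i - sum_j beta_j * x_ij
    fit_term = np.sum(residuals ** 2)           # ordinary least-squares term
    penalty = lam * np.sum(np.abs(beta) ** p)   # lambda * sum_j |beta_j|^p
    return fit_term + penalty
```

With p = 1 this becomes the Lasso penalty and with p = 2 the Ridge penalty, both discussed below.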

What is Loss function?
The loss function is used to estimate how far the estimated values f(x_i) are from the observed actual values Y_i, i.e. it aggregates the residuals Y_i − f(x_i).
This is of two types:

  1. L1 loss function - the sum of the absolute differences between the actual and estimated values: Σ |Y_i − f(x_i)|. Minimizing this can yield multiple solutions.
  2. L2 loss function - the sum of the squared differences between the actual and estimated values: Σ (Y_i − f(x_i))². This is the least-squares criterion and gives one clear, closed-form solution.
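As a quick check with made-up numbers, the two losses differ only in how the residuals are aggregated (the values below are purely illustrative):

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])   # actual values Y_i
y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # estimated values f(x_i)

l1_loss = np.sum(np.abs(y_true - y_pred))  # sum |Y_i - f(x_i)|   -> 2.0
l2_loss = np.sum((y_true - y_pred) ** 2)   # sum (Y_i - f(x_i))^2 -> 1.5
```

Note that the squared loss weights the largest residual (1.0) much more heavily than the smaller ones.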

What are the types of Regularization Technique?
There are two types of regularization technique:

  1. Lasso Regularization / L1 Regularization - this adds the “absolute value of magnitude” of the coefficients as a penalty term to the loss function.

arg min over β of: Σ_i ( y_i − Σ_j β_j·x_ij )² + λ · Σ_j |β_j|

  2. Ridge Regularization / L2 Regularization - this adds the “squared magnitude” of the coefficients as a penalty term to the loss function.

arg min over β of: Σ_i ( y_i − Σ_j β_j·x_ij )² + λ · Σ_j β_j²

If λ is zero, the method reduces to OLS (Ordinary Least Squares). If λ is very large, the penalty dominates, the coefficients are shrunk too much, and the model under-fits. Choosing the value of λ is therefore very important.
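To see the effect of λ in practice, here is a minimal scikit-learn sketch. The alpha argument plays the role of λ; the dataset is synthetic and the alpha values are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso

# Synthetic data: 10 features, only 3 of them truly informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)           # lambda = 0: plain OLS
lasso_small = Lasso(alpha=0.1).fit(X, y)     # small penalty: close to OLS
lasso_large = Lasso(alpha=100.0).fit(X, y)   # very large penalty: heavy shrinkage, under-fitting

print(np.round(ols.coef_, 2))
print(np.round(lasso_small.coef_, 2))
print(np.round(lasso_large.coef_, 2))
```

With a very large alpha, most coefficients are pushed towards (or exactly to) zero, which is the under-fitting behaviour described above.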

LASSO
LASSO will shrink the coefficients of the less important features to exactly zero, which automatically removes the least significant variables. This helps us with variable selection. LASSO works best when only a small number of the variables are truly significant.

Ridge
Ridge adds a penalty and, as a result, shrinks the size of the weights. It works well even with a huge number of variable parameters, and it is also useful when the features are highly correlated (collinearity).
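The difference in behaviour can be seen by fitting both models on the same data, where most features carry no signal. Again this is only a sketch with synthetic data and arbitrary alpha values:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 20 features, but only 5 carry real signal
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=5.0, random_state=1)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients set exactly to zero:", np.sum(lasso.coef_ == 0))
print("Ridge coefficients set exactly to zero:", np.sum(ridge.coef_ == 0))  # typically 0: Ridge only shrinks
```

Typically the Lasso zeroes out most of the uninformative coefficients (variable selection), while Ridge keeps all of them, only smaller.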

Finding the optimal weights, i.e. a suitable value of λ, is the main challenge; when λ is tuned well, prediction accuracy is good.

Other model-selection approaches such as AIC, BIC, cross-validation, and step-wise regression can also handle over-fitting and perform feature selection, and they work well with a small set of features; regularization techniques are a great alternative when we are dealing with a large set of features.
