Ridge Regression (L2 Regularization Method)
Regularization is a technique that helps overcome the overfitting problem in machine learning models. It is called regularization because it helps keep the parameters regular, or normal. The common techniques are L1 and L2 regularization, better known as Lasso and Ridge Regression.
The error function is calculated on the training data set. When the model fits too closely to the training data, this is termed overfitting. In this case, the model performs very well on the training data but very poorly on the testing data. To reduce this error, regularization is applied, which helps keep the parameters regular or normal.
Why is this Regularization necessary??
Any unusually large value (an outlier) is a bad sign: such a data point alters the coefficients of the model and can have a huge impact on its predictions. Regularization gives very little or no importance to features that are not actually important. To overcome the problem of overfitting, we add a new term to the loss, also known as the penalty term.
Loss = Σ [ y − (b + a1·x1 + a2·x2 + a3·x3 + …) ]² + λ Σ |ai|^p,
where p = 1 or 2, depending on which regression is used.
p = 1 gives Lasso, which helps with feature selection by shrinking the coefficients of unimportant columns all the way to zero. It may have multiple solutions.
p = 2 gives Ridge, which has a single (unique) solution and shrinks coefficients towards zero without eliminating them.
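Here is a minimal sketch of the penalty term above in plain NumPy. The data, coefficients, and lambda value are made up purely for illustration:

```python
import numpy as np

# Tiny made-up data set: y, X, intercept b, and coefficients a.
y = np.array([3.0, 5.0, 7.0, 9.0])
X = np.array([[1.0], [2.0], [3.0], [4.0]])
b, a = 1.0, np.array([2.0])
lam = 0.5                               # lambda, the regularization strength

residuals = y - (b + X @ a)             # y - (b + a1*x1 + a2*x2 + ...)
sse = np.sum(residuals ** 2)            # sum of squared errors

lasso_loss = sse + lam * np.sum(np.abs(a))   # p = 1 (Lasso / L1 penalty)
ridge_loss = sse + lam * np.sum(a ** 2)      # p = 2 (Ridge / L2 penalty)
print(lasso_loss, ridge_loss)
```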
A regularization term is added to the loss function to keep the fitted line from overfitting and to improve prediction accuracy.
In this article, we will look in detail only at Ridge Regression.
How do we know when Regularization is needed??
The machine initially finds the best-fit line using the training data, but when the same coefficients are used to predict on the testing data, the line can no longer predict with good accuracy, because the best-fit line changes with new data values.
The sum of squared errors will therefore be different for training and testing data. So we apply a correction, or regularization technique, to the loss function; the extra term is known as the penalty (or penalizing) term.
We use Ridge Regression to find a new line that doesn't fit the training data quite as well. In other words, we introduce a small bias into how the new line is fit to the data, and in return we obtain a significant drop in variance. By starting with a slightly worse fit, Ridge Regression can provide better long-term predictions.
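To make the trade-off concrete, here is a hedged scikit-learn sketch on synthetic data (the feature values, noise level, and the alpha of 5.0 are arbitrary choices, and alpha plays the role of lambda). With only a handful of noisy training points, plain least squares tends to chase the noise, while Ridge gives up a little training accuracy in exchange for a better fit on unseen data; the exact numbers will vary with the random seed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)

# Tiny, noisy training set (easy to overfit) and a larger test set.
X_train = rng.uniform(0, 10, size=(5, 3))
y_train = X_train @ np.array([1.5, 0.0, 0.0]) + rng.normal(0, 2, size=5)
X_test = rng.uniform(0, 10, size=(200, 3))
y_test = X_test @ np.array([1.5, 0.0, 0.0]) + rng.normal(0, 2, size=200)

ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=5.0).fit(X_train, y_train)

for name, model in [("OLS", ols), ("Ridge", ridge)]:
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```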
Why is Ordinary Least Squares not sufficient??
With Ordinary Least Squares, we predict with y = ax + b and fit the line by minimizing the sum of squared residuals. Without any bias, the Ordinary Least Squares fit can have a large amount of variance.
With Ridge Regression, we still predict with y = ax + b, but we fit the line by minimizing the sum of squared residuals + lambda · (slope)².
This extra term is known as the penalty, and lambda determines how severe the penalty will be. So, by accepting a small amount of bias from the penalty term, we obtain a worthwhile drop in variance, which is why we would choose Ridge Regression over the Ordinary Least Squares method.
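As a rough illustration of how lambda controls the severity of the penalty (again on made-up, single-feature data), fitting Ridge with increasing lambda values shrinks the fitted slope towards zero; in scikit-learn, the alpha parameter plays the role of lambda:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(1)
x = rng.uniform(0, 10, size=(30, 1))
y = 3.0 * x.ravel() + 2.0 + rng.normal(0, 1, size=30)   # true slope 3, intercept 2

# As lambda grows, the slope is penalized more heavily and shrinks toward zero
# (lambda close to 0 essentially reduces to ordinary least squares).
for lam in [0.01, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=lam).fit(x, y)
    print(f"lambda={lam:>6}: slope={model.coef_[0]:.3f}, intercept={model.intercept_:.3f}")
```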
Let’s see this through an example…
The slope of a line determines the change in "y" values, or predictions, for every unit change in the values of "x". If the slope is steep, for every unit change in x, y changes by more than 1 unit. In such cases, the prediction for y is very sensitive to changes in x.
If the slope is small, for every unit change in x, y hardly changes. In such cases, predictions for y are less sensitive to change in x.
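For instance, with a slope of 10, moving x from 2 to 3 moves the predicted y by 10 units; with a slope of 0.1, the same move in x changes the prediction by only 0.1.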
When lambda = 0, Ridge Regression is the same as Ordinary Least Squares. As the lambda value increases, the fitted equation becomes less sensitive to the x values. To find the optimum value of lambda that results in the lowest variance, we use 10-fold cross-validation.
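Here is a sketch of that search using scikit-learn's RidgeCV on synthetic data; the grid of candidate lambda values is an arbitrary choice, and RidgeCV simply picks the one with the best average 10-fold cross-validation score:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.RandomState(2)
X = rng.uniform(0, 10, size=(100, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(0, 1, size=100)

# Try a grid of lambda (alpha) values and keep the best one under 10-fold CV.
alphas = np.logspace(-3, 3, 13)
model = RidgeCV(alphas=alphas, cv=10).fit(X, y)
print("best lambda:", model.alpha_)
```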
Ridge Regression also works when we have discrete (categorical) variables, like high fat vs. low fat.
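For example (with made-up data), a discrete column such as high fat vs. low fat can be one-hot encoded and then handled by Ridge like any other feature:

```python
import pandas as pd
from sklearn.linear_model import Ridge

# Made-up data: a numeric feature plus a discrete "fat" category.
df = pd.DataFrame({
    "weight": [60, 72, 80, 55, 90, 65],
    "fat": ["high", "low", "high", "low", "high", "low"],
    "size": [3.1, 2.4, 3.8, 2.0, 4.2, 2.6],
})

# One-hot encode the discrete column so Ridge can use it as numeric input.
X = pd.get_dummies(df[["weight", "fat"]], columns=["fat"], drop_first=True)
y = df["size"]

model = Ridge(alpha=1.0).fit(X, y)
print(dict(zip(X.columns, model.coef_)))
```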
Thus, in general, Ridge Regression helps reduce variance by shrinking the parameters and making our predictions less sensitive to them. The penalty term contains all of the parameters except the intercept.
To understand in detail about Lasso Regression, click here — https://medium.com/@minions.k/lasso-regression-in-detail-l1-regularization-593044a85248