Hyperparameter Tuning in Linear Regression.

Lakshmi Sruthi · Published in Analytics Vidhya · Apr 22, 2021

Before getting into the techniques, let us understand why we tune a model in the first place.

What is the purpose of tuning?

We tune a model to maximize its performance without overfitting and to reduce the variance error. To achieve this, we have to apply the appropriate hyperparameter tuning technique for our model.

Types of errors

1. What is a Variance error?

It refers to the amount by which the predicted values would change if different training data were used.

2. What is a Bias error?

It is the error introduced by the simplifying assumptions the model makes.

Regularization

Regularization reduces the overfitting nature of the model. Even if the model currently works well, regularization is applied to prevent overfitting from occurring later. It works by deliberately introducing a small amount of error (a penalty) so that the model does not cling too tightly to the training data and is forced to learn the general pattern instead. As a result, even if more data is added at a later stage, the model can handle it without issues, and the regularized model will perform better on new data than the unregularized one.

Whenever we regularize, the coefficients shrink. We also need to make sure the model does not become under-fitted by tuning alpha too aggressively; alpha is the penalty factor. Error is introduced into the system by drawing a line that does not pass through the majority of the points. Regularized models shrink the coefficients and flatten the slope, so the fit does not change much for new data. How much a coefficient shrinks depends on the variable: if the feature is significant, the shrinkage is small; if the feature is not significant, the shrinkage is larger; and if the feature is highly insignificant, its coefficient can shrink all the way to 0 (as Lasso does). The advantage of these regularized models is that even if the regression assumptions are not checked, the model does much of that work for us.
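As a quick illustration of this shrinkage, here is a minimal sketch assuming scikit-learn and a synthetic dataset from make_regression (the alpha values and feature counts are illustrative, not from the original post):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data: only 5 of the 10 features are informative.
X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       noise=10.0, random_state=0)

for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # A larger alpha means a larger penalty, so the coefficients (the slope) shrink.
    print(f"alpha={alpha:>8}: sum of |coefficients| = {np.abs(model.coef_).sum():.2f}")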

What is overfitting?

When your model learns all the complexity and noise in the training data, it performs well on the training data but does not work well on the validation data. In that case the model is overfitting.

What is underfitting?

When the model is underfitting, it does not learn the underlying trend of the data. This occurs when we have too little data to build the model, or when we try to fit a linear model to non-linear data.

What is Cross-Validation?

Cross-Validation is essentially a technique used to assess how well a model performs on a new independent dataset.

The simplest example of cross-validation is when you split your data into three groups: training data, validation data, and testing data. You use the training data to build the model, the validation data to tune the hyperparameters, and the testing data to evaluate the final model.
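Here is a minimal sketch of that three-way split, assuming scikit-learn, a Ridge model as the one being tuned, and a synthetic dataset; the candidate alpha values are illustrative:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=15.0, random_state=0)

# Split off 20% for testing, then split the rest into training and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Use the validation data to tune the hyperparameter alpha.
best_alpha, best_score = None, float("-inf")
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    score = Ridge(alpha=alpha).fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

# Evaluate the final model once on the held-out test data.
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
print("best alpha:", best_alpha, "test R^2:", round(final_model.score(X_test, y_test), 3))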

Types of regularization

1. Ridge regularization

It adds the “squared magnitude” of the coefficients as a penalty term to the loss function. This is called the L2 penalty.

sse = np.sum((y - (b0 + b1*x1 + b2*x2 + … + bn*xn))**2) + alpha * (b1**2 + b2**2 + … + bn**2)
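For illustration, the same loss can be computed directly with NumPy; the synthetic data and the candidate values for b0, b1, b2, b3 and alpha below are made up:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # features x1, x2, x3
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

b0 = 0.05                                          # intercept
b = np.array([1.8, -0.9, 0.1])                     # coefficients b1, b2, b3
alpha = 1.0                                        # penalty factor

# Sum of squared residuals plus the L2 (squared magnitude) penalty.
sse_ridge = np.sum((y - (b0 + X @ b)) ** 2) + alpha * np.sum(b ** 2)
print(sse_ridge)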

2. Lasso regularization

Lasso (least absolute shrinkage and selection operator) adds the “absolute value of the magnitude” of the coefficients as a penalty term to the loss function. This is called the L1 penalty.

sse = np.sum((y - (b0 + b1*x1 + b2*x2 + … + bn*xn))**2) + alpha * (|b1| + |b2| + … + |bn|)
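A minimal sketch of this selection behaviour, assuming scikit-learn and synthetic data (alpha=1.0 is an arbitrary choice), shows the coefficients of insignificant features being driven exactly to zero:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 3 of the 10 features actually influence y.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print("coefficients:", np.round(lasso.coef_, 2))
print("features dropped:", int(np.sum(lasso.coef_ == 0)))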

3. ElasticNet regularization

It is the combination of both Ridge and Lasso regularization.

sse = np.sum((y - (b0 + b1*x1 + b2*x2 + … + bn*xn))**2) + alpha_ridge * (b1**2 + b2**2 + … + bn**2) + alpha_lasso * (|b1| + |b2| + … + |bn|)
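A minimal sketch with scikit-learn's ElasticNet; note that scikit-learn expresses the combined penalty through a single alpha plus an l1_ratio mixing parameter rather than two separate alphas:

from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=10, n_informative=5,
                       noise=10.0, random_state=0)

# l1_ratio=0.5 gives equal weight to the L1 (lasso) and L2 (ridge) parts of the penalty.
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print(enet.coef_)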

Gradient Descent

Once a regularization technique and an alpha value have been chosen (for example through cross-validation), gradient descent is used to fit the model itself. Gradient descent is an iterative optimization algorithm used in machine learning to minimize a loss function. The loss function describes how well the model will perform given the current set of parameters (weights and biases), and gradient descent finds the set of parameters that minimizes it.
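Here is a minimal gradient-descent sketch for ordinary least squares written with NumPy; the learning rate, iteration count, and synthetic data are illustrative choices:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5 + rng.normal(scale=0.3, size=200)

def gradient_descent(X, y, lr=0.05, n_iter=2000):
    n, p = X.shape
    b = np.zeros(p)     # coefficients (weights)
    b0 = 0.0            # intercept (bias)
    for _ in range(n_iter):
        resid = y - (b0 + X @ b)
        # Gradients of the mean squared error with respect to the parameters.
        grad_b = -2.0 / n * (X.T @ resid)
        grad_b0 = -2.0 / n * resid.sum()
        b -= lr * grad_b
        b0 -= lr * grad_b0
    return b0, b

b0_hat, b_hat = gradient_descent(X, y)
print("intercept:", round(b0_hat, 3), "coefficients:", np.round(b_hat, 3))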

Thanks for reading :)

