Regularization and Gradient Descent Cheat Sheet

Model Complexity vs Error:

Subrata Mukherjee
The Startup


Preventing Under- and Overfitting:

How can we use a degree-N polynomial while still preventing overfitting?

Regularization:

Regularization applies to the objective function in ill-posed optimization problems. The regularization term, or penalty, adds a cost to the objective function that discourages overfitting and pushes the optimizer toward a simpler, better-generalizing solution.
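In its most general form (a standard textbook formulation rather than anything specific to this cheat sheet), the regularized problem minimizes the original cost plus a weighted penalty term, where the weight λ (called alpha in scikit-learn) controls how strongly complexity is punished:

$$\min_{\beta}\; J(\beta) + \lambda\, R(\beta)$$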

Below are the Methods of Regularization:

Ridge Regression (L2):

· Penalty shrinks the magnitude of all coefficients.

· Larger coefficients are penalized more strongly because of the squaring (see the penalty form below).
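For reference, a standard way to write the Ridge objective (up to the constant scaling factors scikit-learn applies internally) is:

$$J_{\text{ridge}}(\beta) = \sum_{i=1}^{m} \big(y_i - \hat{y}_i\big)^2 + \lambda \sum_{j=1}^{n} \beta_j^2$$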

Effect of Ridge Regression on Parameters:

Lasso Regression (L1):

· Penalty selectively shrinks some coefficients (see the penalty form below).

· Can be used for feature selection.

· Slower to converge than Ridge regression.
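The corresponding Lasso objective swaps the squared penalty for an absolute-value (L1) penalty, again up to scikit-learn's internal scaling:

$$J_{\text{lasso}}(\beta) = \sum_{i=1}^{m} \big(y_i - \hat{y}_i\big)^2 + \lambda \sum_{j=1}^{n} \lvert\beta_j\rvert$$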

Effect of Lasso Regression on Parameters:

Elastic Net Regularization:

· A compromise between Ridge and Lasso regression.

· Requires tuning an additional parameter that distributes the regularization penalty between L1 and L2 (see the penalty form below).
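Elastic Net blends the two penalties, with ρ playing the role of l1_ratio in scikit-learn (and again ignoring the library's internal scaling constants):

$$J_{\text{EN}}(\beta) = \sum_{i=1}^{m} \big(y_i - \hat{y}_i\big)^2 + \lambda \Big( \rho \sum_{j=1}^{n} \lvert\beta_j\rvert + (1 - \rho) \sum_{j=1}^{n} \beta_j^2 \Big)$$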

Hyperparameters and Their Optimization:
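The regularization strength alpha (and, for Elastic Net, l1_ratio) is a hyperparameter: it is not learned when the model is fit, so it is usually chosen by cross validation, as the CV helper classes mentioned in the sections below illustrate.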

Ridge Regression — The Syntax:

#Import the class containing the regression method.

from sklearn.linear_model import Ridge

#Create an instance of the class.

RR= Ridge(alpha=1.0) # Regularization parameter

#Fit the instance on the data and then predict the expected value.

RR= RR.fit(X_train, y_train)

y_predict= RR.predict(X_test)

The RidgeCV class will perform cross validation on a set of values for alpha.
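A minimal sketch of how RidgeCV might be used, assuming the same X_train, y_train, and X_test as above; the alpha grid and cv=5 are just illustrative choices:

from sklearn.linear_model import RidgeCV

# Candidate regularization strengths to evaluate by cross validation.
RRcv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5)

RRcv = RRcv.fit(X_train, y_train)

print(RRcv.alpha_)  # the alpha chosen by cross validation
y_predict = RRcv.predict(X_test)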

Lasso Regression — The Syntax:

#Import the class containing the regression method.

from sklearn.linear_model import Lasso

#Create an instance of the class.

LR= Lasso(alpha=1.0) # Regularization parameter

#Fit the instance on the data and then predict the expected value.

LR= LR.fit(X_train, y_train)

y_predict= LR.predict(X_test)

The LassoCV class will perform cross validation on a set of values for alpha.
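A similar sketch for LassoCV, which builds its own alpha grid by default; cv=5 is an illustrative choice:

from sklearn.linear_model import LassoCV

# Searches an automatically generated grid of alphas by cross validation.
LRcv = LassoCV(n_alphas=100, cv=5)

LRcv = LRcv.fit(X_train, y_train)

print(LRcv.alpha_)  # the alpha chosen by cross validation
y_predict = LRcv.predict(X_test)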

Elastic Net Regression — The Syntax:

#Import the class containing the regression method.

from sklearn.linear_model import ElasticNet

#Create an instance of the class.

EN= ElasticNet(alpha=1.0, l1_ratio=0.5)

# alpha is the overall regularization strength; l1_ratio splits it between the L1 and L2 penalties

#Fit the instance on the data and then predict the expected value.

EN= EN.fit(X_train, y_train)

y_predict= EN.predict(X_test)

The ElasticNetCV class will perform cross validation on a set of values for l1_ratio and alpha.
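A minimal sketch for ElasticNetCV; the l1_ratio candidates and cv=5 are illustrative choices:

from sklearn.linear_model import ElasticNetCV

# Cross validates over an alpha grid and the listed l1_ratio values.
ENcv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5)

ENcv = ENcv.fit(X_train, y_train)

print(ENcv.alpha_, ENcv.l1_ratio_)  # the best combination found
y_predict = ENcv.predict(X_test)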

Feature Selection:

· Regularization performs feature selection by shrinking the contribution of features.

· For L1-regularization, this is accomplished by driving some coefficients exactly to zero (illustrated in the sketch after this list).

· Feature selection can also be performed by removing features.
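A minimal sketch of inspecting which features the L1 penalty has zeroed out, assuming the fitted Lasso instance LR from the syntax section above:

import numpy as np

kept = np.flatnonzero(LR.coef_)          # indices of features with non-zero coefficients
dropped = np.flatnonzero(LR.coef_ == 0)  # features effectively removed by the L1 penalty

print(len(kept), "features kept,", len(dropped), "features dropped")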

Why is Feature Selection Important?

· Reducing the number of features is another way to prevent overfitting (similar to regularization).

· For some models, fewer features can improve fitting time and/or results.

· Identifying most critical features can improve model interpretability.

Recursive Feature Elimination — The Syntax:

#Import the class containing the feature selection method.

from sklearn.feature_selection import RFE

#Create an instance of the class.

rfeMod= RFE(est, n_features_to_select=5)

# est is an instance of the model to use; n_features_to_select is the final number of features to keep.

#Fit the instance on the data and then predict the expected value.

rfeMod=rfeMod.fit(X_train, y_train)

y_predict= rfeMod.predict(X_test)

The RFECV class will perform feature elimination using cross validation.
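A minimal sketch of RFECV, assuming est is an estimator that exposes coef_ or feature_importances_ (such as the Ridge model above); step=1 removes one feature per iteration and cv=5 is an illustrative choice:

from sklearn.feature_selection import RFECV

# Recursively eliminates features and uses cross validation to decide how many to keep.
rfecvMod = RFECV(est, step=1, cv=5)

rfecvMod = rfecvMod.fit(X_train, y_train)

print(rfecvMod.n_features_)  # number of features selected
y_predict = rfecvMod.predict(X_test)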

Gradient Descent:

Start with a cost function J(𝛽):
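For linear regression, a standard choice of cost is the mean squared error (the 1/2m factor is a common convention that simplifies the gradient):

$$J(\beta) = \frac{1}{2m} \sum_{i=1}^{m} \big( h_\beta(x^{(i)}) - y^{(i)} \big)^2$$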

Gradient Descent with Linear Regression:
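Batch gradient descent then repeatedly updates every parameter in the direction of steepest descent, using the whole training set for each step; α is the learning rate:

$$\beta_j := \beta_j - \alpha \frac{\partial J(\beta)}{\partial \beta_j} = \beta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \big( h_\beta(x^{(i)}) - y^{(i)} \big)\, x_j^{(i)}$$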

Stochastic Gradient Descent:
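Stochastic gradient descent applies the same update but computes the gradient from a single randomly chosen training example at a time, so each step is cheap and noisy, which scales well to large datasets.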

Mini Batch Gradient Descent:
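Mini-batch gradient descent is the compromise: each update uses a small random batch of samples (for example a few dozen to a few hundred), reducing the noise of pure SGD while staying much cheaper per step than full-batch gradient descent.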

Stochastic Gradient Descent Regression — The Syntax:

#Import the class containing the regression model.

from sklearn.linear_model import SGDRegressor

#Create an instance of the class.

SGDreg = SGDRegressor(loss='squared_loss', alpha=0.1, penalty='l2')

# loss='squared_loss' gives ordinary linear regression; alpha and penalty control the regularization.
# (In recent scikit-learn releases this loss is named 'squared_error'.)

#Fit the instance on the data and then predict the expected value.

SGDreg=SGDreg.fit(X_train, y_train)

For the mini-batch version, use partial_fit instead: SGDreg = SGDreg.partial_fit(X_train, y_train) (a fuller sketch follows after this block).

y_pred= SGDreg.predict(X_test)

Other loss functions are available as well, such as epsilon_insensitive and huber.
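A minimal sketch of the mini-batch pattern with partial_fit, assuming the SGDreg instance created above and array-like X_train and y_train; the batch size of 64 is arbitrary:

batch_size = 64

for start in range(0, len(X_train), batch_size):
    end = start + batch_size
    # Each call updates the model using only this slice of the training data.
    SGDreg = SGDreg.partial_fit(X_train[start:end], y_train[start:end])

y_pred = SGDreg.predict(X_test)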

Stochastic Gradient Descent Classification — The Syntax:

#Import the class containing the classification model.

from sklearn.linear_model import SGDClassifier

SGDclass = SGDClassifier(loss='log', alpha=0.1, penalty='l2')

# loss='log' gives logistic regression; alpha and penalty control the regularization.
# (In recent scikit-learn releases this loss is named 'log_loss'.)

#Fit the instance on the data and then predict the expected value.

SGDclass=SGDclass.fit(X_train, y_train)

For the mini-batch version, use partial_fit instead: SGDclass = SGDclass.partial_fit(X_train, y_train) (see the note on the classes argument below).

y_pred= SGDclass.predict(X_test)

Other loss functions are available as well, such as hinge and squared_hinge.
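One incremental-training detail: on the first call to partial_fit, SGDClassifier needs the full set of class labels, because later batches may not contain every class. A minimal sketch, assuming y_train holds all labels:

import numpy as np

# The classes argument is required only on the first partial_fit call.
SGDclass = SGDclass.partial_fit(X_train, y_train, classes=np.unique(y_train))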

