Regularization and Gradient Descent Cheat Sheet
Model Complexity vs Error:
As model complexity increases, training error steadily decreases, while cross-validation error first falls and then rises again; that upturn marks the onset of overfitting.
Preventing Under- and Overfitting:
How can we use a degree-N polynomial and still prevent overfitting? Keep all the terms, but constrain the size of their coefficients. That is exactly what regularization does.
Regularization:
Regularization applies to the objective functions of ill-posed optimization problems. The regularization term, or penalty, adds a cost to the objective function for large coefficients, discouraging the optimizer from overfitting while it searches for a solution.
Below are the main methods of regularization:
Ridge Regression (L2):
· Penalty shrinks magnitude of all coefficients.
· Larger coefficients strongly penalized because of the squaring.
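In the usual notation (a sketch; λ ≥ 0 is the regularization strength, called alpha in scikit-learn), ridge minimizes:
J(𝛽) = Σᵢ (yᵢ − ŷᵢ)² + λ·Σⱼ 𝛽ⱼ²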
Effect of Ridge Regression on Parameters:
As λ increases, all coefficients shrink smoothly toward zero, but they rarely become exactly zero.
Lasso Regression (L1):
· Penalty selectively shrinks some coefficients.
· Can be used for feature selection.
· Slower to converge than Ridge regression.
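In the same notation as the ridge sketch above, lasso penalizes absolute values instead of squares:
J(𝛽) = Σᵢ (yᵢ − ŷᵢ)² + λ·Σⱼ |𝛽ⱼ|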
Effect of Lasso Regression on Parameters:
As λ increases, coefficients are driven exactly to zero one by one, so the remaining nonzero coefficients identify the selected features.
Elastic Net Regularization:
· Compromise of both Ridge and Lasso regression.
· Requires tuning of an additional parameter that distributes the regularization penalty between L1 and L2.
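A common form of the combined penalty (a sketch; scikit-learn parameterizes it slightly differently via alpha and l1_ratio):
J(𝛽) = Σᵢ (yᵢ − ŷᵢ)² + λ·[ r·Σⱼ |𝛽ⱼ| + (1 − r)·Σⱼ 𝛽ⱼ² ]
where r is the L1/L2 mixing ratio (l1_ratio).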
Hyperparameters and Their Optimization:
The regularization strength alpha (plus l1_ratio for Elastic Net) is a hyperparameter: it is not learned by the fit itself, so it is typically tuned with cross validation.
Ridge Regression — The Syntax:
# Import the class containing the regression method.
from sklearn.linear_model import Ridge
# Create an instance of the class.
RR = Ridge(alpha=1.0)  # alpha is the regularization parameter
# Fit the instance on the data and then predict the expected value.
RR = RR.fit(X_train, y_train)
y_predict = RR.predict(X_test)
The RidgeCV class will perform cross validation on a set of values for alpha.
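A minimal sketch of that variant (the alpha grid and cv value here are arbitrary examples):
from sklearn.linear_model import RidgeCV
# Try several regularization strengths; RidgeCV keeps the best by cross validation.
RRcv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5)
RRcv = RRcv.fit(X_train, y_train)
print(RRcv.alpha_)  # the alpha selected by cross validation
y_predict = RRcv.predict(X_test)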
Lasso Regression — The Syntax:
# Import the class containing the regression method.
from sklearn.linear_model import Lasso
# Create an instance of the class.
LR = Lasso(alpha=1.0)  # alpha is the regularization parameter
# Fit the instance on the data and then predict the expected value.
LR = LR.fit(X_train, y_train)
y_predict = LR.predict(X_test)
The LassoCV class will perform cross validation on a set of values for alpha.
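A similar sketch (when alphas is omitted, LassoCV generates its own grid along the regularization path):
from sklearn.linear_model import LassoCV
LRcv = LassoCV(cv=5)
LRcv = LRcv.fit(X_train, y_train)
print(LRcv.alpha_)  # the alpha selected by cross validation
y_predict = LRcv.predict(X_test)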
Elastic Net Regression — The Syntax:
# Import the class containing the regression method.
from sklearn.linear_model import ElasticNet
# Create an instance of the class.
EN = ElasticNet(alpha=1.0, l1_ratio=0.5)
# alpha is the regularization parameter; l1_ratio distributes alpha between L1 and L2
# Fit the instance on the data and then predict the expected value.
EN = EN.fit(X_train, y_train)
y_predict = EN.predict(X_test)
The ElasticNetCV class will perform cross validation on a set of values for l1_ratio and alpha.
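A sketch of that variant (both grids are arbitrary examples):
from sklearn.linear_model import ElasticNetCV
ENcv = ElasticNetCV(alphas=[0.01, 0.1, 1.0], l1_ratio=[0.2, 0.5, 0.8], cv=5)
ENcv = ENcv.fit(X_train, y_train)
print(ENcv.alpha_, ENcv.l1_ratio_)  # the values selected by cross validation
y_predict = ENcv.predict(X_test)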
Feature Selection:
· Regularization performs feature selection by shrinking the contribution of features.
· For L1-regularization, this is accomplished by driving some coefficients to zero (see the sketch after this list).
· Feature selection can also be performed by removing features.
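A minimal, self-contained sketch of L1-driven feature selection (the synthetic data and alpha are arbitrary examples):
import numpy as np
from sklearn.linear_model import Lasso
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.randn(100)  # only features 0 and 1 matter
lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # coefficients of the irrelevant features are driven to 0
print(np.flatnonzero(lasso.coef_))  # indices of the surviving (selected) features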
Why is Feature Selection Important?
· Reducing the number of features is another way to prevent overfitting (similar to regularization).
· For some models, fewer features can improve fitting time and/or results.
· Identifying the most critical features can improve model interpretability.
Recursive Feature Elimination — The Syntax:
# Import the class containing the feature selection method.
from sklearn.feature_selection import RFE
# Create an instance of the class.
rfeMod = RFE(est, n_features_to_select=5)
# est is an instance of the model to use; n_features_to_select is the final number of features
# Fit the instance on the data and then predict the expected value.
rfeMod = rfeMod.fit(X_train, y_train)
y_predict = rfeMod.predict(X_test)
The RFECV class will perform feature elimination using cross validation.
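A short sketch of that variant (the estimator is an arbitrary example):
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LinearRegression
rfecvMod = RFECV(LinearRegression(), cv=5)  # picks how many features to keep by cross validation
rfecvMod = rfecvMod.fit(X_train, y_train)
print(rfecvMod.n_features_)  # number of features selected
y_predict = rfecvMod.predict(X_test)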
Gradient Descent:
Start with a cost function J(𝛽) and repeatedly step downhill along its gradient:
𝛽 ← 𝛽 − η·∇J(𝛽)
where η is the learning rate. Repeat until the updates become negligibly small.
Gradient Descent with Linear Regression:
For linear regression with the squared-error cost J(𝛽) = (1/2m)·Σᵢ (𝛽ᵀxᵢ − yᵢ)², the gradient is ∇J(𝛽) = (1/m)·Σᵢ (𝛽ᵀxᵢ − yᵢ)·xᵢ, as in the sketch below.
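A minimal NumPy sketch of batch gradient descent for linear regression (the learning rate, iteration count, and toy data are arbitrary examples):
import numpy as np

def gradient_descent(X, y, eta=0.1, n_iter=1000):
    m, n = X.shape
    beta = np.zeros(n)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / m  # gradient of the squared-error cost
        beta -= eta * grad  # step downhill
    return beta

rng = np.random.RandomState(1)
X = np.c_[np.ones(50), rng.randn(50, 2)]  # intercept column plus two features
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.randn(50)
print(gradient_descent(X, y))  # approaches [1, 2, -3]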
Stochastic Gradient Descent:
Instead of computing the gradient over the whole training set, update 𝛽 using one randomly chosen sample at a time; each step is cheap but noisy.
Mini Batch Gradient Descent:
A compromise between the two: update 𝛽 on small random batches of samples, which smooths out the noise while keeping each step inexpensive.
Stochastic Gradient Descent Regression — The Syntax:
# Import the class containing the regression model.
from sklearn.linear_model import SGDRegressor
# Create an instance of the class.
SGDreg = SGDRegressor(loss='squared_loss', alpha=0.1, penalty='l2')
# squared_loss = linear regression; alpha and penalty set the regularization
# Fit the instance on the data and then predict the expected value.
SGDreg = SGDreg.fit(X_train, y_train)
For the mini-batch version, call SGDreg = SGDreg.partial_fit(X_train, y_train) on each batch of data instead.
y_pred = SGDreg.predict(X_test)
Other loss functions exist, e.g. epsilon_insensitive, huber, etc. (Newer scikit-learn versions rename squared_loss to squared_error.)
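A minimal sketch of the mini-batch pattern with partial_fit, reusing X_train and y_train from above (assumed here to be NumPy arrays; the batch size and epoch count are arbitrary):
import numpy as np
from sklearn.linear_model import SGDRegressor

SGDreg = SGDRegressor(alpha=0.1, penalty='l2')
batch_size = 32
for epoch in range(5):
    order = np.random.permutation(len(X_train))  # reshuffle each epoch
    for start in range(0, len(X_train), batch_size):
        batch = order[start:start + batch_size]
        SGDreg.partial_fit(X_train[batch], y_train[batch])  # one gradient update per batch
y_pred = SGDreg.predict(X_test)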
Stochastic Gradient Descent Classification — The Syntax:
# Import the class containing the classification model.
from sklearn.linear_model import SGDClassifier
# Create an instance of the class.
SGDclass = SGDClassifier(loss='log', alpha=0.1, penalty='l2')
# log loss = logistic regression; alpha and penalty set the regularization
# Fit the instance on the data and then predict the expected value.
SGDclass = SGDclass.fit(X_train, y_train)
For the mini-batch version, call SGDclass = SGDclass.partial_fit(X_train, y_train, classes=np.unique(y_train)) on each batch; the first call must be given the full set of class labels via classes=.
y_pred = SGDclass.predict(X_test)
Other loss functions exist, e.g. hinge, squared_hinge, etc. (Newer scikit-learn versions rename log to log_loss.)