L2 Regularization

Jahnavimuttireddy
May 2, 2023


L2 regularization, also known as Ridge regularization, is a technique used in machine learning to prevent a model from overfitting. Overfitting occurs when a model is too complex: it performs well on the training data but poorly on new, unseen data. L2 regularization adds a penalty term to the model’s cost function, encouraging the model to keep its weights small.

The penalty term is the sum of the squares of all the weights in the model, multiplied by a regularization parameter λ (lambda). The parameter λ is a hyperparameter that controls the strength of the regularization.

The effect of L2 regularization is to shrink the model’s weights toward zero, without actually setting them to zero. This can help to prevent overfitting, as smaller weights reduce the model’s complexity and its ability to fit noise in the data.

The cost function for a linear regression model with L2 regularization is:

J(w) = MSE(w) + λ ||w||²

where J(w) is the cost function, MSE(w) is the mean squared error, λ is the regularization parameter, w is the vector of model weights, and ||w||² is the squared L2 norm of the weight vector (the sum of the squared weights).
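As a concrete illustration, here is a minimal NumPy sketch of this cost function. The feature matrix X, targets y, weights w, and the value of λ are all made-up placeholders, not data from this article:

```python
import numpy as np

def ridge_cost(X, y, w, lam):
    """Mean squared error plus the L2 penalty: MSE(w) + lam * ||w||^2."""
    y_pred = X @ w                      # linear predictions
    mse = np.mean((y - y_pred) ** 2)    # MSE(w)
    l2_penalty = lam * np.sum(w ** 2)   # λ * ||w||²
    return mse + l2_penalty

# Toy usage with invented numbers
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.2])
print(ridge_cost(X, y, w, lam=0.1))
```
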

Applications of L2 Regularization:

L2 regularization is a commonly used technique in machine learning for a variety of applications, including:

Regression: L2 regularization is often used in linear regression models to prevent overfitting and improve generalization performance.

Classification: L2 regularization can be used in logistic regression models to improve the model’s ability to generalize to new data.

Natural Language Processing (NLP): L2 regularization can be used in NLP tasks, such as text classification or sentiment analysis, to improve the accuracy of the model.

Computer Vision: L2 regularization can be used in image classification or object detection tasks to prevent overfitting and improve the model’s ability to generalize to new images.

Recommender Systems: L2 regularization can be used in collaborative filtering-based recommender systems to improve the accuracy of the recommendations.

Overall, L2 regularization is a widely used and effective technique for improving the performance of machine learning models and preventing overfitting.

L2 regularization in action:

Linear Regression: Suppose we have a dataset of housing prices, and we want to predict the price of a house based on its size (in square feet) and the number of bedrooms. We can build a linear regression model with L2 regularization to prevent overfitting and improve generalization performance. The cost function for the model can be expressed as:

J(w) = MSE(w) + λ ||w||²

where MSE(w) is the mean squared error between the predicted and true values of y, w is the vector of model weights, and ||w||² is the squared L2 norm of the weight vector. The regularization parameter λ controls the strength of the regularization.
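In practice this model is rarely coded by hand: scikit-learn’s Ridge estimator implements this penalized cost, with its alpha argument playing the role of λ (up to a scaling convention in the error term). The housing data below is invented purely for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Made-up housing data: [size in sq ft, number of bedrooms] -> price
X = np.array([[1400, 3], [1600, 3], [1700, 4], [1875, 4], [2350, 5]])
y = np.array([245000, 312000, 279000, 308000, 405000])

# alpha corresponds to the regularization parameter λ
model = Ridge(alpha=1.0)
model.fit(X, y)

print("weights:", model.coef_)
print("intercept:", model.intercept_)
print("price for a 2000 sq ft, 4-bedroom house:", model.predict([[2000, 4]]))
```
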

Neural Networks: L2 regularization can also be applied to neural networks to prevent overfitting. In this case, the regularization term is added to the network’s loss function during training. For example, in a multi-layer perceptron (MLP) with L2 regularization, the loss function can be expressed as:

loss = cross_entropy_loss(y_true, y_pred) + λ * ||w||²

where cross_entropy_loss is the loss function used for classification tasks, y_true contains the true labels, y_pred contains the network’s predictions, w is the vector of weights in the network, and ||w||² is the squared L2 norm of the weight vector. The regularization parameter λ controls the strength of the regularization.
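A minimal sketch of this idea in PyTorch is shown below; the network shape, data dimensions, and λ value are assumptions chosen only for illustration. Many frameworks also expose the same effect through an optimizer’s weight_decay argument instead of an explicit penalty term:

```python
import torch
import torch.nn as nn

# Assumed toy MLP: 20 input features, 3 output classes
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
criterion = nn.CrossEntropyLoss()
lam = 1e-3  # assumed regularization strength λ

def l2_regularized_loss(logits, y_true):
    ce = criterion(logits, y_true)  # cross-entropy between predictions and true labels
    # Sum of squared weights; biases (1-D parameters) are usually left unpenalized
    l2 = sum((p ** 2).sum() for p in model.parameters() if p.dim() > 1)
    return ce + lam * l2

# Toy usage with random data
x = torch.randn(8, 20)
y = torch.randint(0, 3, (8,))
loss = l2_regularized_loss(model(x), y)
loss.backward()
```
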

Overall, L2 regularization is a versatile and widely used technique in machine learning and can be applied to a variety of models and tasks to improve generalization performance and prevent overfitting.

Example of L2 regularization in linear regression:

Suppose we have a linear regression model with two input features, x1 and x2, and we want to predict a continuous target variable, y. The model can be expressed as:

y = w0 + w1x1 + w2x2

where w0, w1, and w2 are the coefficients or weights of the model. To prevent overfitting, we can add an L2 regularization term to the model’s cost function:

cost = MSE(y_true, y_pred) + λ * (w1² + w2²)

where MSE is the mean squared error between the true and predicted values of y, λ is the regularization parameter, and w1² and w2² are the squared values of the weights.
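One way to see how the penalty acts is through its gradient: differentiating λ(w1² + w2²) adds 2λw to the usual MSE gradient, so every gradient-descent step pulls the weights toward zero. A minimal NumPy sketch of this update, with made-up data and hyperparameters, might look like:

```python
import numpy as np

# Made-up data: two features x1, x2 and target y
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0]])
y = np.array([3.0, 3.5, 6.0, 9.0])

lam, lr = 0.1, 0.01          # assumed λ and learning rate
w = np.zeros(2)              # w1, w2
b = 0.0                      # w0 (intercept, typically not penalized)

for _ in range(1000):
    y_pred = X @ w + b
    err = y_pred - y
    grad_w = 2 * X.T @ err / len(y) + 2 * lam * w   # MSE gradient + L2 gradient
    grad_b = 2 * err.mean()                          # intercept gets no penalty
    w -= lr * grad_w
    b -= lr * grad_b

print("weights:", w, "intercept:", b)
```
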

The regularization parameter λ controls the strength of the regularization. A higher value of λ results in smaller weights and therefore a simpler model that is less prone to overfitting. A lower value of λ allows larger weights, which can lead to a more complex model with a greater risk of overfitting.

L2 regularization encourages the model to distribute the weight values more evenly among the features, effectively shrinking the weight values towards zero. This reduces the impact of any single feature on the model’s prediction, resulting in a more robust and generalized model.
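To see this shrinkage numerically, one can solve the ridge problem in closed form, w = (XᵀX + λI)⁻¹Xᵀy, for a few values of λ and compare the weight magnitudes. This closed-form solution is not discussed above; it minimizes the sum-of-squared-errors version of the ridge objective, which differs from the MSE version only by a rescaling of λ, and the data here is again invented for illustration:

```python
import numpy as np

# Made-up data with two highly correlated features
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=50)        # feature 2 ≈ feature 1
y = 3 * X[:, 0] + rng.normal(scale=0.5, size=50)

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:
    # Closed-form ridge solution w = (XᵀX + λI)⁻¹ Xᵀy
    w = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
    print(f"λ = {lam:6.1f}  ->  w = {w},  ||w|| = {np.linalg.norm(w):.3f}")
```

As λ grows, the weight norm shrinks and the weight is shared more evenly between the two correlated features, which is the behavior described in the paragraph above.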
