An Overview of Ridge Regression

Naethreepremnath · Published in Python’s Gurus · Jun 29, 2024 · 4 min read

Ridge regression is a powerful statistical technique that extends Ordinary Least Squares (OLS) regression. It acts as a regularization method, improving the performance and generalizability of linear regression models by introducing a penalty on the size of coefficients. This helps mitigate two common issues in regression models: overfitting and multicollinearity.

Let’s first take a look at Overfitting.

Overfitting

Overfitting occurs when a model fits the training data too closely, effectively memorizing it, which leads to poor performance on unseen data. For example, consider one response variable, one predictor variable, and only two observations. Using OLS, the regression line passes exactly through the two points, giving zero squared residuals on the training data. However, this model may perform badly on new data, producing large squared residuals (high variance).

Ridge regression addresses this by adding a penalty term 𝜆 × 𝑠𝑙𝑜𝑝𝑒² to the sum of squared residuals. Minimizing this penalized loss, rather than the residuals alone, reduces the variance of the fitted model on unseen data. Ridge regression therefore produces a line with a smaller slope than the OLS line, making the predictions less sensitive to the predictor variable.

This approach extends to data with more than one predictor as well: the penalty term becomes the sum of the squared slope coefficients, multiplied by a tuning parameter 𝜆 (the intercept is typically not penalized).

The objective of Ridge regression is therefore to find coefficients that minimize the sum of squared residuals plus the penalty, Σ(yᵢ − ŷᵢ)² + 𝜆 Σ βⱼ², so that the model not only fits the data well but also avoids overfitting.
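To make this concrete, here is a minimal sketch of the penalized loss on the two-observation example above. The specific data points, the choice 𝜆 = 1, and the candidate flatter line are illustrative assumptions, not values from the article:

import numpy as np

# Two training points; the OLS line (intercept 1, slope 2) passes through both exactly
x = np.array([1.0, 2.0])
y = np.array([3.0, 5.0])

def penalized_loss(intercept, slope, lam):
    # Ridge objective: sum of squared residuals + lambda * slope^2
    residuals = y - (intercept + slope * x)
    return np.sum(residuals ** 2) + lam * slope ** 2

lam = 1.0
print(penalized_loss(1.0, 2.0, lam))    # OLS line: zero residuals, but the penalty adds 4.0
print(penalized_loss(3.0, 2 / 3, lam))  # flatter line: small residuals, lower total (about 1.33)

With 𝜆 = 1 the flatter line achieves a lower penalized loss than the exact-fit OLS line, which is exactly why Ridge regression prefers a smaller slope.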

Now let’s consider the case of multicollinearity.

Multicollinearity

Multicollinearity occurs when independent variables in the model are highly correlated, leading to unstable and unreliable coefficient estimates.

Ridge regression reduces the impact of multicollinearity by shrinking the coefficients towards zero. When predictor variables are highly correlated, Ridge regression tends to spread their effect more evenly among them, instead of assigning disproportionately large coefficients to any single predictor. Despite these benefits, one might still wonder whether the method is worthwhile, since the shrinkage introduces bias. This question is commonly discussed as the ‘bias-variance trade-off’.
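Before turning to that trade-off, here is a minimal sketch of the coefficient behaviour just described. The data is synthetic, and the near-duplicate predictor is an illustrative assumption:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100

# Two highly correlated predictors: x2 is almost an exact copy of x1
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.001, size=n)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)  # the true signal depends on x1 only

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("OLS coefficients:  ", ols.coef_)    # often large, offsetting values (unstable across samples)
print("Ridge coefficients:", ridge.coef_)  # smaller values, shared more evenly between x1 and x2

With nearly identical predictors, OLS can assign large coefficients of opposite sign that roughly cancel each other out, whereas Ridge regression splits the effect between them and keeps both coefficients small.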

Bias-Variance Trade-off

The bias-variance trade-off is a fundamental concept in machine learning, highlighting the balance between two sources of error that affect a model’s performance: bias and variance.

Bias refers to the error introduced by approximating a real-world problem, which may be complex, by a simpler model. High bias can cause a model to miss relevant relations between features and target outputs, leading to underfitting. Underfitting means the model is too simple and cannot capture the underlying pattern of the data.

Variance refers to the error introduced by the model’s sensitivity to small fluctuations in the training set. High variance can cause a model to capture noise in the training data as if it were a true pattern, leading to overfitting. Overfitting means the model is too complex and fits the training data too closely, performing poorly on new, unseen data.

Ridge regression helps manage this trade-off by introducing a penalty term that shrinks the coefficients of the model. This causes Ridge regression to introduce a small amount of bias into the model. However, this bias is typically outweighed by the reduction in variance, especially when multicollinearity is present. The smaller coefficients lead to a smoother and more stable model, which generalizes better to unseen data.
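One way to see this trade-off in action is to fit a deliberately flexible model and vary the penalty strength. The sketch below is illustrative only: the sine-shaped synthetic data, the degree-8 polynomial features, and the particular alpha values are assumptions chosen for demonstration (scikit-learn calls the tuning parameter alpha):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=40)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

# A flexible degree-8 polynomial model; features are scaled so the penalty treats them comparably
for alpha in [1e-6, 1e-2, 1.0, 100.0]:
    model = make_pipeline(PolynomialFeatures(degree=8, include_bias=False), StandardScaler(), Ridge(alpha=alpha))
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"alpha={alpha:>8}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")

With a tiny alpha the model typically fits the training set very closely but does worse on the test set (high variance); with a very large alpha both errors grow because the fit is too rigid (high bias); intermediate values usually strike the best balance.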

Tuning Parameter

When discussing Ridge regression, it is important to consider the selection of the tuning parameter 𝜆. The tuning parameter controls the strength of the penalty: larger values of 𝜆 lead to greater shrinkage of the coefficients. The optimal value of 𝜆 is usually determined through cross-validation, where different values are tested and the one that minimizes the prediction error on held-out data is selected, as in the sketch below.
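Here is a minimal sketch of that selection step using scikit-learn’s RidgeCV. The synthetic data, the grid of candidate values, and the 5-fold setting are illustrative assumptions (again, scikit-learn names the tuning parameter alpha rather than 𝜆):

import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate penalty strengths spanning several orders of magnitude
alphas = np.logspace(-3, 3, 13)

# 5-fold cross-validation picks the alpha with the lowest validation error,
# then the model is refit on the full training set with that alpha
model = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)

print("Selected alpha:", model.alpha_)
print("Test R^2:", model.score(X_test, y_test))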

In conclusion, Ridge regression is a valuable tool for mitigating multicollinearity and overfitting by striking a balance between bias and variance, thereby enhancing model generalizability. Overall, Ridge regression serves as a robust regularization technique, facilitating the development of more reliable and interpretable regression models in practical data analysis.

For readers interested in applying Ridge regression in Python, take a look at the code below.

Code:

Below is a simple snippet that fits a Ridge regression model on synthetic data:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: y = 4 + 3x + Gaussian noise
np.random.seed(42)  # feel free to choose any seed
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ridge regression model (alpha is scikit-learn's name for the tuning parameter lambda; 1.0 is the default)
ridge_reg = Ridge(alpha=1.0)
ridge_reg.fit(X_train, y_train)

# Predict on the test set
y_pred = ridge_reg.predict(X_test)

# Mean squared error on the test set
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Plot the data and the Ridge regression line (sort the test points so the line draws left to right)
order = X_test.ravel().argsort()
plt.scatter(X, y, color='blue', label='Data points')
plt.plot(X_test[order], y_pred[order], color='red', linewidth=2, label='Ridge Regression Line')
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()

The output reported the test MSE as approximately 0.6476, and the script also displays a plot of the data points together with the fitted Ridge regression line.

You can explore further by fitting Ridge regression on more complex, real-world datasets, but that is for another time. Stay tuned!
