Understanding Polynomial Regression in Machine Learning

May 19, 2024

Polynomial Regression is a machine learning technique that captures nonlinear relationships between independent and dependent variables. It extends simple linear regression by adding polynomial terms to the regression equation, while the model stays linear in its coefficients. In this blog post, we will cover its concepts, mathematical formulation, implementation in Python, and practical considerations.

What is Polynomial Regression?

Polynomial Regression is a form of regression analysis in which the relationship between the independent variable X and the dependent variable y is modeled as an n-th degree polynomial in X. The equation for polynomial regression is:

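$$y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \dots + \beta_n X^n + \epsilon$$

where:

$y$ is the dependent variable (target),
$X$ is the independent variable (feature),
$\beta_0, \beta_1, \beta_2, \dots, \beta_n$ are the coefficients of the polynomial terms,
$\epsilon$ is the error term.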
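To make the equation concrete, here is a minimal sketch that evaluates a cubic polynomial (without the error term) for a few inputs. The coefficient values and the names beta, x, y_manual, and y_polyval are made up for illustration only.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical coefficients beta_0..beta_3 (lowest degree first), for illustration only
beta = np.array([1.0, 2.0, 0.0, 0.5])

# A few values of the feature
x = np.array([0.0, 1.0, 2.0])

# Explicit sum: beta_0 + beta_1*x + beta_2*x^2 + beta_3*x^3
y_manual = sum(b * x**k for k, b in enumerate(beta))

# Same evaluation with NumPy's polynomial helper (coefficients in increasing order)
y_polyval = P.polyval(x, beta)

print(y_manual)
print(y_polyval)
```

Both approaches compute the same values; in a fitted model, the coefficients would be estimated from data rather than chosen by hand.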

Key Concepts of Polynomial Regression:

Non-linear Relationship: Polynomial Regression can capture non-linear relationships between X and y, which cannot be captured by simple linear regression.
Degree of the Polynomial: The degree n of the polynomial determines the flexibility of the model. Higher degrees allow the model to fit more complex relationships but may lead to overfitting.
Bias-Variance Tradeoff: Choosing an appropriate degree of the polynomial is crucial to balance bias and variance. Underfitting (high bias) occurs with a low-degree polynomial, while overfitting (high variance) occurs with a high-degree polynomial.
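The effect of the degree can be sketched by fitting several degrees to the same data and comparing training and test error. The example below uses synthetic data (a cubic trend plus noise), which is an assumption made purely for illustration; the degrees 1, 3, and 10 are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a cubic trend plus noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 1))
y = 2 * x[:, 0] ** 3 - x[:, 0] + rng.normal(scale=0.1, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

train_mse, test_mse = {}, {}
for degree in (1, 3, 10):
    # include_bias=False because LinearRegression fits its own intercept
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    model = LinearRegression().fit(poly.fit_transform(x_train), y_train)
    train_mse[degree] = mean_squared_error(y_train, model.predict(poly.transform(x_train)))
    test_mse[degree] = mean_squared_error(y_test, model.predict(poly.transform(x_test)))
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:.4f}  test MSE={test_mse[degree]:.4f}")
```

Training error can only stay the same or decrease as the degree grows, because each higher-degree model contains the lower-degree ones. Test error is the number to watch: a degree that is too low leaves it high (underfitting), and a degree that is too high tends to push it back up (overfitting).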

Mathematical Formulation:

The Polynomial Regression model can be represented in matrix form as:

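$$y = X\beta + \epsilon$$

where:

$y$ is an $m \times 1$ vector of the target variable,
$X$ is an $m \times (n+1)$ design matrix built from the input feature, where column $k$ (for $k = 0, \dots, n$) holds the feature raised to the power $k$, so the first column is all ones,
$\beta$ is an $(n+1) \times 1$ vector of the model coefficients,
$\epsilon$ is an $m \times 1$ vector of errors.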
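As a rough sketch of the matrix form, the snippet below builds the design matrix for a single feature with NumPy and estimates the coefficients by ordinary least squares. The data is a noise-free toy quadratic and the coefficient values are illustrative assumptions, not results from any real dataset.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Noise-free toy data from a known quadratic (coefficients are illustrative)
beta_true = np.array([1.0, -2.0, 0.5])  # beta_0, beta_1, beta_2
x = np.linspace(-1, 1, 20)              # m = 20 samples of one feature

# Design matrix with columns x^0, x^1, x^2, so its shape is (m, n + 1)
X = np.vander(x, N=3, increasing=True)
y = X @ beta_true

# Least-squares estimate of beta
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)

# For a single feature, PolynomialFeatures builds the same columns
X_sklearn = PolynomialFeatures(degree=2).fit_transform(x.reshape(-1, 1))
print(np.allclose(X, X_sklearn))
```

Because the data here has no noise, the least-squares estimate recovers the chosen coefficients up to floating-point error. With real data, the estimate is the coefficient vector that minimizes the sum of squared errors.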

Implementation in Python:

Let’s implement Polynomial Regression in Python using the PolynomialFeatures and LinearRegression classes from Scikit-Learn. We will use the Boston housing dataset for this.

# Importing necessary libraries
import numpy as np
import matplotlib.pyplot as plt
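from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Load the Boston housing dataset from OpenML (requires pandas).
# load_boston() was removed in scikit-learn 1.2, so fetch_openml is used here instead.
boston = fetch_openml(name="boston", version=1, as_frame=True)
X = boston.data[["LSTAT"]].to_numpy(dtype=float)  # Using only the 'LSTAT' feature
y = boston.target.to_numpy(dtype=float).reshape(-1, 1)  # Target: 'MEDV'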

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Polynomial transformation
poly_features = PolynomialFeatures(degree=3)
X_train_poly = poly_features.fit_transform(X_train)
X_test_poly = poly_features.transform(X_test)

# Training the Polynomial Regression model
model = LinearRegression()
model.fit(X_train_poly, y_train)

# Predicting on test set
y_pred = model.predict(X_test_poly)

# Calculating RMSE
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"Root Mean Squared Error: {rmse}")

# Plotting the Polynomial Regression results
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.scatter(X_test, y_pred, color='red', label='Predicted')
plt.title('Polynomial Regression')
plt.xlabel('LSTAT')
plt.ylabel('MEDV')
plt.legend()
plt.show()

Explanation of the Code:

Import Libraries: Import the necessary libraries including NumPy, Matplotlib, and Scikit-Learn.
Load Dataset: Load the Boston housing dataset. Scikit-Learn’s load_boston() function was removed in version 1.2, so recent releases need another loader such as fetch_openml().
Data Preparation: Use only the ‘LSTAT’ feature and split the data into training and testing sets.
Polynomial Transformation: Use PolynomialFeatures to transform the input features to polynomial terms.
Model Training: Train the Linear Regression model on the polynomial features.
Prediction and Evaluation: Predict the target variable on the test set and evaluate the model performance using RMSE.
Plotting Results: Visualize the actual vs. predicted values of the target variable.
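The transformation and training steps can also be chained into a single estimator with Scikit-Learn’s make_pipeline, which avoids having to transform the training and test sets separately. The sketch below uses synthetic stand-in data rather than the housing data, and the names X_demo, y_demo, and pipeline are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (an assumption; not the housing data used above)
rng = np.random.default_rng(42)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = 0.5 * X_demo[:, 0] ** 3 - X_demo[:, 0] + 2 + rng.normal(scale=1.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# include_bias=False because LinearRegression already fits an intercept
pipeline = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
)
pipeline.fit(X_tr, y_tr)

# The pipeline applies the polynomial transform before predicting
y_hat = pipeline.predict(X_te)
r2 = pipeline.score(X_te, y_te)
print(f"Test R^2: {r2:.3f}")
```

A pipeline also makes it harder to accidentally fit the transformer on test data, since fit and predict each run every step in order.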

Conclusion:

In conclusion, Polynomial Regression is a flexible technique that allows us to capture complex relationships in data that cannot be modeled with simple linear models. However, it’s important to choose the degree of the polynomial carefully to avoid overfitting or underfitting the data. In this blog post, we covered the concepts, mathematical formulation, and Python implementation of Polynomial Regression, with a practical example using the Boston housing dataset.

Polynomial Regression is just one of the many regression techniques used in machine learning. Stay tuned for more blog posts where we will cover other regression algorithms in detail.