ML from Scratch: Polynomial Regression Using PyTorch
A Complete Guide on How to Code a Polynomial Regression Model & the Math Behind It
When it comes to predictive analysis, regression models are among the most cost-efficient methods. While a linear regression model can provide good predictions, in many cases a polynomial regression model can vastly outperform a simple linear one. In the following project, we'll have a look at how to create a polynomial regression model in PyTorch from scratch.
Project Objectives
- Implementation of a machine learning model in PyTorch that uses a polynomial regression algorithm to make predictions. We will create the model entirely from scratch, using basic PyTorch tensor operations.
- Using the model to conduct predictive analysis of automobile prices. By the end of the project, we aim to develop an efficient ML model that can predict the price of a car on the basis of its features.
- Performing visual & descriptive analysis of the data to predict which features play a key role in determining the price of a car.
Importing the Project Dependencies
Before we begin working on the project, let us import all the necessary libraries and functions.
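A minimal set of imports that the rest of this walkthrough relies on (the original notebook may import a few more):

```python
import pandas as pd
import torch
import matplotlib.pyplot as plt
import seaborn as sns
```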
Now, let us import the dataset.
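A sketch of the loading step; the file name below is a placeholder of my own, so point it at wherever you saved the data:

```python
# 'car_prices.csv' is a hypothetical file name; use the path to your copy
df = pd.read_csv('car_prices.csv')
df.head()
```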
Here’s the link to the data set if you plan to work on it yourself.
I’ll also link the GitHub repo with the project notebook and the data set at the end of this article.
Since a polynomial regression model can already be a bit overwhelming for beginners, for the sake of simplicity we will be working only with continuous variables in our project.
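One simple way to do this is to keep only the numeric columns (a sketch; you could equally list the columns by hand):

```python
# Drop categorical columns, keeping only the continuous/numeric ones
df = df.select_dtypes(include=['int64', 'float64'])
```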
Data Wrangling
Now, let us analyze the data and see if it needs any cleaning or modifications. First, let us check the data type of each column of the data frame.
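For example:

```python
print(df.dtypes)           # data type of each column
print(df.isnull().sum())   # confirm there are no null values
```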
One thing to note is that the dataset has no null values.
As we can see, some columns are of type 'int64', while others are of type 'float64'. Since we'll be working with tensors, we want a uniform data type throughout our feature set, so we will cast the entire data frame to 'float64' at a later step.
Now, we will have a look at the descriptive analysis of the data frame.
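In pandas, this is a one-liner:

```python
df.describe()
```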
As we can see, the data in different columns vary significantly in scale. We have elements in our data set of the order of 1e+0 on the lower range and elements of the order of 1e+3 on the upper range. We might want to normalize the data. We will deal with this at a later step in our project.
Feature Selection
Let us first plot the graph of each independent variable against the target variable (price).
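A simple way to generate these plots (a sketch using matplotlib):

```python
# Scatter plot of every feature against the target, price
for col in df.columns:
    if col == 'price':
        continue
    plt.scatter(df[col], df['price'], s=10)
    plt.xlabel(col)
    plt.ylabel('price')
    plt.show()
```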
Now, let us have a look at the correlation values for our dataset.
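For instance, with pandas and seaborn:

```python
corr = df.corr()
print(corr['price'].sort_values(ascending=False))

sns.heatmap(corr, annot=True, fmt='.2f')
plt.show()
```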
From the above visual analysis, we can clearly see that enginesize (+0.87), curbweight (+0.84) and horsepower (+0.81) show a very strong correlation with the dependent variable, price. carwidth (+0.76), citympg (-0.69), highwaympg (-0.70) and carlength (+0.68) also show fairly strong correlations. So, for this example, we will select these 7 as our training features.
Now, we will work on our model. But before we do that, let us observe the plots of the features against the target once again.
If we observe the graphs carefully, we will notice that the features enginesize, curbweight, horsepower, carlength and carwidth have a graph of the type y = x².
On the other hand, citympg and highwaympg have a graph of the type y = 1/x².
So we will deal with these features accordingly.
Formulating the Model from Scratch
Given below is the formula that our polynomial regression model will work on:

ŷ = WX + B

As we can see, once the features have been polynomially transformed, the polynomial regression function can be expressed as a plain linear regression function.
Here,
- ŷ: the predicted values of the target variable
- W: a (1 × n) matrix containing the weight associated with each polynomially transformed feature variable
- X: the transpose of the matrix containing all the polynomially transformed feature variables, of shape (n × m)
- B: a (1 × m) matrix, each element of which is the bias value of our function
- m: the number of rows in the training/test set
- n: the number of polynomially transformed features
Let us now convert the features (a Pandas DataFrame) and the target (a Pandas Series) into tensor objects so we can use them to train our model with PyTorch.
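A sketch of the conversion for the feature set, assuming the seven columns identified above (the target is converted to a tensor at a later step, as described below):

```python
# Columns whose curve looks like y = x^2 come first; the two mpg
# columns (y = 1/x^2) come last. This ordering is relied on below.
selected = ['enginesize', 'curbweight', 'horsepower', 'carlength',
            'carwidth', 'citympg', 'highwaympg']

features = torch.tensor(df[selected].values, dtype=torch.float64)
```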
Now, for the features whose relationship with the target is of the type y = x², we first create the x² features and then concatenate them to our tensor object.
Similarly, for the features whose relationship is of the type y = 1/x², we create the 1/x² features and concatenate them as well.
With this, we have polynomially transformed our features.
Now finally, we will concatenate all the transformed feature tensors into a single tensor object.
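Putting the two transformations and the final concatenation together, a minimal sketch (using the column ordering from the `selected` list above; here I keep the original columns alongside the transformed ones):

```python
x_squared = features[:, :5] ** 2               # y = x^2 terms
x_inv_squared = 1.0 / (features[:, 5:] ** 2)   # y = 1/x^2 terms

# Concatenate the original and transformed features into one (m x n) tensor
inputs = torch.cat([features, x_squared, x_inv_squared], dim=1)
```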
Feature Scaling
Since we will be using the Stochastic Gradient Descent (SGD) algorithm as the optimizer for our model, scaling can significantly improve the optimization process and the performance of our model.
In this project, we will be performing Min-Max Scaling. Min-max normalization converts the data such that all the values lie in the range [0, 1].
The mathematical formula for min-max normalization is as follows:

X_scaled = (X − X_min) / (X_max − X_min)
Here,
- X = observed value
- X_min = minimum value observed in the column
- X_max = maximum value observed in the column
- X_scaled = scaled value
Now, let us perform min-max normalization on our feature set.
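A compact way to do this column-wise on the tensor itself:

```python
# Scale every column of the feature tensor to the [0, 1] range
col_min = inputs.min(dim=0).values
col_max = inputs.max(dim=0).values
inputs = (inputs - col_min) / (col_max - col_min)
```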
Now, we will create a tensor object that holds the weights associated with the polynomially-transformed, normalized features. These weights are randomly generated and are very small.
Now, let us initialize a random bias for our model.
Finally, convert the target array into a tensor.
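A sketch of the initialization; the 1e-4 scale and the seed are my own choices rather than anything fixed by the method:

```python
torch.manual_seed(0)  # hypothetical seed, for reproducibility
m, n = inputs.shape

# Small random weights, one per transformed feature, tracked by autograd
w = (torch.randn(n, dtype=torch.float64) * 1e-4).requires_grad_()
# Small random bias, broadcast across all m rows when added
b = (torch.randn(1, dtype=torch.float64) * 1e-4).requires_grad_()

# Convert the target Series into a tensor
target = torch.tensor(df['price'].values, dtype=torch.float64)
```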
Now, let us define the regression function.
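A minimal version, matching ŷ = WX + B (written row-wise here, so the feature tensor keeps its (m × n) layout):

```python
def predict(x, w, b):
    # Linear regression over the polynomially transformed features
    return x @ w + b
```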
Now, we will define our mean-squared error (MSE) function. MSE is one of the most commonly used measures of total model loss. Given below is the formula for the calculation of MSE:

MSE = (1/m) · Σ (ŷᵢ − yᵢ)²
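In code, this is a one-liner:

```python
def mse(preds, targets):
    # Mean of the squared differences between predictions and targets
    return torch.mean((preds - targets) ** 2)

print(mse(predict(inputs, w, b), target).item())
```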
Upon running the MSE function for our model, we can see that it currently gives a very large MSE. Therefore, we need to optimize the model's weights and bias to improve its performance.
Optimizing the Model
For optimizing our model, we will be using the Stochastic Gradient Descent (SGD) algorithm. The mathematics associated with the SGD optimizer boils down to the following update rules, applied repeatedly with learning rate α:

W := W − α · ∂(MSE)/∂W
B := B − α · ∂(MSE)/∂B
For more information regarding SGD, kindly visit this link.
In our model, we will be using PyTorch's built-in gradient calculation (autograd) for computing the gradients and updating the weights and biases.
With all this done, we will finally run the optimizer function for our model.
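A sketch of the training loop under the setup above; the learning rate and epoch count are hypothetical values, not necessarily those from the original notebook:

```python
lr = 0.01       # hypothetical learning rate
epochs = 10000  # hypothetical number of iterations

for epoch in range(epochs):
    preds = predict(inputs, w, b)
    loss = mse(preds, target)
    loss.backward()              # autograd computes w.grad and b.grad
    with torch.no_grad():
        w -= lr * w.grad         # gradient descent step on the weights
        b -= lr * b.grad         # gradient descent step on the bias
        w.grad.zero_()           # clear gradients for the next pass
        b.grad.zero_()
    if epoch % 1000 == 0:
        print(f'epoch {epoch:5d}  loss {loss.item():.2f}')
```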
Now, to check the accuracy of our model, we will calculate its R-squared (R²) score. The following is the formula for the R² score:

R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)²

where ȳ is the mean of the observed target values.
Now, let us check the performance of our model, using the updated weights and bias obtained from our optimizer function.
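One way to compute it directly from the formula above:

```python
with torch.no_grad():
    preds = predict(inputs, w, b)
    ss_res = torch.sum((target - preds) ** 2)          # residual sum of squares
    ss_tot = torch.sum((target - target.mean()) ** 2)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
print(f'R-squared: {r2.item():.4f}')
```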
As we can see, our polynomial regression model can explain around 83–84% of the variance in the dependent variable, which makes it a fairly effective model.
One thing to note here is that we trained and evaluated the model on the same dataset, which is bad practice; you should normally split the data into a training set and a test set. In this example, however, I skipped that step to keep things simple and understandable while building the model from scratch.
At the end of our project, let us draw the regression plot for each polynomially transformed feature of our model.
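One way to generate these plots with seaborn (a sketch; the original figure styling may differ):

```python
x_np = inputs.numpy()
y_np = target.numpy()

# One regression plot per transformed feature against price
for i in range(x_np.shape[1]):
    sns.regplot(x=x_np[:, i], y=y_np, scatter_kws={'s': 10})
    plt.xlabel(f'transformed feature {i}')
    plt.ylabel('price')
    plt.show()
```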
From the above graphs, we can see our model is performing pretty well.
With this, we come to the end of our project. For more fun projects like this one, check out my profile.
I am just a novice in the field of Machine Learning and Data Science, so any suggestions and criticism will really help me improve.
Hit that follow and stay tuned for more ML stuff!
Link to the GitHub repo for the dataset and Jupyter notebook: