Polynomial Regression with Python

Muktha Sai Ajay
Published in DataSeries
5 min read · Oct 5, 2020

A comprehensive guide on how to perform polynomial regression

Photo by fabio on Unsplash

Artificial Intelligence (AI) and machine learning have developed rapidly in recent years, and we now encounter both in daily life. Behind the scenes, they power applications ranging from medical diagnosis to customer recommendations, and the field keeps producing breakthroughs that few people anticipated.

If you are new to regression or need to brush up, have a look at my articles on Linear Regression and Multiple Regression, which will help you understand the basics.

Linear Regression

Linear Regression is one of the most widely used techniques for fitting a straight line to linear data. If we try to fit a straight line to nonlinear data, the model underfits: it cannot capture the curvature in the data, so its predictions are poor and its error is high. An example of such a case is shown below.

Linear Regression curve

In the above graph, we can observe that the linear regression model doesn’t fit the given data points well. Many points lie far above or far below the line, which gives rise to a significant error in our model. A simple straight line is not the best choice for this dataset.

Sometimes the data has a more complex distribution, and the relationship between the dependent and independent variables is not linear. It’s hard to fit a straight line to such data.
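To make the underfitting problem concrete, here is a minimal sketch. It assumes synthetic data generated from y = x² (not the dataset used later in this article) and measures how much of the variation a straight line explains.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, clearly nonlinear data: y = x^2 on a symmetric range.
# This is illustrative data only, not the article's dataset.
x = np.linspace(-3, 3, 25).reshape(-1, 1)
y = (x ** 2).ravel()

# Fit a straight line and compute R^2 (1.0 would be a perfect fit).
line = LinearRegression().fit(x, y)
r2_line = line.score(x, y)

print(f"Slope of the fitted line: {line.coef_[0]:.3f}")
print(f"R^2 of the straight-line fit: {r2_line:.3f}")
```

Because the curve is symmetric around zero, the best straight line is essentially flat and explains almost none of the variation, which is exactly what underfitting looks like.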

What is Polynomial Regression?

In Simple Linear Regression, we use a straight line to fit our dataset. But when the dataset is non-linear, a straight line fits poorly, so we use Polynomial Regression instead. It is represented by the equation

Polynomial Regression equation
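In standard textbook notation, a polynomial regression of degree n in a single variable x can be written as follows. The coefficient names b_0, ..., b_n are chosen here for illustration and may differ from the notation in the figure.

```latex
y = b_0 + b_1 x + b_2 x^2 + \dots + b_n x^n
```

Note that the model is still linear in the coefficients b_0, ..., b_n. That is why an ordinary linear regression can fit it once the powers of x are supplied as extra features.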

It is a form of regression in which the relationship between the independent and dependent variables is modeled as an nth-degree polynomial. An example of Polynomial Regression is shown below.

Polynomial Regression curve

Advantages

  • It can capture a curved (nonlinear) relationship between the independent and dependent variables.
  • It can fit a wide range of curves, because the degree of the polynomial can be adjusted.

Disadvantages

  • It is sensitive to outliers; even a few of them can noticeably distort the fitted curve.

Let’s Code

Photo by Joshua Reddekopp on Unsplash

Here we use the scikit-learn library to import the linear regression model and use it directly. There are many datasets available online for regression. You can find the dataset and code at the link below.

Numpy → It is a library used to perform mathematical operations on arrays.

Pandas → To load the data file as a Pandas data frame and analyze the data.

Matplotlib → I’ve imported pyplot to plot graphs of the data.
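The imports for these three libraries typically look like this. The aliases np, pd, and plt are the common community conventions and are assumed here.

```python
import numpy as np               # numerical operations on arrays
import pandas as pd              # loading and analyzing tabular data
import matplotlib.pyplot as plt  # plotting graphs of the data
```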

Import data

Our file is in the CSV (Comma Separated Values) format, so we import it using pandas. Then we split the data into dependent and independent variables: X holds the independent variable and Y holds the dependent variable.
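A minimal sketch of this step follows. The column names (Position, Level, Salary) and all of the values are made up for illustration, and the CSV is read from an in-memory string so the snippet runs on its own. With a real file you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file. The column names and the
# values below are made up for illustration, not the original data.
csv_text = """Position,Level,Salary
Intern,1,30000
Junior Analyst,2,35000
Analyst,3,42000
Senior Analyst,4,52000
Team Lead,5,68000
Manager,6,95000
Senior Manager,7,140000
Director,8,210000
Vice President,9,330000
CEO,10,520000
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# Independent variable as a 2-D array, which scikit-learn expects.
X = dataset[["Level"]].values
# Dependent variable as a 1-D array.
Y = dataset["Salary"].values

print(dataset.head())
print(X.shape, Y.shape)
```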

Here, we don’t need to split the data into training and test sets, for two reasons. First, the dataset contains only ten rows. Second, we want the prediction for a real-world scenario to be as accurate as possible, so we train the model on all the information that is available.

The goal of our model is to predict the salary of an employee based on their position. The independent variable X contains the position level of an employee, and the dependent variable Y contains the salary.

Let’s fit the data

From scikit-learn’s linear_model module, we import LinearRegression and fit the model on the data. The main reason for creating a Linear Regression model is to compare it with the Polynomial Regression model and determine which one performs better.
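This step can be sketched as follows. The X and Y arrays are made-up illustrative values standing in for the position levels and salaries, and the variable name lin_reg is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up illustrative data (not the article's dataset):
# position levels 1..10 and a salary for each level.
X = np.arange(1, 11).reshape(-1, 1)
Y = np.array([30000, 35000, 42000, 52000, 68000,
              95000, 140000, 210000, 330000, 520000])

# Fit a straight line to all ten points.
lin_reg = LinearRegression()
lin_reg.fit(X, Y)

print("Slope:", lin_reg.coef_[0])
print("Intercept:", lin_reg.intercept_)
print("R^2 on the fitted data:", lin_reg.score(X, Y))
```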

Polynomial Fit

From scikit-learn’s preprocessing module, we import PolynomialFeatures. We create a PolynomialFeatures object with the required degree of the polynomial and use it to transform X into X_poly, which contains the powers of X up to that degree. We then create another Linear Regression object and fit it on X_poly and Y.
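A minimal sketch of the polynomial fit is shown here. The degree of 4 is an assumption made for illustration, the data is the same made-up stand-in as before, and the names poly and lin_reg_2 are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up illustrative data (not the article's dataset).
X = np.arange(1, 11).reshape(-1, 1)
Y = np.array([30000, 35000, 42000, 52000, 68000,
              95000, 140000, 210000, 330000, 520000])

# Expand X into the columns [1, x, x^2, x^3, x^4].
# degree=4 is an assumed value; tune it for your own data.
poly = PolynomialFeatures(degree=4)
X_poly = poly.fit_transform(X)

# An ordinary linear regression on the expanded features
# gives the polynomial fit.
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, Y)

print("Shape of X_poly:", X_poly.shape)
print("R^2 on the fitted data:", lin_reg_2.score(X_poly, Y))
```

A higher degree follows the points more closely but also raises the risk of overfitting, so it is worth trying a few values.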

Let’s plot our model!!

Here we will create two scatter plots to compare how the Linear Regression and Polynomial Regression models performed.
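One way to draw both plots side by side is sketched below. It again uses made-up data and an assumed degree of 4, and it saves the figure to a file named regression_comparison.png (a hypothetical name) rather than opening a window.

```python
import numpy as np
import matplotlib
# Non-interactive backend so the sketch runs anywhere; remove this line
# and call plt.show() at the end to view the figure in a window.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Made-up illustrative data (not the article's dataset).
X = np.arange(1, 11).reshape(-1, 1)
Y = np.array([30000, 35000, 42000, 52000, 68000,
              95000, 140000, 210000, 330000, 520000])

# Fit both models (degree=4 is an assumed value).
lin_reg = LinearRegression().fit(X, Y)
poly = PolynomialFeatures(degree=4)
lin_reg_2 = LinearRegression().fit(poly.fit_transform(X), Y)

# A dense grid of levels gives a smooth polynomial curve.
X_grid = np.linspace(1, 10, 100).reshape(-1, 1)

fig, (ax_lin, ax_poly) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: data points with the straight-line fit.
ax_lin.scatter(X.ravel(), Y, color="red")
ax_lin.plot(X.ravel(), lin_reg.predict(X), color="blue")
ax_lin.set_title("Linear Regression")
ax_lin.set_xlabel("Position level")
ax_lin.set_ylabel("Salary")

# Right panel: the same points with the polynomial fit.
ax_poly.scatter(X.ravel(), Y, color="red")
ax_poly.plot(X_grid.ravel(), lin_reg_2.predict(poly.transform(X_grid)), color="blue")
ax_poly.set_title("Polynomial Regression")
ax_poly.set_xlabel("Position level")
ax_poly.set_ylabel("Salary")

fig.tight_layout()
fig.savefig("regression_comparison.png")
```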

Linear Regression plot

The Linear Regression model performed poorly and its fit is inaccurate. Let’s plot the Polynomial Regression model on the same data.

Polynomial Regression plot

It seems our Polynomial Regression model performed well. Here is a summary of what I did: I loaded the data, split it into dependent and independent variables, fitted a Linear Regression model and a Polynomial Regression model on the data, and visualized how the two models performed.

Thank you for reading my article. I will be happy to hear your opinions. Follow me on Medium to get updated on my latest articles. You can also connect with me on LinkedIn and Twitter. Check out my blogs on Machine Learning and Deep Learning.
