Polynomial regression

Indresh Bhattacharyya
Coinmonks
5 min read · Jun 22, 2018


As mentioned in the previous post, polynomial regression is a special case of linear regression. As we have seen in linear regression, we have two axes: the X axis for the data values and the Y axis for the target values.

Why use polynomial regression?

Well, in the previous example the data was roughly linear, so we got a good fit line on the data. But in real-world examples the data might not be so linear, but more scattered. In such cases linear regression might not be the best way to describe the data; a curved, non-linear line might be a better fit. For example:

An Example of Scatter Plot

In this example the points are scattered/diversified, so a simple straight line might not be the best fit for such a data set.

So now we know why we should use polynomial regression. Let us dive into how to use it.

The equation of a quadratic, i.e. a polynomial of degree 2, is:

y = b0 + b1*x + b2*x²

Similarly, an equation of degree 3:

y = b0 + b1*x + b2*x² + b3*x³

A polynomial of degree n would be:

y = b0 + b1*x + b2*x² + … + bn*xⁿ

where n is the degree of the polynomial.
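As a quick sketch, a polynomial like the degree-3 equation above can be evaluated with NumPy's polyval; the coefficient values here are made up for illustration:

```python
import numpy as np

# Hypothetical degree-3 polynomial: y = x**3 + 3*x**2 + 0.5*x + 2
coeffs = [1, 3, 0.5, 2]   # np.polyval expects the highest-degree term first

x = np.array([0.0, 1.0, 2.0])
print(np.polyval(coeffs, x))   # evaluates to [2.0, 6.5, 23.0]
```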

Now that we are done with the math, let's focus on how to fit data into a polynomial equation.

Example of polynomial Curve

For this we are going to use the PolynomialFeatures() class from the sklearn library in Python.

from sklearn.preprocessing import PolynomialFeatures

So, how does PolynomialFeatures() work, exactly?

Its job is quite simple, actually: it takes a matrix of features and transforms it into a feature matrix containing polynomial terms (quadratic terms, in the case of degree two).

Let's say we have a matrix of two features:

X=[[0,1],[2,3],[4,5]]

or, written out as a matrix:

[[0, 1],
 [2, 3],
 [4, 5]]

Now after we apply:

poly = PolynomialFeatures(degree=2)

poly_X = poly.fit_transform(X)

What we actually get back is a matrix in which each row [a, b] has been expanded to:

[1, a, b, a², a*b, b²]
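A minimal sketch of that transform on the matrix above; the column order [1, a, b, a², a*b, b²] is what scikit-learn produces for two features at degree 2:

```python
from sklearn.preprocessing import PolynomialFeatures

X = [[0, 1], [2, 3], [4, 5]]
poly = PolynomialFeatures(degree=2)
print(poly.fit_transform(X))
# Each row is [1, a, b, a**2, a*b, b**2]:
# [[ 1.  0.  1.  0.  0.  1.]
#  [ 1.  2.  3.  4.  6.  9.]
#  [ 1.  4.  5. 16. 20. 25.]]
```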

Here is the example code for a simple polynomial regression.

The data set I have is a simple one. It has basically two columns: the list price and the best price of a pickup-truck company. So let us see.

CODE:

import pandas as pd
df = pd.read_csv("/home/indresh/PycharmProjects/MLCoursera/DataSet/test.csv")
x = df.iloc[:, 0:1].values   # 2-D feature matrix (list price)
y = df.iloc[:, 1].values     # 1-D target (best price)
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=3)
poly_x = poly.fit_transform(x)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(poly_x, y)
import matplotlib.pyplot as plt
plt.scatter(x, y, color='red')
plt.plot(x, regressor.predict(poly.fit_transform(x)), color='blue')
plt.show()

Load CSV file


import pandas as pd
df = pd.read_csv("/home/indresh/PycharmProjects/MLCoursera/DataSet/test.csv")

Take the X-axis and Y-axis values

x = df.iloc[:, 0:1].values
y = df.iloc[:, 1].values

Note: Why use df.iloc[:, 0:1] instead of simply df.iloc[:, 0]? The reason is that df.iloc[:, 0] creates a 1-D array like [1, 2, 3, 4, …, n], which cannot be fed to the polynomial model. fit_transform() needs a 2-D array for the X-axis data, and df.iloc[:, 0:1] creates a 2-D matrix of shape (n, 1), i.e. [[1], [2], [3], …, [n]].
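The shape difference can be seen directly; the frame below is a hypothetical stand-in for the truck-price CSV:

```python
import pandas as pd

# Hypothetical stand-in for the truck-price data set
df = pd.DataFrame({"List price": [12.4, 14.3, 14.5],
                   "Best Price": [11.2, 12.5, 12.7]})

a = df.iloc[:, 0].values    # 1-D array, shape (3,)
b = df.iloc[:, 0:1].values  # 2-D array, shape (3, 1) -- what fit_transform() wants
print(a.shape, b.shape)     # (3,) (3, 1)
```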

Now what does .iloc[] do? Or better, how does it work?

Ans: .iloc[] selects rows (and optionally columns) from a data frame by integer position. If we say df.iloc[4], we get the row at position 4, i.e. the fifth row. Something like this:

List price 16.1
Best Price 14.1
Name: 4, dtype: float64

So when we say .iloc[:, 0:1], it means .iloc[all the rows, column 0].
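A quick demonstration of both forms, again on a hypothetical frame with the same columns as the truck data (values made up to match the sample output above):

```python
import pandas as pd

# Hypothetical frame shaped like the truck-price data
df = pd.DataFrame({"List price": [12.4, 14.3, 14.5, 14.9, 16.1],
                   "Best Price": [11.2, 12.5, 12.7, 13.1, 14.1]})

print(df.iloc[4])       # row at position 4 (the fifth row), returned as a Series
print(df.iloc[:, 0:1])  # all rows of column 0, still a DataFrame
```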

Import the PolynomialFeatures from sklearn.preprocessing library

from sklearn.preprocessing import PolynomialFeatures

Now is the exciting part

poly=PolynomialFeatures(degree=3)
poly_x=poly.fit_transform(x)

So by PolynomialFeatures(degree=3) we are saying that the degree of the polynomial curve will be 3 (try higher values too).

poly_x=poly.fit_transform(x)

This transforms the array into polynomial form, as mentioned before. With degree=2, each row becomes:

[[1, x, x²]]

[[ 1.          12.39999962  153.75999058]
 [ 1.          14.30000019  204.49000543]
 [ 1.          14.5         210.25      ]
 [ 1.          14.89999962  222.00998868]
 ...]

Note: the array had only one feature, so each row is [1, x, x²]; with degree=3 an x³ column is added as well. If it had two features x and x1, each row would instead be [1, x, x1, x², x*x1, x1²].
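With degree=3, as in the code above, each row gains the x³ column too; a quick sketch on made-up values:

```python
from sklearn.preprocessing import PolynomialFeatures
import numpy as np

x = np.array([[2.0], [3.0]])   # a single feature
print(PolynomialFeatures(degree=3).fit_transform(x))
# Each row is [1, x, x**2, x**3]:
# [[ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```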

This is the same as the linear model I described in the previous posts:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(poly_x, y)
import matplotlib.pyplot as plt
plt.scatter(x, y, color='red')
plt.plot(x, regressor.predict(poly.fit_transform(x)), color='blue')
plt.show()

Output of degree 2 polynomial
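Since the CSV above lives on the author's machine, here is a self-contained sketch of the same pipeline on made-up curved data, checking the fit with the model's R² score:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Made-up curved data standing in for the truck-price CSV
rng = np.random.default_rng(0)
x = np.linspace(10, 20, 30).reshape(-1, 1)
y = 0.05 * x.ravel() ** 2 + 0.5 * x.ravel() + rng.normal(0, 0.3, 30)

poly = PolynomialFeatures(degree=2)
poly_x = poly.fit_transform(x)

regressor = LinearRegression().fit(poly_x, y)
print(round(regressor.score(poly_x, y), 3))   # R² should be close to 1 here
```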
