Polynomial Regression in Python

M Shehzen
Published in Analytics Vidhya
Jun 3, 2021

Polynomial regression is the form of regression to use when a straight line cannot fit the data well. In other words, the input features and the output do not have a linear relationship.

The simplest linear relation we know is Y = mX + c, which is also the equation of a straight line.

In many cases the data does not obey this equation, and we have to move to equations that are more complex than the equation of a line. Some data follows a quadratic equation, and some follows a cubic one. In those cases we set simple linear regression aside and make the model more complex to fit the data. With these equations, the best fit is no longer a straight line but a curve, for example a parabola or an 'S' shape.

As we increase the degree, the fitted curve can bend more times, which is another concept to grasp here. In short, moving from quadratic to cubic to 4th-power equations gives more complex relations and more flexible fits for data that does not follow a straight line.

Some Important concepts.

Polynomial.

A polynomial is an expression in one or more variables, built as a sum of terms. Each term is a constant coefficient multiplied by the variable raised to a non-negative integer power, and different terms carry different powers.

Degree of the Polynomial.

The degree of a polynomial is the highest power of the variable that appears with a non-zero coefficient. It is a simple concept, yet an important one for understanding polynomial regression and regression as a whole.

Suppose I have a polynomial: y = ax + bx² + cx³ + d

The highest power in this polynomial is 3, so (as long as c is not zero) it has degree three. This is how we define the degree of a polynomial.
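As a quick check of this idea, here is a minimal sketch using NumPy's `numpy.polynomial.Polynomial` class. The coefficient values are made up for illustration and are not from this post.

```python
from numpy.polynomial import Polynomial

# Coefficients are listed from the constant term upwards:
# y = d + a*x + b*x^2 + c*x^3 with illustrative values d=4, a=1, b=2, c=3
p = Polynomial([4, 1, 2, 3])

degree = p.degree()   # highest power of x in the polynomial
value_at_2 = p(2)     # 4 + 1*2 + 2*4 + 3*8

print(degree, value_at_2)
```

Note that this class expects the constant term first, which is the reverse of how we usually write a polynomial on paper.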

One more important concept here is roots: the values of x at which the polynomial equals zero.

The number of roots of any polynomial is equal to its degree, provided we allow complex roots and count repeated roots as many times as they repeat. The number of real roots can be smaller.

This means that a quadratic equation has 2 roots, a cubic has 3 roots, and so on.
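To see the root count in practice, here is a small sketch with `numpy.roots`. The three polynomials are chosen only as examples; the last one shows a case where the roots exist but are not real numbers.

```python
import numpy as np

# np.roots takes coefficients from the highest power down to the constant.
quadratic = [1, -3, 2]        # x^2 - 3x + 2 = (x - 1)(x - 2)
cubic = [1, -6, 11, -6]       # x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
no_real_roots = [1, 0, 1]     # x^2 + 1, whose roots are the complex pair +i and -i

quadratic_roots = np.roots(quadratic)
cubic_roots = np.roots(cubic)
complex_roots = np.roots(no_real_roots)

# The number of roots matches the degree in every case.
print(len(quadratic_roots), len(cubic_roots), len(complex_roots))
```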

Some Common Graphs of Polynomials
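To draw a few such graphs yourself, a small matplotlib sketch could look like this. The particular polynomials are picked only as examples of degrees 1 to 4.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 200)

# Example curves chosen only for illustration.
curves = {
    "degree 1: y = x": x,
    "degree 2: y = x^2": x**2,
    "degree 3: y = x^3 - 3x": x**3 - 3 * x,
    "degree 4: y = x^4 - 4x^2": x**4 - 4 * x**2,
}

fig, axes = plt.subplots(1, 4, figsize=(14, 3))
for ax, (label, y) in zip(axes, curves.items()):
    ax.plot(x, y)
    ax.set_title(label)
    ax.axhline(0, color="gray", linewidth=0.5)  # horizontal axis for reference
plt.tight_layout()
plt.show()
```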

What is Polynomial Regression?

We now have an idea of what polynomials are, how to find the degree of a polynomial, and how many roots it has. It is time to look at polynomial regression itself.

The great Wikipedia says,

In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial in x.

Wikipedia.com

In simple terms, when our data cannot be approximated well by a simple line equation, i.e. y = mx + c, we need a more complex equation of higher degree. Raising the degree of the polynomial is what produces the more curved graphs.
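Written out, the definition quoted from Wikipedia corresponds to the following model. The symbols β and ε are the usual textbook notation for the coefficients and the error term; they are not used elsewhere in this post.

```latex
y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \varepsilon
```

With n = 1 this reduces to the straight line y = mx + c, where β₁ = m and β₀ = c. The model is still linear in the coefficients β, which is why an ordinary linear regression can fit it once the powers of x are supplied as features.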

Why do we need Polynomial Regression?

Suppose we have data whose points follow a curve rather than a straight line.

Clearly, if we draw a simple line through this data, the line will not do a great job of approximating the data points.

In such cases, where the relation between the independent and dependent variables is curved, we need a more complex polynomial to get a better approximation. For data like this, a 2-degree polynomial is one example of a better fit.
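The difference can be put into numbers with a short sketch. The data below is synthetic, generated only for this illustration (a quadratic trend plus a little noise), and `np.polyfit` stands in for the scikit-learn model used later in this post.

```python
import numpy as np

# Synthetic data, generated only for this illustration:
# a quadratic trend plus a little random noise.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 2 * x**2 + 1 + rng.normal(scale=0.5, size=x.size)

def mean_squared_error_of_fit(degree):
    """Fit a polynomial of the given degree by least squares and return its MSE."""
    coefficients = np.polyfit(x, y, deg=degree)
    predictions = np.polyval(coefficients, x)
    return float(np.mean((y - predictions) ** 2))

mse_line = mean_squared_error_of_fit(1)
mse_quadratic = mean_squared_error_of_fit(2)

print(f"straight line MSE: {mse_line:.2f}")
print(f"2-degree polynomial MSE: {mse_quadratic:.2f}")
```

On data like this the straight line leaves a much larger error than the 2-degree polynomial, because it has no way to bend.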

How To Implement Polynomial Regression?

Now that we have seen how polynomial regression works and why we need it, it is time to implement it in code. As we write the code, we will go through how and why each step works. So let’s dig in. First of all, we need to import the libraries: NumPy, pandas and matplotlib.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Now it is time for the data to be loaded into our environment. So let us do that using the pandas library.

path = 'path'
data = pd.read_csv(path)
data.head()

Calling data.head() shows the first few rows of the data.

The next thing to do is to get the input features and the output feature. We can select them and then convert them into NumPy arrays using .values. Note that we need only the level column and the salary column.

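X = data.iloc[:, 1:-1].values  # every column except the first and the last; here, the level column
y = data.iloc[:, -1].values    # the last column; here, the salary column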
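To make the slicing concrete, here is a sketch on a tiny made-up table. The column names `Position`, `Level` and `Salary` and all the values are assumptions for illustration; they are not this post's dataset.

```python
import pandas as pd

# A made-up table with the assumed layout: a label column, a level column,
# and a salary column. The values are placeholders, not the post's data.
toy = pd.DataFrame({
    "Position": ["A", "B", "C", "D"],
    "Level": [1, 2, 3, 4],
    "Salary": [10, 20, 40, 80],
})

X = toy.iloc[:, 1:-1].values   # 2-D array with a single column (Level)
y = toy.iloc[:, -1].values     # 1-D array (Salary)

print(X.shape, y.shape)
```

X comes out two-dimensional because the slice 1:-1 selects a range of columns, even when that range holds only one. scikit-learn estimators expect X in this form, while y can stay one-dimensional.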

A scatter plot of X against y shows how the salary changes with the level.

Let us train the simple linear regression model and see how the line gets fitted.

from sklearn.linear_model import LinearRegression
model1 = LinearRegression()
model1.fit(X, y)
plt.scatter(X, y)
plt.plot(X, model1.predict(X), 'r')
plt.title("Linear Model")

ooo… This model doesn’t look that great and is definitely not the best fit.

The next thing we need to do is to create the polynomial features of a certain degree. That is simple and can be done using sklearn in this way:

from sklearn.preprocessing import PolynomialFeatures
xpoly = PolynomialFeatures(degree=2)
xpolyFeat = xpoly.fit_transform(X)

Now, if we look at xpolyFeat, every data point x has become the feature vector 1, x, x².

Now let us see what happens to the model if it is trained on these features:
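A tiny sketch makes the transformed features visible. The three input values are arbitrary and chosen only so the squares are easy to read.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Three illustrative input values, shaped as a single-column 2-D array.
X_small = np.array([[1], [2], [3]])

features = PolynomialFeatures(degree=2).fit_transform(X_small)

# Each row is [1, x, x^2]; the leading 1 is the bias (intercept) column.
print(features)
```

The column of ones comes from the default `include_bias=True` setting of `PolynomialFeatures`.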

polyModel1 = LinearRegression()
polyModel1.fit(xpolyFeat, y)
plt.scatter(X, y)
plt.plot(X, polyModel1.predict(xpolyFeat), 'g')
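One detail that is easy to miss: a model trained on polynomial features also needs polynomial features at prediction time. Here is a minimal sketch on toy data that follows y = x² exactly. The data and the variable names are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Toy data that follows y = x^2 exactly, used only for this illustration.
X_toy = np.array([[1], [2], [3], [4], [5]])
y_toy = np.array([1, 4, 9, 16, 25])

poly = PolynomialFeatures(degree=2)
model = LinearRegression().fit(poly.fit_transform(X_toy), y_toy)

# A new input must go through the same transformer before prediction.
x_new = np.array([[6.5]])
prediction = model.predict(poly.transform(x_new))[0]

print(round(prediction, 2))
```

Because the toy data is an exact quadratic, the prediction should land very close to 6.5² = 42.25. Passing the raw value 6.5 straight to `model.predict` would fail, since the model was trained on three columns, not one.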

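This model is far better than the simple linear regression model previously trained on the same data. If we go on increasing the degree, the curve follows the training data more and more closely, although a very high degree can start to overfit and chase noise rather than the underlying trend. A 3-degree polynomial, for example, bends to follow the points even more closely than the 2-degree one.

In this way, we get a better and better fit to the training data.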
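To compare several degrees side by side, one option is a scikit-learn pipeline that chains `PolynomialFeatures` and `LinearRegression`. This is a sketch on made-up values that grow faster and faster; it is not the dataset used in this post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up values that grow faster and faster, for illustration only.
X_toy = np.arange(1, 11).reshape(-1, 1)
y_toy = np.array([2, 3, 5, 9, 16, 28, 47, 80, 135, 230], dtype=float)

scores = {}
for degree in (1, 2, 3, 4):
    # The pipeline expands X into polynomial features, then fits a linear model.
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_toy, y_toy)
    scores[degree] = model.score(X_toy, y_toy)  # R^2 on the training data

for degree, r2 in scores.items():
    print(f"degree {degree}: training R^2 = {r2:.4f}")
```

The training R² does not go down as the degree increases, so on its own it is not proof of a better model. Checking the score on data the model has not seen is the usual way to notice overfitting.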

Conclusions

In this way, we implement the Polynomial regression in Python.

So, we answered the questions of what polynomial regression is and why we need it. Now get your hands dirty with the code and fix polynomial regression in your mind.

If you liked this post and truly understood it, share it with your friends and enjoy the learning.

If you want to read my previous post, see: How to generate MFCC from audio.

I am a postgrad student in Computer Science from Kashmir. In these covid days, I have turned towards spreading information about machine learning, which is my passion and the subject of my future studies. The aim is to help people understand, and to understand myself, the basic concepts of machine and deep learning, which are crucial to further success in this field.

Originally published at https://mlforlazy.in on June 3, 2021.


I am a student and blogger, trying to teach and learn from others. Happy learning and happy reading.