Understanding Polynomial Regression!!!

Abhigyan · Published in Analytics Vidhya · Aug 2, 2020

In my previous articles we took an overview of Linear Regression and Logistic Regression.
Let’s see another algorithm in the Regression Family.

Content:

  1. What is Polynomial Regression?
  2. Assumptions of Polynomial Regression.
  3. Why do we need Polynomial Regression?
  4. How to find the right degree of the Polynomial Equation?
  5. Math Behind Polynomial Regression.
  6. Cost Function of Polynomial Regression.
  7. Polynomial Regression with Gradient Descent.

What is Polynomial Regression?

  • Polynomial Regression is a form of regression analysis in which the relationship between the independent variables and the dependent variable is modeled as an nth-degree polynomial.
  • Polynomial Regression models are usually fit with the method of least squares. Under the Gauss-Markov theorem, the least-squares estimates have the smallest variance among all linear unbiased estimators of the coefficients.
  • Polynomial Regression is a special case of Linear Regression, where we fit a polynomial equation to data that has a curvilinear relationship between the dependent and independent variables.

A quadratic equation is a polynomial equation of 2nd degree; however, the degree can be increased to any nth value.
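To make the "special case of Linear Regression" point concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available; the data and names are illustrative, not from this article): the polynomial terms are generated as extra features, and an ordinary linear model is fit on them.

# Polynomial Regression as Linear Regression on polynomial features (illustrative sketch)
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.linspace(-3, 3, 50).reshape(-1, 1)            # one independent variable
y = 1.5 + 2.0 * x.ravel() - 0.8 * x.ravel() ** 2      # curvilinear relationship

poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(x)                        # columns: x, x^2

model = LinearRegression().fit(X_poly, y)             # ordinary least squares
print(model.intercept_, model.coef_)                  # recovers b0, b1, b2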

Now that you know what Polynomial Regression is, let's have a look at its assumptions, because every regression analysis has its own.

Assumptions of Polynomial Regression:

  • The behavior of a dependent variable can be explained by a linear, or curvilinear, additive relationship between the dependent variable and a set of k independent variables (xi, i=1 to k).
  • The relationship between the dependent variable and any independent variable is linear or curvilinear (specifically polynomial).
  • The independent variables are independent of each other.
  • The errors are independent and normally distributed with mean zero and constant variance (the usual OLS assumptions).

Why do we need Polynomial Regression?

Let’s consider a case of Simple Linear Regression.

  • We build our model and find that it performs very badly.
  • When we compare the actual values with the best fit line we predicted, it seems that the actual values follow some kind of curve in the graph, and our line is nowhere near cutting through the mean of the points.
  • This is where Polynomial Regression comes into play: it fits a best fit line that follows the pattern (curve) of the data, as illustrated in the sketch after this list.
  • Polynomial Regression does not require the relationship between the independent and dependent variables to be linear in the data set. This is also one of the main differences between Linear and Polynomial Regression.
  • Polynomial Regression is generally used when the points in the data are not captured by the Linear Regression model, i.e., when Linear Regression fails to describe the result clearly.
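As a hedged illustration of these points (synthetic data; NumPy assumed), the sketch below fits a straight line and a quadratic to the same curved data and compares their mean squared errors:

# Straight line vs. degree-2 polynomial on curved data (illustrative sketch)
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 3 + 0.5 * (x - 5) ** 2 + rng.normal(0, 1, size=x.size)   # curved pattern plus noise

for degree in (1, 2):
    coeffs = np.polyfit(x, y, deg=degree)         # least-squares fit of the given degree
    pred = np.polyval(coeffs, x)
    mse = np.mean((pred - y) ** 2)
    print(f"degree {degree}: MSE = {mse:.2f}")    # degree 2 gives a much lower error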

As we increase the degree of the model, the fit to the training data tends to improve. However, choosing the degree poorly also brings risk: too high a degree over-fits the data, while too low a degree under-fits it.

How to find the right degree of the equation?

In order to find the right degree for the model to prevent over-fitting or under-fitting, we can use:

  1. Forward Selection:
    This method starts with a low degree and increases it until the added terms are no longer significant enough to improve the model (a rough sketch of this idea follows the list).
  2. Backward Selection:
    This method starts with a high degree and decreases it until every remaining term is significant enough to define the best possible model.
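One practical way to read "significant enough" is to keep increasing the degree while a held-out validation error improves. The helper below is only a rough sketch of that forward-selection idea (choose_degree is a hypothetical name; NumPy and a train/validation split are assumed):

# Forward selection of the polynomial degree by validation error (rough sketch)
import numpy as np

def choose_degree(x_train, y_train, x_val, y_val, max_degree=10):
    best_degree, best_err = 1, float("inf")
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x_train, y_train, deg=degree)         # fit on training data
        err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)   # score on validation data
        if err < best_err:
            best_degree, best_err = degree, err
        else:
            break          # validation error got worse: stop increasing the degree
    return best_degree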

Math Behind Polynomial Regression!

If you know what Linear Regression is, then you will probably understand the math behind Polynomial Regression too.
Linear Regression is basically a first-degree polynomial: y = b0 + b1x.
Polynomial Regression simply extends this equation with higher-order terms: y = b0 + b1x + b2x^2 + ... + bnx^n.

Now, the coefficient vector b is found by matrix multiplication: b = (X^T X)^-1 X^T y, where X is the design matrix containing a column of ones and the powers of x.
For multiple variables the same matrix calculation applies, with X holding one column per term.
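As a concrete sketch of that matrix calculation (NumPy assumed; the numbers are made up), the coefficients can be computed from the normal equations. In practice np.linalg.lstsq is numerically safer than forming the inverse explicitly, but the direct formula is shown here to mirror the equation above.

# Normal equations for polynomial regression: b = (X^T X)^-1 X^T y (illustrative sketch)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 9.0, 16.2, 24.9])

degree = 2
X = np.vander(x, N=degree + 1, increasing=True)    # columns: 1, x, x^2
b = np.linalg.inv(X.T @ X) @ X.T @ y               # b0, b1, b2
print(b)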

To get a better understanding of the math behind it, I suggest you refer to this link, which explains the math clearly.

Cost Function of Polynomial Regression


A Cost Function is a function that measures the performance of a Machine Learning model for given data.
The Cost Function is basically the error between the predicted values and the expected values, presented in the form of a single real number.
Many people get confused between the Cost Function and the Loss Function.
To put it in simple terms, the Cost Function is the average error over the n samples in the data, while the Loss Function is the error for an individual data point. In other words, the Loss Function is for one training example; the Cost Function is for the entire training set.
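A tiny sketch of that distinction (NumPy assumed; the numbers are made up):

# Loss is per data point; cost averages the losses over the whole training set
import numpy as np

pred = np.array([2.5, 0.0, 2.0])
y = np.array([3.0, -0.5, 2.0])

losses = (pred - y) ** 2        # one loss value per training example
cost = losses.mean()            # one number for the entire training set
print(losses, cost)             # [0.25 0.25 0.  ] 0.1666...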

So, now that it's clear what a cost function is, let's move on.

If you went through my article on Linear Regression, you would know the cost function of Linear Regression.

  • The Cost Function of Polynomial Regression can also be taken to be the Mean Squared Error; however, there will be a slight change in the equation (a runnable sketch appears after this list).
#Cost Function of Linear Regression
J = 1/n * sum(square(pred - y))
which, since pred = b0 + b1x1 (i.e., y = mx + b), can also be written as:
J = 1/n * sum(square((b0 + b1x1) - y))
#Cost Function of Polynomial Regression
J = 1/n * sum(square(pred - y))
However, here the equation of the prediction changes, so it can also be written as:
J = 1/n * sum(square((b0 + b1x + b2x^2 + b3x^3 + ...) - y))
  • Polynomial Regression can reduce the cost returned by the cost function. It gives your regression line a curvilinear shape and makes it fit the underlying data better. By applying a higher-order polynomial, you can fit your regression line to your data more precisely.
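Here is a runnable version of that cost function (NumPy assumed; polynomial_cost is an illustrative name, and b holds the coefficients b0, b1, b2, ...):

# MSE cost of a polynomial model, J = 1/n * sum(square(pred - y)) (illustrative sketch)
import numpy as np

def polynomial_cost(b, x, y):
    pred = sum(b_i * x ** i for i, b_i in enumerate(b))   # b0 + b1*x + b2*x^2 + ...
    return np.mean((pred - y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 10.0])                            # exactly y = 1 + x^2
print(polynomial_cost([1.0, 0.0, 1.0], x, y))             # perfect fit, so the cost is 0.0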

Now, we know that the ideal value of the Cost Function is 0, or somewhere close to 0.
In order to get to our ideal Cost Function, we can perform Gradient Descent, which updates the weights and, in return, minimizes the error.

Gradient Descent for Polynomial Regression

Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) of a function that minimize a cost function.

To read more about it and get a solid understanding of Gradient Descent, I suggest reading Jason Brownlee's blog.

To update the m and b values in order to reduce the cost function (minimizing the MSE value) and achieve the best fit line, you can use Gradient Descent. The idea is to start with random m and b values and then iteratively update them until the cost reaches a minimum.

Steps followed by Gradient Descent to obtain a lower cost function:

→ Initially, the values of m and b are set to 0, and a learning rate (α) is introduced to the function.
The value of the learning rate (α) is taken to be very small, typically something like 0.01 or 0.0001.

The learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a cost function.

→ Then the partial derivatives of the cost function are calculated with respect to the slope (m) and the intercept (b). For the MSE cost these work out to derivative of m = (2/n) * sum((pred - y) * x) and derivative of b = (2/n) * sum(pred - y).

Readers familiar with calculus will understand how the derivatives are taken.

If you don't know calculus, don't worry; just understand how this works, and it will be more than enough to reason intuitively about what's happening behind the scenes.

→ After the derivatives are calculated, the slope (m) and intercept (b) are updated with the help of the following equations.
m = m - α * (derivative of m)
b = b - α * (derivative of b)
The derivatives of m and b are the ones calculated above, and α is the learning rate.

If you've gone through Jason Brownlee's blog, you might have understood the intuition behind gradient descent and how it tries to reach the global optimum (the lowest cost function value).

Why should we subtract the derivative from the weights (m and b)?
The gradient gives us the direction of steepest ascent of the loss function; the direction of steepest descent is the opposite of the gradient, and that is why we subtract the gradient from the weights (m and b).

→ The process of updating the values of m and b continues until the cost function reaches its ideal value of 0, or close to 0.
The values of m and b at that point are the optimum values that describe the best fit line; a minimal code sketch of the whole procedure follows below.
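Putting these steps together, here is a minimal sketch of gradient descent for the simple y = m*x + b case (NumPy assumed; the data, learning rate, and iteration count are illustrative; the same update idea extends to the higher-degree coefficients):

# Gradient descent for y = m*x + b with MSE cost (illustrative sketch)
import numpy as np

def gradient_descent(x, y, alpha=0.01, iterations=10000):
    m, b = 0.0, 0.0                                   # start with m = 0 and b = 0
    n = len(x)
    for _ in range(iterations):
        pred = m * x + b
        dm = (2.0 / n) * np.sum((pred - y) * x)       # partial derivative w.r.t. m
        db = (2.0 / n) * np.sum(pred - y)             # partial derivative w.r.t. b
        m = m - alpha * dm                            # step against the gradient
        b = b - alpha * db
    return m, b

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])                    # true line: y = 2x + 1
print(gradient_descent(x, y))                         # approaches (2.0, 1.0)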

I hope things above are making sense to you.

HAPPY LEARNING!!!!!

Like my article? Do give me a clap and share it, as that will boost my confidence. Also, I post new articles every Sunday, so stay connected for future articles in this basics of data science and machine learning series.

Also, do connect with me on LinkedIn.

