Need of Polynomial Regression

Satvik Tiwari
Published in Koderunners
3 min read · Aug 1, 2019

Linear Regression is one of the most widely used techniques for fitting a straight line to linear data. For a single feature, it is given by:

Y = θ₀ + θ₁x

where,

θ₀ is the bias term, which is the y-intercept of the line

x is the feature

θ₁ is the parameter for x

But what if our data is non-linear?

Let’s create some non-linear data.
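One way to create such data is sketched below with NumPy. The quadratic coefficients (0.8, 0.9, 2.0), the noise level, the sample size, and the seed are illustrative assumptions, not the values behind the article's plots.

```python
import numpy as np

# Assumed setup: every constant here is an illustrative choice.
rng = np.random.default_rng(seed=42)

m = 200                                   # number of samples
x = rng.uniform(-3.0, 3.0, size=m)        # single feature in [-3, 3)
noise = rng.normal(0.0, 1.0, size=m)      # Gaussian noise
y = 0.8 * x**2 + 0.9 * x + 2.0 + noise    # quadratic signal plus noise

print(x.shape, y.shape)
```

A scatter plot of x against y from this sketch should show a noisy, upward-opening curve rather than a straight line.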

Although our data is non-linear, we can still use a linear model to fit it. This is done by adding polynomial terms (powers and products) of the existing features to the feature set and then training a linear model on the expanded set of features. This technique is called Polynomial Regression. For example, with two features x₁ and x₂ and polynomial terms up to degree 2, it can be given as:

Y = θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₁² + θ₄x₁x₂ + θ₅x₂²

Polynomial Regression is often described as a special case of multiple linear regression: the model is still linear in its coefficients, and the polynomial terms are simply treated as additional features.
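The idea that polynomial terms are "just features" can be sketched with plain NumPy: expand x into the columns [1, x, x²] and solve an ordinary least-squares problem. The synthetic data and its coefficients are assumptions made for illustration only.

```python
import numpy as np

# Assumed synthetic data: quadratic signal plus Gaussian noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(-3.0, 3.0, size=200)
y = 0.8 * x**2 + 0.9 * x + 2.0 + rng.normal(0.0, 1.0, size=200)

# Expand the single feature x into the columns [1, x, x^2].
# The model stays linear in the parameters theta.
X_poly = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares on the expanded features.
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print("theta0, theta1, theta2 =", theta)
```

The fitted parameters should land close to the assumed values 2.0, 0.9, and 0.8, even though the solver only ever sees a linear problem.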

The above polynomial equation can be written as:

Y = θ₀ + θ₁x₁ + θ₂x₂ + θ₃x₃ + θ₄x₄ + θ₅x₅

where,

x₃, x₄, x₅ are the polynomial terms

Let’s consider a housing price dataset that has only two features, length and breadth, where our goal is to train an accurate model. In this case, polynomial features let us create a new, more relevant feature such as the area of the house (length × breadth). Polynomial Regression is particularly helpful when a dataset lacks informative features.
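A minimal sketch of the housing scenario follows. The data is synthetic, and the assumption that price is proportional to area, the noise level, and the helper name fit_mse are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
length = rng.uniform(5.0, 20.0, size=300)
breadth = rng.uniform(5.0, 20.0, size=300)
# Hypothetical ground truth: price is proportional to area, plus noise.
price = 100.0 * length * breadth + rng.normal(0.0, 500.0, size=300)

def fit_mse(X, y):
    """Fit least squares with a bias column and return the training MSE."""
    A = np.column_stack([np.ones(len(y)), X])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ theta - y) ** 2))

# Raw features only, versus raw features plus the product term (area).
X_raw = np.column_stack([length, breadth])
X_area = np.column_stack([length, breadth, length * breadth])

mse_raw = fit_mse(X_raw, price)
mse_area = fit_mse(X_area, price)
print(f"MSE without area: {mse_raw:.1f}")
print(f"MSE with area:    {mse_area:.1f}")
```

Under these assumptions, the model with the product feature fits far better, because the raw features alone cannot express length × breadth.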

Let’s try fitting different curves to the non-linear data above:

  • Y = θ₀ + θ₁x₁ (linear)
  • Y = θ₀ + θ₁x₁ + θ₂x₁² (quadratic)
  • Y = θ₀ + θ₁x₁ + θ₂x₁² + θ₃x₁³ (cubic)
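The three curves listed above can be fitted with np.polyfit, as in this rough sketch. The data is an assumed quadratic with noise, so the numbers it prints will not match the article's plots.

```python
import numpy as np

# Assumed synthetic data: quadratic signal plus Gaussian noise.
rng = np.random.default_rng(seed=2)
x = rng.uniform(-3.0, 3.0, size=200)
y = 0.8 * x**2 + 0.9 * x + 2.0 + rng.normal(0.0, 1.0, size=200)

train_mse = {}
for degree in (1, 2, 3):                      # linear, quadratic, cubic
    coeffs = np.polyfit(x, y, deg=degree)     # coefficients, highest power first
    y_hat = np.polyval(coeffs, x)             # predictions on the training data
    train_mse[degree] = float(np.mean((y_hat - y) ** 2))
    print(f"degree {degree}: training MSE = {train_mse[degree]:.3f}")
```

Because each model contains the previous one as a special case, the training error can only stay the same or go down as the degree increases.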

It turns out that adding polynomial terms of different degrees results in different fits to the data. A quadratic term produces a curve with one hump, while a cubic term allows up to two humps, one facing upward and the other facing downward.

You may be wondering: what degree of polynomial terms should we generally add?

Well, that depends on the type of data and the kind of output you expect. We need to think carefully before adding polynomial features, because high-degree polynomial terms can lead to over-fitting, i.e., our model might learn the noise in our data and fail to generalize to new data.
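One common way to see this effect is to compare the error on the training data with the error on held-out data for several degrees. The sketch below assumes a quadratic ground truth, a deliberately small training set, and degrees 1, 2, and 10; all of these are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def make_data(n):
    """Noisy samples from an assumed quadratic ground truth."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 0.8 * x**2 + 0.9 * x + 2.0 + rng.normal(0.0, 0.3, size=n)
    return x, y

x_train, y_train = make_data(15)     # small training set
x_test, y_test = make_data(200)      # held-out data

results = {}
for degree in (1, 2, 10):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = float(np.mean((np.polyval(coeffs, x_train) - y_train) ** 2))
    test_mse = float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))
    results[degree] = (train_mse, test_mse)
    print(f"degree {degree:2d}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The training error never increases with the degree, so it cannot be used to pick one. A gap between a low training error and a much higher test error is the usual sign of over-fitting, which is why the degree is typically chosen using held-out or cross-validated error.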

Thank you for reading this article. If you have any suggestions for the next topic of my blog you can mention it in the comment section below. Stay tuned for more machine learning stuff …. :)
