Polynomial Regression: Machine Learning

TC. Lin
4 min read · Jan 3, 2023


This article continues from the previous: Linear Regression.

In this article, we will talk about Polynomial Regression.

And… it looks something like this:

(Photo from https://www.javatpoint.com/machine-learning-polynomial-regression)

Some people refer to it as Polynomial Linear Regression, but wait… linear? Then why didn’t I include it in the previous article? Let’s not get into that debate here.

Don’t worry, let me explain…

Remember the formulas from Linear Regression? Well… the formula for Polynomial Regression is similar.

(Image from https://www.superdatascience.com/)
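Written out, the equation is:

y = b₀ + b₁x₁ + b₂x₁² + … + bₙx₁ⁿ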

Very similar to linear regression, polynomial regression makes only some minor changes:
- Instead of using several different features of x, we take only one.
- We raise that one chosen feature to increasing powers, one power for each slope coefficient.

But why do some people still refer to it as “Polynomial Linear Regression”?
Well, x and y might not have a linear relationship, but the “linear” refers to the slope coefficients: the model is still linear in b₀, b₁, …, bₙ. Ummhmm! Now you get it.

Using the following dataset as an example:
(In case you find it familiar, this dataset comes from https://www.superdatascience.com/ by Kirill and Hadelin)

The one feature that we are going to choose, let’s call it x1, will be the column ‘Level’, and the dependent variable that we want to predict (y) will be the ‘Salary’.
- Note: “one feature” means one column, not one row.
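In case you want to follow along, a minimal sketch of loading the data with pandas might look like this (the file name Position_Salaries.csv is my assumption; adjust it to your copy of the dataset):

import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')  # assumed file name
X = dataset.iloc[:, 1:-1].values  # the 'Level' column, kept 2D
y = dataset.iloc[:, -1].values  # the 'Salary' column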

Training a Polynomial Regression model

Up until this point, I bet it is not that hard to build a model, right? Of course not… because we have Scikit-Learn.

After importing the data like we did before, we train on the entire dataset like so:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly_reg = PolynomialFeatures(degree=2)  # the power of x
X_poly = poly_reg.fit_transform(X)  # expand x1, the 'Level' column, into polynomial terms

poly_lin_reg = LinearRegression()
poly_lin_reg.fit(X_poly, y)  # fit a linear model on the polynomial features

With the help of Scikit-Learn, we first adjust our variable X to fit the equation of Polynomial Regression. To begin, we use only a power of 2: PolynomialFeatures transforms X by raising it to each power up to the degree we set. Then, we train our model like we always do!
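If you are curious what fit_transform actually produces, here is a tiny sketch with a made-up three-row ‘Level’ column (X_demo is purely for illustration):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_demo = np.array([[1], [2], [3]])
print(PolynomialFeatures(degree=2).fit_transform(X_demo))
# [[1. 1. 1.]
#  [1. 2. 4.]
#  [1. 3. 9.]]
# Each row becomes [x⁰, x¹, x²], so LinearRegression can fit b₀ + b₁x + b₂x²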

Visualizing our trained Polynomial model

Remember Matplotlib? It is very useful for visualizations.

By the way, Matplotlib has a recommended way of plotting, the ‘Object-Oriented Approach’, but for now, we won’t go into that.

Plotting a graph is simple:

import matplotlib.pyplot as plt

plt.scatter(X, y, color='salmon')  # the raw data points
plt.plot(X, poly_lin_reg.predict(X_poly), color='lightblue')  # the fitted polynomial curve
plt.title('Position vs Salary (Polynomial)')
plt.xlabel('Position Level')
plt.ylabel('Salary');

We first make a scatter plot of our X and y values, then we draw the polynomial line of best fit by feeding the transformed X_poly from before into the model and plotting the predicted y values. Simple as that.

Look! It’s easy, isn’t it?
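One optional tweak: with only ten levels, the curve can look a bit angular. A sketch of plotting over a denser grid instead (this assumes X is the 2D ‘Level’ array from before):

import numpy as np

X_grid = np.arange(X.min(), X.max() + 0.1, 0.1).reshape(-1, 1)  # steps of 0.1
plt.scatter(X, y, color='salmon')
plt.plot(X_grid, poly_lin_reg.predict(poly_reg.transform(X_grid)), color='lightblue')
plt.title('Position vs Salary (Polynomial)')
plt.xlabel('Position Level')
plt.ylabel('Salary');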

Changing the power of X

The graph doesn’t look quite right, so let’s try applying the same code as above, but raising the power of X to 4.
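Only the degree argument changes; everything else is re-run exactly as before:

poly_reg = PolynomialFeatures(degree=4)  # raise the power of x to 4
X_poly = poly_reg.fit_transform(X)

poly_lin_reg = LinearRegression()
poly_lin_reg.fit(X_poly, y)

Re-plotting with this model, we get…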

This graph might seem to have an overfitting problem, but for demonstration purposes, that’s okay given the small number of samples that we have.

Now imagine training the same data with a regular linear regression; we get a graph like the one below:
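In case you want to reproduce it, a minimal sketch of that plain linear fit, reusing our X and y (no polynomial features this time):

lin_reg = LinearRegression()
lin_reg.fit(X, y)  # fit a straight line on the raw 'Level' feature

plt.scatter(X, y, color='salmon')
plt.plot(X, lin_reg.predict(X), color='lightblue')
plt.title('Position vs Salary (Linear)')
plt.xlabel('Position Level')
plt.ylabel('Salary');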

Now you see why it is important to use a suitable algorithm to train our model in order to get better predictions!

Making predictions

From the same dataset, let’s try to predict the salary for position level 6.5 using our trained polynomial model.

Similarly, we first transform the feature to fit the polynomial equation, then predict:
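Using the poly_reg and poly_lin_reg objects from earlier, the call looks something like this:

poly_lin_reg.predict(poly_reg.transform([[6.5]]))  # transform the level first, then predict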

158862! That looks about right; it is a good prediction.

What if we use a regular linear regression model to predict it instead, just like the linear regression graph shown above?

lin_reg.predict([[6.5]]) # Input level
# If we input [6.5], it is a 1D vector
# [[6.5]] is 2D, 1 row, 1 column

And we get a value of 330378.
That’s way off from what we expected.

Phew, that was a mouthful. I hope this article is helpful, and that you now see the importance of picking a suitable algorithm for different types of data.

Next up, let’s dive into Support Vector Regression (SVR).

> Continue reading: Support Vector Regression


TC. Lin

Predicting the future isn't magic, it's artificial intelligence.