Linear and Polynomial Regression

Palak Bhandari
6 min read · Apr 14, 2024


Regression is a supervised machine learning technique used to predict a continuous value from a labeled dataset (labeled data means the target value for each example is already known). The algorithm learns from the labeled data and maps the input features to an optimized function, which can then be used to make predictions on new data.

Put simply, regression is about finding the ‘best fit line’ that passes as closely as possible through the data points in the training data. We then use this line to predict values for new data points. This best fit line is also called the regression equation. In linear regression we look for a straight best fit line (a linear equation), while in polynomial regression we look for a curved best fit line (a non-linear equation).

There are many types of regression beyond linear and polynomial, but in this blog we will look at just these two. For example, if you have more than one feature on which to base predictions, there is multiple linear regression rather than simple linear regression; and when the target has two classes and the task is classification, there is logistic regression.

Fig. Linear & Polynomial Regression

A regression dataset has a set of features, called the independent variables, which are used to make predictions. It also has a target column, the dependent variable, which is the value the model tries to predict. Once the model is trained and ready, it takes new values of the independent variables and outputs a prediction of the dependent variable with some level of accuracy.

Linear Regression

Simple linear regression involves one independent variable and one dependent variable, while multiple linear regression has several independent variables and one dependent variable. Linear regression applies when we have continuous data with a linear pattern: if the dependent variable follows the independent variables linearly, linear regression works well. In multiple regression there is an additional assumption that the independent variables are not correlated with each other; if there is such redundancy, the coefficient estimates become unreliable and the model may overfit.

The equation for the best fit line that linear regression works with is:

Y(predicted) = β1*X + β0 + error

where,

  • Y is the dependent variable
  • X is the independent variable
  • β0 is the intercept
  • β1 is the slope

In the case of multiple regression, the equation becomes:

y = β0 + β1*X1 + β2*X2 + … + βn*Xn

where β0 is the intercept and β1, β2, …, βn are the slopes (one coefficient per independent variable X1, X2, …, Xn).

The goal of the algorithm is to find the best fit line equation that can predict values based on the independent variables. In regression, a set of records with known X and Y values is used to learn a function, so that when we want to predict Y for a new, unseen X, this learned function can be applied. Since Y is continuous in regression, we need a function that predicts a continuous Y given X as the independent features.
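As a minimal sketch of how this looks in practice (assuming scikit-learn and a small made-up dataset, not the salary data from the notebook linked later), fitting a simple linear regression and reading off β0 and β1 might look like this:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Small made-up dataset: X is the independent variable, y the dependent one
X = np.array([[1], [2], [3], [4], [5]])   # shape (n_samples, 1)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])  # roughly y = 2*x + 1

model = LinearRegression()
model.fit(X, y)  # learns the slope (β1) and the intercept (β0)

print("slope (β1):", model.coef_[0])
print("intercept (β0):", model.intercept_)
print("prediction for X = 6:", model.predict([[6]])[0])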

Fig. Best fit line [Source]

Finding the best fit line means reducing the error between the predicted and actual values as much as possible. The line with the least error is the best fit line.

Since different values of the weights (the coefficients of the line) produce different regression lines, we use a cost function to find the values that give the best fit line. In other words, we keep changing the values of β1 and β0 until we find the best ones. The cost function, also called the loss function, is simply a measure of the error, the difference between the predicted values and the true values in the training dataset.

In Linear Regression, the Mean Squared Error (MSE) cost function is employed, which calculates the average of the squared errors between the predicted values and the actual values.
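As a quick illustration (a sketch with made-up numbers, not taken from the post's notebook), the MSE can be computed directly from predictions and true values:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.8, 5.3, 6.9, 9.4]
print(mse(y_true, y_pred))  # lower is better: the line with the least MSE is the best fit
```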

The iterative process of gradient descent is applied to update the values of β1 and β0. Because the MSE cost for linear regression is convex, this process converges to the global minimum (the point where the cost is smaller than at all other feasible points), giving the most accurate fit of the linear regression line to the dataset.

The idea is to start with random β1 and β0 values and then iteratively update them until the cost reaches a minimum. A gradient is simply a derivative that describes how the output of a function changes with a small change in its inputs.

Fig. Gradient Descent [Source]

The coefficients are updated by moving in the direction of the negative gradient of the MSE with respect to each coefficient. If α is the learning rate, the updates are β1 := β1 − α·(∂MSE/∂β1) for the coefficient of X and β0 := β0 − α·(∂MSE/∂β0) for the intercept.

The learning rate controls the size of the step taken along the gradient during gradient descent. Setting it too high makes the path unstable and can overshoot the minimum; setting it too low makes convergence slow. Setting it to zero means the model does not learn anything from the gradients.
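A minimal from-scratch sketch of this update loop (assuming made-up data and hand-picked values for the learning rate and the number of iterations) could look like this:

```python
import numpy as np

# Made-up data that roughly follows y = 2*x + 1
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

b0, b1 = 0.0, 0.0   # start from arbitrary (here zero) initial values
alpha = 0.01        # learning rate (assumed value)
n = len(X)

for _ in range(5000):                    # number of iterations (assumed value)
    y_pred = b1 * X + b0
    error = y_pred - y
    # Gradients of the MSE cost with respect to b1 and b0
    grad_b1 = (2 / n) * np.sum(error * X)
    grad_b0 = (2 / n) * np.sum(error)
    # Move against the gradient, scaled by the learning rate
    b1 -= alpha * grad_b1
    b0 -= alpha * grad_b0

print("slope (β1):", b1)
print("intercept (β0):", b0)
```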

Regression analysis helps identify the nature and strength of the relationship between the dependent and independent variables. However, it is sensitive to outliers, which can distort the results. It also assumes a linear relationship between the dependent and independent variables, and if that assumption does not hold, accuracy suffers. When the model is too complex and fits the training data too closely, it performs poorly on new data points. Finally, linear regression is designed for continuous targets and may not be suitable for analyzing categorical or binary outcomes.

Polynomial Regression

A simple linear regression algorithm works only when the relationship in the data is linear. If the data is non-linear, linear regression cannot draw a good best fit line, and simple regression analysis fails in such conditions. Polynomial regression is a form of regression analysis in which the relationship between the independent variable x and the dependent variable y is modeled as an nth degree polynomial in x.

It can also be viewed as a special case of multiple linear regression, because we add polynomial terms to the multiple linear regression equation to turn it into polynomial regression.

In polynomial regression, the original features are converted into polynomial features of the required degree (2, 3, …, n) and then modeled using a linear model.
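As a minimal sketch (assuming scikit-learn and a small made-up non-linear dataset), this "transform the features, then fit a linear model" idea looks like:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Made-up data that roughly follows y = x^2
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 3.9, 9.1, 15.8, 25.3])

# degree=2 expands X into [1, x, x^2]; the linear model then fits those features
poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X, y)

print(poly_model.predict([[6]]))  # should be close to 36
```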

Fig. Polynomial Regression Equation [Source]

Polynomial regression offers greater flexibility compared to linear regression, as it can model non-linear relationships in the data. However, it requires some understanding of the data to select an appropriate degree (the exponents), and a poorly chosen degree can lead to overfitting.

Fig Difference in Simple Linear and Polynomial Regression [Source]

Here is the sample code for linear and polynomial regression applied to a simple salary dataset, which shows the difference between the two approaches — GitHub.

In it, I take a dataset with positions and salaries and find the best fit line using both linear and polynomial regression. As you will see, the polynomial model gives better results on this data than the simple linear regression model.
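The overall shape of that comparison, as a rough sketch (the file name and the "Level"/"Salary" column names are assumptions for illustration, not confirmed from the notebook), is:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score

# Assumed file and column names; adjust to match the actual salary dataset
df = pd.read_csv("Position_Salaries.csv")
X = df[["Level"]].values
y = df["Salary"].values

linear = LinearRegression().fit(X, y)
poly = make_pipeline(PolynomialFeatures(degree=4), LinearRegression()).fit(X, y)

print("Linear R^2:    ", r2_score(y, linear.predict(X)))
print("Polynomial R^2:", r2_score(y, poly.predict(X)))
```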

References

  1. https://www.geeksforgeeks.org/ml-linear-regression/
  2. https://www.almabetter.com/bytes/tutorials/data-science/regression-in-machine-learning
  3. https://medium.com/@hrishavkmr/linear-and-polynomial-regression-1d666e25b016
  4. https://towardsdatascience.com/introduction-to-linear-regression-and-polynomial-regression-f8adc96f31cb
  5. https://www.analyticsvidhya.com/blog/2021/04/gradient-descent-in-linear-regression/
  6. https://www.analyticsvidhya.com/blog/2021/07/all-you-need-to-know-about-polynomial-regression/
  7. https://github.com/Palak-Bhandari/Linear-vs-Polynomial-Regression/blob/main/Salary%20Prediction.ipynb
