Linear Regression Algorithm

Apr 9 · 5 min read

Linear regression is one of the most well-known and well-understood algorithms in statistics and machine learning. Before diving into linear regression, let's first understand what regression is.

What is Regression?

Regression falls under the supervised learning category. The main goal of regression is to construct an efficient model that predicts a dependent attribute from a set of attribute variables. A regression problem arises when the output variable is a real or continuous value, e.g. salary, score, or weight. Regression tries to draw the line that best fits the data gathered from several points.

Common Types Of Regression

The following are common types of regression.

  1. Linear Regression
  2. Polynomial Regression
  3. Support Vector Regression
  4. Decision Tree Regression
  5. Random Forest Regression

What is Linear Regression?

Linear regression is a regression technique in which a dependent variable has a linear relationship with an independent variable. The main goal of linear regression is to take the given data points and plot the trend line that fits the data in the best way possible.

Let’s say we have a dataset that contains information about the relationship between X and Y. A number of observations of X and Y are made and recorded; this is our training data. Our goal is to design a model that can predict the value of Y when a value of X is provided. Using the training data, we obtain the regression line that gives the minimum error. This linear equation is then applied to new data: given X as input, the model should predict Y with minimum error.

The simple linear regression model is represented by the following equation:

Y = b0 + b1X

where b0 is the intercept and b1 is the slope (the regression coefficient) of the line.

Linear regression most often uses mean-square error (MSE) to calculate the error of the model.
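The MSE is simply the average of the squared differences between the observed and predicted values. A minimal sketch using NumPy, with made-up numbers for illustration:

```python
import numpy as np

# Observed values and hypothetical model predictions (illustrative numbers)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

# MSE: mean of the squared differences between observed and predicted values
mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # 0.375
```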

How does Linear Regression work?

Let us consider that there’s a connection between the number of hours a student studies and the marks they obtain; regression analysis can help us understand that connection. Regression analysis provides a relation that can be visualized as a graph and used to make predictions about the data.

The goal of regression analysis is to create a trend line based on the data. This then allows us to determine whether factors other than hours of study, such as stress level, affect a student's marks. Before taking those into account, we need to look at these attributes and determine whether there is a correlation between them. Linear regression can then be used to draw a trend line that confirms or denies the relationship between attributes.
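Checking for correlation between two attributes can be done with the Pearson correlation coefficient. A minimal sketch using NumPy, with illustrative hours/marks numbers (not real data):

```python
import numpy as np

hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # hours of study (illustrative)
marks = np.array([52.0, 58.0, 65.0, 70.0, 78.0])  # marks obtained (illustrative)

# Pearson correlation coefficient between the two attributes:
# values near +1 or -1 suggest a strong linear relationship
r = np.corrcoef(hours, marks)[0, 1]
print(r)
```

A value of r close to 1 here would suggest that a linear trend line is a reasonable model for this pair of attributes.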

How do we determine the line that best fits the data?

The line is considered the best fit if the predicted values and the observed values are approximately the same. In simple words, the line for which the sum of the distances of the data points from the line is minimal is the best-fit line.

This line is also called the regression line, and the errors are known as residuals. A residual can be visualized as the vertical distance from a data point to the regression line.

The error, in this case, is an aggregate (such as the sum or mean of squares) of the distances of the points from the chosen line.
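In the simple one-variable case, the line that minimizes the sum of squared residuals has a closed-form solution. A minimal sketch using NumPy, with illustrative data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Closed-form least-squares estimates:
# slope b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)²), intercept b0 = ȳ - b1·x̄
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # 2.2 0.6
```

Every other line through these points has a larger sum of squared residuals than the line Y = 2.2 + 0.6X.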

Model Performance

After the model is built, we need to check the difference between the predicted values and the actual data; if the difference is small, the model is considered good. Below is a metric we can use to measure the error of the model.

R-Square (R²) score:


Total Sum of Squares (TSS): a measure of how a data set varies around its mean. The TSS tells us the total variation in the dependent variable.

TSS = Σ (Y - mean(Y))²

Residual Sum of Squares (RSS): the sum of the squared differences between the actual Y and the predicted Y. The RSS tells us how much of the variation in the dependent variable is not explained by our model.

RSS = Σ (Y - f(X))²

(TSS - RSS) measures the amount of variability in the response that is explained by performing the regression.

The R² score, computed as (TSS - RSS) / TSS, or equivalently 1 - RSS / TSS, can be used to evaluate the performance of any regression model.
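Putting TSS and RSS together gives the R² score directly. A minimal sketch with illustrative numbers:

```python
import numpy as np

# Observed values and hypothetical model predictions (illustrative numbers)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

tss = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
rss = np.sum((y_true - y_pred) ** 2)         # variation left unexplained by the model
r2 = 1 - rss / tss
print(r2)  # 0.925
```

An R² of 1 means the model explains all of the variation; an R² near 0 means it explains almost none of it.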

A simple Linear Regression Example:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5, 6]).reshape((-1, 1))  # features must be a 2-D column
y = np.array([2, 5, 6, 8, 9, 12])

model = LinearRegression()
model.fit(x, y)                   # fit the regression line to the training data

y_pred = model.predict(x)         # predictions along the fitted line
r_sq = model.score(x, y)          # coefficient of determination (R²)
print('coefficient of determination:', r_sq)

plt.scatter(x, y)                 # observed data points
plt.plot(x, y_pred, color='red')  # fitted regression line
plt.show()

