Linear Regression — explained in simple terms!!

Yagnik Pandya · Published in Analytics Vidhya · 5 min read · Jan 24, 2021

In this article, we will try to understand linear regression in simple terms, with some basic examples, and look at the math behind it.

Regression is a way to explain the relationship between a dependent variable (y) and one or more explanatory variables (X). Since the term contains "linear", we naturally think of a line.
In a basic sense, linear regression can be thought of as finding the relationship between two things, a dependent variable (y) and an independent variable (X), using a straight line.

Some examples are predicting the sales of a store, the increase in the price of a property, etc.

Simple Linear Regression — In simple linear regression there is only one independent variable (X), and based on it the dependent variable (y) is predicted.

Multiple Linear Regression — In multiple linear regression there are multiple independent variables (X), and based on them the dependent variable (y) is predicted.

In the data below there is one independent variable (X), and y is the dependent variable. The task is to identify, for a future value of X, what the value of y can be.

In the above plot, the red dots are the data points, and since we need to find a linear relationship, we have connected them with a line. It is well known that the equation of a line is y = mX + c, which can also be written as Y = β0 + β1X1,
where β0 is c from the original equation, i.e. the intercept,
and β1 is m from the original equation, i.e. the slope.

If this equation of the line is completed, Y can be predicted for a future X. Here, from the past data, X and y are known. The aim is to find the values of β0 and β1 to complete our equation.

For the above data, the linear equation would be Y = 100X + 50. This is what the linear regression algorithm would try to find. Now for any value of X we can predict Y.
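As a quick sanity check, here is a minimal sketch in Python (assuming NumPy, with made-up data that follows this exact line) showing that fitting a line recovers these values:

```python
import numpy as np

# Made-up data that follows the article's example line Y = 100X + 50 exactly
X = np.array([1, 2, 3, 4, 5])
y = 100 * X + 50  # [150, 250, 350, 450, 550]

# np.polyfit with degree 1 fits a straight line and returns (slope, intercept)
slope, intercept = np.polyfit(X, y, 1)
print(slope, intercept)  # ~100.0 and ~50.0, i.e. β1 and β0
```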
But would real-world data be this simple? Obviously not. It would look like the plots below, or even more complex.

In the above plots, a single line passing through all the points is not possible. Hence, the objective is to find the "best fit line" that comes as close as possible to all the points.

There can be various candidate lines, but linear regression models try to draw the line such that the vertical distances between the line and the data points (that is, the residuals) are as small as possible. This is called "fitting the line to the data", and this line will be our "best fit line".

Cost Function:
The actual points and the predictions given by the best fit line will differ, and this difference is the cost (loss) incurred at that point, i.e. the vertical distance between the actual point and the predicted point. The total cost over all points is commonly taken as the mean of these squared differences (the mean squared error, MSE):

Cost(β0, β1) = (1/n) Σ (y_actual − y_predicted)²
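A minimal sketch of this calculation (assuming MSE as the cost, with made-up points):

```python
import numpy as np

def cost(beta0, beta1, X, y):
    """Mean squared error between actual y and the line's predictions."""
    y_pred = beta0 + beta1 * X          # predictions from the candidate line
    return np.mean((y - y_pred) ** 2)   # average of squared vertical distances

X = np.array([1, 2, 3, 4])
y = np.array([2, 4, 6, 8])
print(cost(0.0, 2.0, X, y))  # 0.0 -- this line fits perfectly
print(cost(0.0, 1.0, X, y))  # 7.5 -- a worse line gives a higher cost
```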

The equation of this best fit line will be the final equation of our linear regression. In the previous example it was very easy to get the equation Y = β0 + β1X1, but here we need to find the best values for β0 and β1.

Here, β0 is the intercept. Assuming that the best fit line passes through the origin, its value will be 0, and the equation becomes y = β1X1. (β0 can have any value, but to keep things easy to understand we have taken it as 0.)
Now the value of β1 should be such that we get the minimum cost. Let us check the cost function values for different values of β1 (the slope) for the 4 points below.

(1,1), (2,2), (3,3), (4,4)

For each point we have X and y (actual). Using the equation y = β1X1, y (predicted) is calculated for different values of β1 to get the cost incurred for each value of β1.
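A small sketch of that calculation (again assuming MSE as the cost) for a few candidate slopes:

```python
import numpy as np

X = np.array([1, 2, 3, 4])
y = np.array([1, 2, 3, 4])  # the actual values for the four points

# Try a few candidate slopes, with β0 fixed at 0 as above
for beta1 in [0.5, 1.0, 1.5]:
    y_pred = beta1 * X                # y(predicted) for this slope
    mse = np.mean((y - y_pred) ** 2)  # cost for this slope
    print(f"beta1 = {beta1}: cost = {mse:.3f}")

# beta1 = 0.5: cost = 1.875
# beta1 = 1.0: cost = 0.000
# beta1 = 1.5: cost = 1.875
```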

From this it is clear that β1 = 1 is the best value, as the predicted points are the same as the actual ones. In short, the optimal value of β1 is the one that gives the lowest value of the cost function. To reach that lowest value, the Gradient Descent algorithm can be used to find the global minimum.

Finally, we are all set to find the linear equation using Gradient Descent to get the relationship between the two variables.
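Here is a hedged sketch of gradient descent for this single-slope case, using the same four points; the learning rate and iteration count are arbitrary choices of mine, not from the original:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.0, 3.0, 4.0])

beta1 = 0.0  # start from an arbitrary slope
lr = 0.01    # learning rate (step size)

for _ in range(1000):
    y_pred = beta1 * X
    # Derivative of the MSE cost with respect to β1: -(2/n) Σ X(y - β1·X)
    grad = -2 * np.mean(X * (y - y_pred))
    beta1 -= lr * grad  # step downhill on the cost curve

print(beta1)  # converges to ~1.0, the slope with the minimum cost
```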

The same math applies when there is more than one independent variable, i.e. more than one X. This is multiple linear regression, and its equation is:

Y=β0+β1X1+β2X2+…+βnXn.

Here too, all the coefficients of X are obtained using Gradient Descent, the same as for simple linear regression. These coefficient values are essentially the weights of the features. In layman's terms, a coefficient shows the importance of that feature for predicting the dependent variable (y).
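A minimal sketch using scikit-learn (the library choice and the data are my assumptions) with two features:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data with the true relationship y = 3·X1 + 5·X2 + 10
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 3 * X[:, 0] + 5 * X[:, 1] + 10

model = LinearRegression()
model.fit(X, y)

print(model.coef_)       # ~[3.0, 5.0] -- the feature weights β1, β2
print(model.intercept_)  # ~10.0       -- the intercept β0
```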

Assumptions of Linear Regression :

It is clear that linear regression is a simple approach to predict from data that follows a linear trend. It will fail on a curved data set. There are also certain assumptions that should hold for linear regression to be used (a quick check for one of them is sketched after this list):

  1. Linear Relationship
  2. Multivariate normality
  3. No multicollinearity
  4. Homoscedasticity
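As one example, the multicollinearity assumption is often checked with variance inflation factors (VIF); this is a sketch assuming statsmodels is available, with made-up random features:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Made-up feature matrix with 3 columns
rng = np.random.default_rng(0)
X = rng.random((100, 3))
X_const = sm.add_constant(X)  # VIF is usually computed with an intercept column

# A VIF well above ~5-10 is a common warning sign of multicollinearity
for i in range(1, X_const.shape[1]):  # skip the constant column itself
    print(f"feature {i}: VIF = {variance_inflation_factor(X_const, i):.2f}")
```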

Finally, almost all regression algorithms share some of the same math as linear regression. Linear regression is always a good first step (if the data looks visually linear) for a beginner, and it is definitely a good first learning objective!

I hope this article was helpful for getting a basic understanding of linear regression and the math behind it. Please leave your queries, if any, below.
