Linear Regression — Ordinary Least Squares Method

shaistha fathima
6 min read · Aug 28, 2019


Summary: This is the first post of the new beginner-friendly series — Basic Concepts You Should Know Before Starting with “Neural Networks” (NN), covering Linear Regression using the Ordinary Least Squares method. For more posts like this on Machine Learning basics, follow me here or on Twitter — Shaistha Fathima.

Basic Concepts You Should Know Before Starting with the “Neural Networks” (NN) Series

Bonus: If you want a YouTube tutorial on NNs using PyTorch, you may watch this playlist by Deeplizard!

What are we waiting for… let’s begin!

Let’s start with the most basic and most neglected topic… Linear Regression!

So what is linear regression, and why should we even bother with it?

If I were to define it, linear regression is a statistical method for finding the relationship between independent and dependent variables.

Ah! Okay… what does that mean?

To put it simply, it’s a procedure (not quite the right term!) for finding the line that best fits, or here best partitions, the given data, say, separating as many of the red dots as possible from the blue dots. It should look something like this:

linear regression separating the red dots from the blue dots

Ok… now we know it can be used to segregate data based on our interests. So why use it?

Precisely because it helps segregate our data based on our interests! Ok, ok… for those of you who still can’t follow: linear regression can be used to find the line, or trend, that helps us understand the data and generalize from it! Still not following?! Here are a few real-life examples where it comes in handy:

Example 1: Predict the sale of products in the future, based on past buying behavior.

Example 2: Economists use Linear Regression to predict the economic growth of a country or state.

Example 3: Sports analysts use linear regression to predict the number of runs or goals a player will score in coming matches based on previous performances.

Example 4: An organization can use linear regression to figure out how much it would pay a newly joined employee based on years of experience.

Example 5: Linear regression analysis can help a builder predict how many houses they would sell in the coming months and at what price.

Example 6: Impact of SAT Score (or GPA) on College Admissions.

Example 7: Petroleum prices can be predicted using Linear Regression.

Bonus: Here is an awesome read by Carolina Bento on Linear Regression In Real Life

Moving on… to be more specific, we will be discussing two methods: the Ordinary Least Squares method and Gradient Descent!

Ordinary Least Squares Method

This is simple linear regression, much like the picture shown above. It finds the line that minimizes the total distance between the data points and the line, i.e., the line that sits, overall, as close to all of the points as possible!
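To get a feel for this before we dig into the math, here is a minimal sketch using scikit-learn’s LinearRegression, which fits a line by ordinary least squares (the data below is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up sample data: x values and the y values observed for them
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # shape (n_samples, 1)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

model = LinearRegression()  # ordinary least squares under the hood
model.fit(X, y)

print("slope m =", model.coef_[0])        # ≈ 1.96
print("intercept b =", model.intercept_)  # ≈ 0.14
```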

Let’s go through an example question used by Udacity to explain this…

Question: The image below shows that Student 1 was accepted into the school with a test score of 9/10 and grades of 8/10; on the other hand, Student 2 was not accepted because of a poor test score of 3/10 and grades of 4/10. This seems pretty simple. Now what about Student 3, who has a test score of 7/10 but grades of 6/10 — will he/she be accepted into the school?

Try to analyze the data given above and answer the question with Yes or No. Why do you think so?

To make it easy, here is the graph of Grades vs. Test scores, showing the points for Student 1 and Student 2 along with other test results, with red dots showing rejection and blue dots showing acceptance into the school.

So which is it, Yes or No?

Let’s say we found the line separating the two sets of data using linear regression, which I suppose you might have already done; visually, it should look like this:

line separating most of the red region from the blue

Now, just by plotting Student 3’s test marks vs. grades (7, 6) on the graph, you may notice that the point lies in the blue area; hence, the student is accepted!

Yes, Student 3 is accepted, as the point lies in the blue area!
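As a quick sketch of that last step in code (the slope and intercept below are made-up values standing in for the fitted line, not the actual Udacity numbers), classifying a student just means checking which side of the line their point falls on:

```python
# Hypothetical fitted line: grades = m * test + b
m, b = -1.0, 12.0  # made-up slope and intercept for illustration

def is_accepted(test, grades):
    # Points above the line fall in the "blue" (accepted) region
    return grades > m * test + b

print(is_accepted(7, 6))  # Student 3 -> True (accepted)
print(is_accepted(3, 4))  # Student 2 -> False (rejected)
```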

Coming back to the Ordinary Least Squares method: this method moves the line closer to the points, decreasing the total distance between the line and the points and helping find the best position for the line!

Let’s take a simple example.

In this graph, x1, x2, x3, x4 are the points, and their distances from the line are d1, d2, d3, d4 respectively.

In the Ordinary Least Squares method, the distances are first squared to remove the sign differences, i.e., to make them all positive!

Thus, we minimize the sum of the squared distances, (d1)² + (d2)² + (d3)² + (d4)² + … and so on.
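In code, the quantity being minimized is just this sum of squared residuals. Here is a tiny sketch (the points and candidate lines are made up for illustration):

```python
# Sum of squared vertical distances from the points to the line y = m*x + b
def squared_error(m, b, xs, ys):
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]  # x1, x2, x3, x4
ys = [2, 4, 5, 9]  # made-up observed values

# A better-fitting line gives a smaller total squared error
print(squared_error(2.0, 0.0, xs, ys))  # y = 2x -> 2.0
print(squared_error(1.0, 0.0, xs, ys))  # y = x  -> 34.0 (much worse)
```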

So, how do you know where the line will intercept the y-axis (though it’s not shown here, it will)?

You know the equation of a line:

y = mx + b, where x and y are the coordinates of a point on the line, m is the slope of the line, and b is the y-intercept, i.e., where the line crosses the y-axis.

For many points (x1, y1), (x2, y2), …, the slope and intercept that minimize the sum of squared distances are computed using the mean values of both x and y:

m = Σ (xi − mean(x)) · (yi − mean(y)) / Σ (xi − mean(x))²

b = mean(y) − m · mean(x)
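Here is a minimal sketch of those two formulas in plain Python (the sample data is made up for illustration):

```python
# Closed-form ordinary least squares: slope and intercept from the means
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up x values
ys = [2.1, 3.9, 6.2, 8.1, 9.8]   # made-up y values

x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)

# m = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))^2)
m = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
     / sum((x - x_mean) ** 2 for x in xs))
b = y_mean - m * x_mean

print(f"y = {m:.2f}x + {b:.2f}")  # y = 1.96x + 0.14
```

Fitting the same data with scikit-learn’s LinearRegression, as in the sketch earlier, gives the same slope and intercept, since it also solves ordinary least squares.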

Note: the Ordinary Least Squares method works for “linear data only”, not all data. A residual plot is used to check this!

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate.
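A minimal sketch of such a check with matplotlib (reusing the made-up data and the fitted values m ≈ 1.96, b ≈ 0.14 from the sketches above):

```python
import matplotlib.pyplot as plt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
m, b = 1.96, 0.14  # fitted slope and intercept from earlier

# Residual = observed y minus the y predicted by the line
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

plt.scatter(xs, residuals)
plt.axhline(0, color="red", linestyle="--")  # reference line at zero
plt.xlabel("x (independent variable)")
plt.ylabel("residual")
plt.title("Randomly scattered residuals suggest a linear model fits")
plt.show()
```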

You may check the two articles below to understand it better!

Conclusion

That’s it for this post. In the next post of this series, we will look at Gradient Descent, which comes in handy in the case of random or residual data.

You may also check out — Introduction to “Tensors” series:


shaistha fathima

ML Privacy and Security Enthusiast | Research Scientist @openminedorg | Computer Vision | Twitter @shaistha24