Linear Regression Intuition

Part 1/3 in Linear Regression

Apr 30, 2019

This is the first part in a 3-part series on Linear Regression.

Part 3/3: Linear Regression Implementation

Before you hop into the derivation of simple linear regression, it’s important to have a firm intuition on what we’re actually doing. With that being said, let’s dive in!

Let’s say a dear friend, Sally, is ready to sell her house and she’s coming to you for advice on how low or high she should price it. Being the shrewd friend that you are, you remember that two of her neighbors sold their houses in the past month. You kindly ask for the square footage of the neighbors’ homes and what they sold for. Sally does some digging and finds the square footage and selling price of each house.

House 1

Square footage: 1,500 sq ft
Price: $200,000

House 2

Square footage: 2,500 sq ft
Price: $300,000

You both pull out your TI-83 calculators :), plot the points, and end up with something like this…

Now that we have information on nearby houses, Sally mentions that her house is 1,800 square feet and should be priced accordingly. Y’all’s initial thought is to price it directly in the middle, but you know you can do better! Shaking off the cobwebs, you remember from your childhood that the formula for a line is

F(x) = A + Bx

and given two points you can find the slope and y-intercept of the line. Using the housing data above, the line connecting these two points turns out to be…

F(x) = 100x + 50000
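As a quick check, here is a minimal Python sketch of that two-point calculation. The function names `line_through` and `predict_price` are made up for illustration; the numbers are the two neighbors' houses from above, and `A` and `B` match the F(x) = A + Bx notation.

```python
def line_through(point_1, point_2):
    """Return (intercept, slope) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = point_1, point_2
    slope = (y2 - y1) / (x2 - x1)   # rise over run
    intercept = y1 - slope * x1     # solve y1 = intercept + slope * x1
    return intercept, slope


# (square footage, sale price) for the two neighbors' houses
house_1 = (1500, 200_000)
house_2 = (2500, 300_000)

A, B = line_through(house_1, house_2)


def predict_price(square_feet):
    """Price suggested by the two-point line F(x) = A + Bx."""
    return A + B * square_feet


print(A, B)                 # 50000.0 100.0
print(predict_price(1800))  # 230000.0
```

Plugging in Sally's 1,800 square feet gives the price the two-point line suggests for her house.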

You and Sally both plug in her square footage and find the price of her home should be $230,000. Equipped with math power, Sally speaks with her real estate agent, but the agent mentions she’s actually underpricing her home. The agent then hands Sally a dataset of 100 sold houses with their square footage and selling price. Fueled by coffee, you and Sally dive in to see if her real estate agent is telling the truth. You start by plotting all the points and end up with a plot like this (yes, tiny houses are included :) )

We can see how using two points to plot a line won’t work in this situation. Instead, we must find a line that “best fits” all 100 points. When thinking about a “best fit” line — think one line that is closest to all points. Instead of trial and error, we can determine this best fit by minimizing a thing called the sum of squared errors.

Sum of Squared Errors

To start, we need to find the equation of a line that minimizes the distance between the line and all of the data points (see example below).

One way to measure distance between the scattered points and the line is to find the distances between their Y values (in our case, sale price). Let’s say we use our line from earlier, F(x) = 100x + 50000, and want to see how accurate our previous function is for a 1,500 square foot house that actually sold for $300,000. Well, if we input a 1,500 square foot house into our earlier projection, it says we should have sold the house for $200,000 but in reality it sold for $300,000. A difference of $100,000! Point one for the real estate agent.
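The same arithmetic as a short Python sketch. The sign convention here (actual price minus predicted price) is an assumption, since the text only talks about the size of the difference, and `predict_price` is an illustrative name for the earlier line.

```python
def predict_price(square_feet):
    """The earlier two-point line, F(x) = 100x + 50000."""
    return 50_000 + 100 * square_feet


actual_price = 300_000                 # what the 1,500 sq ft house sold for
predicted_price = predict_price(1500)  # what our line says it should sell for

# Assumed convention: error = actual - predicted
error = actual_price - predicted_price

print(predicted_price)  # 200000
print(error)            # 100000
```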

This difference, or error, in price is exactly what we need to compute for each of the remaining 99 data points. Once we do this for each point, we add the errors together to measure our accuracy.

Because positive and negative errors would cancel each other out, we square each error before adding them up, resulting in the Sum of Squared Errors.
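In symbols, this could be written roughly as follows. The notation is an assumption on my part: x_i is the square footage of house i, y_i is its actual sale price, F(x) = A + Bx is our line, and n is the number of houses (100 here).

```latex
% Adding up the raw errors (positive and negative errors can cancel):
\text{Sum of Errors} = \sum_{i=1}^{n} \bigl( y_i - F(x_i) \bigr)

% Squaring each error before adding:
\text{Sum of Squared Errors} = \sum_{i=1}^{n} \bigl( y_i - F(x_i) \bigr)^2
```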

Now that we’ve arrived at the definition of the Sum of Squared Errors, the last step in finding the best fit line is to minimize this number. Minimizing the Sum of Squared Errors requires slightly more math, but you’ll be up and running in no time. Just think: once you’re equipped with this power, you will be able to find the line of best fit by simply plugging numbers in!
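Before the math, here is a rough Python sketch of the number we are trying to make small. The three houses below are hypothetical stand-ins for the agent's 100-house dataset (only the first one appears in this post), the second candidate line is an arbitrary guess rather than the true best fit, and `sum_of_squared_errors` is an illustrative name.

```python
def sum_of_squared_errors(intercept, slope, houses):
    """Add up (actual - predicted)^2 over every (square_feet, price) pair."""
    total = 0
    for square_feet, actual_price in houses:
        predicted_price = intercept + slope * square_feet
        total += (actual_price - predicted_price) ** 2
    return total


# Hypothetical (square footage, sale price) data, for illustration only
houses = [(1500, 300_000), (1800, 330_000), (2500, 420_000)]

# Our earlier two-point line, F(x) = 100x + 50000
sse_two_point_line = sum_of_squared_errors(50_000, 100, houses)

# Another candidate line, F(x) = 120x + 120000 (an arbitrary guess)
sse_other_line = sum_of_squared_errors(120_000, 120, houses)

print(sse_two_point_line)  # 34400000000
print(sse_other_line)      # 36000000
```

On this toy data the second line has a much smaller Sum of Squared Errors, so it fits these points better. Finding the line with the smallest possible value, without guessing, is what the derivation is for.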

Moving Forward

In the next part, we formally derive simple linear regression.

Part 2/3 in Linear Regression