Linear Regression: Zero to Hero!

Spooky house? Want to sell it? Well, linear regression can help you get rid of it.

You have a house, and now you want to sell it because you've come to know that there are paranormal activities going on in it.

Now you want to set an appropriate price for your house so that you don't incur a loss.

You are smart, so you plot the prices of all houses in your area that were recently sold against their square footage.

A naive approach to the house prediction problem

To get a price estimate, you could mark a boundary to the left and right of your house's square footage and come up with a price from the sales inside it. This approach might work, but you are probably not utilizing the entire data set; the solution does not provide a holistic view.

You glance at the data and decide to fit a line that passes through the data. Of course, this line can be represented by

f(x) = W0 + W1 * x

Where:
f(x): the estimated price
W0: the intercept
W1: the slope of the line
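As a quick sketch, the line is easy to code up. The intercept and slope values below are made up for illustration, not fitted to any real data:

```python
# Simple linear model: f(x) = W0 + W1 * x
# W0 (intercept) and W1 (slope) are illustrative values, not fitted ones.
W0 = 50000.0   # base price
W1 = 150.0     # price per square foot

def f(x):
    """Predicted house price for a house of x sq.ft."""
    return W0 + W1 * x

print(f(1000))  # price estimate for a 1000 sq.ft house
```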

Okay, so now that you have come up with a line, how can you be sure that it's the best fit? There has to be a metric to measure that, right? Well, there is such a metric. It's called RSS, or Residual Sum of Squares.

The RSS will be:

RSS = (p1 - f(x1))^2 + (p2 - f(x2))^2 + ...

Where:
x1, x2: sq.ft of each house
p1, p2: actual house prices
f(x1), f(x2): predicted house prices
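The sum above translates directly into code. Here is a minimal sketch, with made-up data points and made-up parameter values, just to show the computation:

```python
# Residual Sum of Squares for a candidate line f(x) = W0 + W1 * x.
# The data points and parameters below are illustrative, not real.
sqft   = [1000, 1500, 2000]        # x1, x2, x3
prices = [200000, 260000, 340000]  # p1, p2, p3 (actual sale prices)

W0, W1 = 50000.0, 150.0

def f(x):
    return W0 + W1 * x

# Sum of squared residuals over all houses
rss = sum((p - f(x)) ** 2 for x, p in zip(sqft, prices))
print(rss)
```

A smaller RSS means the line's predictions are, on the whole, closer to the actual sale prices.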

But how on earth can anybody come up with a line that best captures the trend in the data? It’s a humongous task.

Well, there’s an algorithm called gradient descent. It does all the heavy lifting and provides you with the parameters W0, W1 of the line that best fits the data.

Gradient descent iteratively adjusts W0 and W1, nudging them in the direction that decreases the RSS, until it reaches the values where the RSS is at its minimum.

Isn’t that awesome?

Gradient descent has a lot to explore; refer to this video for more details.
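To make the idea concrete, here is a bare-bones gradient-descent sketch for W0 and W1. The data, learning rate and iteration count are all illustrative; the toy points lie exactly on the line p = 1 + 2x, so the fitted parameters should approach 1 and 2:

```python
# Minimal gradient descent on RSS for f(x) = W0 + W1 * x.
# Toy data lies exactly on p = 1 + 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ps = [3.0, 5.0, 7.0, 9.0]

W0, W1 = 0.0, 0.0  # start from an arbitrary guess
lr = 0.01          # learning rate (illustrative choice)

for _ in range(20000):
    # Partial derivatives of RSS with respect to W0 and W1
    grad0 = sum(-2 * (p - (W0 + W1 * x)) for x, p in zip(xs, ps))
    grad1 = sum(-2 * (p - (W0 + W1 * x)) * x for x, p in zip(xs, ps))
    # Step downhill, against the gradient
    W0 -= lr * grad0
    W1 -= lr * grad1

print(round(W0, 3), round(W1, 3))  # should approach 1 and 2
```

Each iteration moves the parameters a small step against the gradient of the RSS, which is exactly the "downhill" search described above.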

You were happy and full of pride. You’ve learnt about RSS, used gradient descent and found the best fit line.

But your neighbour was watching you do all this behind your back.

He says he feels that a quadratic function would have been better for this problem.

And it suddenly clicks: a quadratic function would, in fact, be a better fit for the given data.

Now a question pops into your mind: how can you find the right order/complexity for your regression model? How can you decide whether it should be a linear, quadratic or cubic function?

How can anyone find it out?

It’s not rocket science. If you plot training and testing error against the model complexity, you might understand the trick.

A highly complex model will overfit the data and capture all the local trends. But in the process, it fails to generalise to the overall trend.

With increasing model complexity, the test error decreases up to a certain point and then it starts to increase.
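You can see this pattern with a small experiment. The sketch below, with synthetic data of my own making (a noisy quadratic), fits polynomials of increasing degree and compares training and testing error; the training error keeps shrinking as the degree grows, while the test error does not:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative data: a quadratic trend plus noise.
x = rng.uniform(0, 3, 30)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(0, 0.5, 30)

# Simple train/test split
x_train, y_train = x[:20], y[:20]
x_test,  y_test  = x[20:], y[20:]

train_errs, test_errs = {}, {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_errs[degree] = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_errs[degree]  = np.mean((np.polyval(coeffs, x_test)  - y_test)  ** 2)
    print(degree, round(train_errs[degree], 3), round(test_errs[degree], 3))
```

A higher-degree polynomial can only drive the training error down further, but its test error reveals when it has started memorising noise instead of the trend.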

A question might arise: how on earth can I choose which features to include so that my model performs better and doesn't fall into the trap of overfitting? Hold on to this question (#1).


Before diving further, let’s understand the concept of concave and convex function.

The curve on the left represents a concave function; the one on the right, a convex function.

Concave (think of it as a cave): a line connecting any two points on the curve always stays below the curve.

Convex: a line connecting any two points on the curve will always stay above the curve.
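The chord definitions above are easy to verify numerically. In this sketch, x² stands in as the convex example and -x² as the concave one; the endpoints and blend factor are arbitrary choices:

```python
# Numeric check of the chord definitions of convex and concave functions.
def chord(fn, a, b, t):
    """Height of the line segment between (a, fn(a)) and (b, fn(b)) at blend t."""
    return (1 - t) * fn(a) + t * fn(b)

f = lambda x: x ** 2      # convex example
g = lambda x: -x ** 2     # concave example

a, b, t = -1.0, 3.0, 0.5  # arbitrary endpoints and midpoint blend
x = (1 - t) * a + t * b   # the matching point on the x-axis

print(chord(f, a, b, t) >= f(x))  # True: chord stays above the convex curve
print(chord(g, a, b, t) <= g(x))  # True: chord stays below the concave curve
```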


This knowledge of convex and concave functions helps us understand two important algorithms, namely hill climbing and hill descent.

To be continued…

Written by Abhijeet Bhattacharya
Data Scientist and ML Enthusiast

Practical Programming: Get your hands dirty with the code!
