Linear Regression rey! 😛

Gagan Ganapathy
Published in ml-macha
3 min read · Aug 6, 2018

This is going to be a small but pretty informative post, covering everything you need to know about what “linear regression” in machine learning is!

Here's the list of terms you'll be learning; I'll explain each one thoroughly in this post! 😬

  • Hypothesis
  • Cost function
  • Gradient Descent

I’ll try to keep this post light on math, with few or no equations, because who likes seeing scary equations with theta and sigma 😛

Hypothesis :

So, what exactly is our hypothesis?

For that, let’s first understand what “regression” means…

Regression analysis is a set of statistical processes for estimating the relationships among variables.

This hints that the relationship among the variables is continuous. The next thing that comes to mind is how this linear relationship can be visualised… hmmm, remember… y = mx + c?

This is basically your hypothesis! Simple, right? “Hypothesis” is just a fancy word machine learning experts use to denote the output function.

To be more precise, the hypothesis is denoted by h, and x is the feature in your dataset for which you are predicting values.

h = ax + b
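In code, the hypothesis is just a one-liner (a quick sketch; the function name and the example values of a and b are my own, chosen just for illustration):

```python
def hypothesis(x, a, b):
    """Predict a value for feature x using the line h = ax + b."""
    return a * x + b

# Example: with a = 2 and b = 1, a feature value of 3 predicts 7.
print(hypothesis(3, a=2, b=1))  # 7
```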

Cost function:

For every hypothesis line, we need to check how accurately it can predict values for data points that are not present in your training dataset.

So… how do you measure this accuracy?

This is done by squaring the difference between the predicted value (from the hypothesis) and the actual value at each training point, then averaging these squared errors over the training dataset (this is called the mean-squared error).

There you have it: that is your cost function! But how will this help? :/

The most important step is to minimize this cost function over the parameters a and b in your hypothesis; this is our goal!

Cost function: J(a, b) = (1/2m) · Σ (h(xᵢ) − yᵢ)², where m is the number of training examples
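The cost described above can be sketched in Python like this (the function name is mine; the 1/(2m) scaling is the usual convention, and the extra factor of 2 just makes the derivative cleaner later):

```python
def cost(xs, ys, a, b):
    """Mean-squared-error cost J(a, b) of the line h = ax + b over a training set."""
    m = len(xs)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# Points on the line y = 2x + 1 give zero cost for a = 2, b = 1...
print(cost([1, 2, 3], [3, 5, 7], a=2, b=1))  # 0.0
# ...and a positive cost for any other line.
print(cost([1, 2, 3], [3, 5, 7], a=0, b=0))
```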

Gradient Descent :

Remember our goal was to minimize the cost function J over the parameters a and b.

Gradient descent update rule: a := a − α · ∂J/∂a, b := b − α · ∂J/∂b

So basically, what’s happening here is that a and b are repeatedly nudged in the direction that decreases J, and α (alpha) is our learning rate…

You must be wondering, what is a learning rate? whoaa!! is our machine finally “learning”? 😛

Let’s understand what a learning rate is…

The learning rate tells you how big a step to take when updating the parameters in your hypothesis, so that we eventually end up at the global minimum of our cost function!

What if you reach a local minimum? 😧

Observe that our cost function is convex, which guarantees it has just one minimum, and that point is surely the global minimum!

Back to Learning rate…

One obvious question is, “how do you choose your learning rate? 😐”

The answer is simple: choose a learning rate such that, when kept fixed in your update rule, the parameters eventually converge to the global minimum.

What if you choose a learning rate that’s too big ?

Chances are it will overshoot the global minimum on each step and never converge (it might even diverge) :/

That’s it! The values of a and b at the global minimum are your answer. Use these values to construct your hypothesis equation and you’re done! 😎
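The whole procedure can be sketched in a few lines of Python (a minimal sketch: the data, the learning rate of 0.05, and the step count are arbitrary choices of mine, and the gradient formulas come from differentiating the mean-squared-error cost with respect to a and b):

```python
def gradient_descent(xs, ys, alpha=0.05, steps=2000):
    """Fit h = ax + b by repeatedly stepping downhill on the cost J."""
    a, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # Prediction error at every training point for the current line.
        errors = [a * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of J with respect to a and b.
        grad_a = sum(e * x for e, x in zip(errors, xs)) / m
        grad_b = sum(errors) / m
        # Take a step of size alpha against the gradient.
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Points on y = 2x + 1: the fit should recover a ≈ 2 and b ≈ 1.
a, b = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(a, 2), round(b, 2))
```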

Those who know a little more math 😛 must be wondering: why do this iterative gradient descent at all, instead of finding the global minimum of the cost function directly by differentiation?

The answer is: yes, you can do that (it’s called the normal equation), and for small problems it works perfectly well, but gradient descent scales better to large datasets (the direct solution needs an expensive matrix computation) and is thus more preferred!
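For the curious, that direct solution is only a couple of lines with NumPy (a sketch with made-up data; the trick of stacking a column of ones lets b be solved for alongside a, and with n features this requires solving an n×n system, which is what gets expensive at scale):

```python
import numpy as np

xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([3.0, 5.0, 7.0, 9.0])  # points on y = 2x + 1

# Stack a column of ones so the intercept b is learned too: X @ [a, b] ≈ y.
X = np.column_stack([xs, np.ones_like(xs)])

# Normal equation: solve (Xᵀ X) [a, b] = Xᵀ y.
a, b = np.linalg.solve(X.T @ X, X.T @ ys)
print(a, b)  # 2.0 and 1.0 (up to floating-point error)
```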
