Machine Learning 101, Lesson 1: Univariate Linear Regression

Dheeman Kuaner
6 min read · Jun 7, 2019


Photo by Isaac Smith on Unsplash

Let me begin with a problem statement.

The following table shows a small part of a housing-prices dataset taken from https://www.kaggle.com/kennethjohn/housingprice.

Housing Prices, Portland, OR

Assume that a new house of size 1200 feet² is built in the same locality; what would be the expected price of the new house?

Machine Learning is a branch of computer science concerned with algorithms that become more and more accurate as they are trained on data.

Here I introduce the simplest machine learning algorithm for the problem at hand: univariate linear regression. Using it, we will see how the prices of new houses can be predicted from the available data.

Some Terminologies

Let me first introduce some terminologies:

m: The number of rows in the table given above (also known as the number of training examples, since we will train our machine learning algorithm on these examples).

x: input variable, also known as a feature. Here, we have the size of houses as the only input variable.

y: The output variable, also known as the label. Here, our label is the price of a house.

I will denote the k-th training example as the pair (x⁽ᵏ⁾, y⁽ᵏ⁾). For example, x⁽²⁾ is the size of the second house in the table and y⁽²⁾ is its price.

hypothesis function (h): This is the function generated by the machine learning algorithm. The better this function fits the data, the better the algorithm's predictions. I will discuss it in the coming sections.

Approach

So, now that the terminologies are defined, let me start with an overview of what Univariate Linear Regression is. If I plot the training set data on the x-y plane, it would look something like this:

A plot of the dataset

If you observe, there seems to be a linear relationship between the price of the houses (the target variable) and the size of the houses (the feature variable).
All I need to do now is find the straight line that fits the data most closely.

This process of finding the straight line relating the target variable to the feature variable is called linear regression. Since I have only one feature variable, this is called univariate linear regression.

But how do I find the straight line that fits this data best? Infinitely many straight lines pass through even a single point. So, how do I go about it?

Hypothesis and cost function

The figure shows how machine learning is used to predict values

Since I am looking for a straight line, my hypothesis function becomes:

h(x) = θ₀ + θ₁x

The only thing left now is to find the values of the two parameters, θ₀ and θ₁.

First of all, given particular values of these two parameters, I need to be able to measure how good the hypothesis is. For this, I define another function, called the cost function (or error function), as follows:

J(θ₀, θ₁) = (1/2m) Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)², where the sum runs over i = 1, …, m

This cost function is essentially the mean squared error (halved, which makes the derivative tidier later).
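As a quick sketch (not from the original post; it uses NumPy and made-up data), the hypothesis h(x) = θ₀ + θ₁x and this cost function can be written as:

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """The straight-line hypothesis h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Cost J(theta0, theta1) = (1/2m) * sum((h(x_i) - y_i)^2)."""
    m = len(x)
    errors = hypothesis(theta0, theta1, x) - y
    return np.sum(errors ** 2) / (2 * m)

# Hypothetical data where y = 2x exactly, so h(x) = 0 + 2x has zero cost
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(cost(0.0, 2.0, x, y))  # 0.0
```

A hypothesis that fits the data perfectly has zero cost; the worse the fit, the larger the cost.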

Let’s take a simple example and see how this works.

Suppose we have m = 4 data samples, with input values x⁽ⁱ⁾ and label values y⁽ⁱ⁾ as shown.

Let's find the cost of 3 different hypothesis functions, i.e. 3 different choices of the parameters (θ₀, θ₁).

Using the formula of the cost function and substituting values,

the cost of the first hypothesis function is:

= 3.3968

the cost of the second hypothesis function is:

= 1.9768

the cost of the third hypothesis function is:

= 0.0668

Now that I can calculate the cost corresponding to any value of the parameters, how do I calculate the value of the parameters for which the cost is minimum?

Gradient Descent: the guiding light

Imagine that you are standing somewhere on strange terrain and want to reach the lowest point. The terrain is strange because it rises in some directions and falls in others.

Photo by Cosmic Timetraveler on Unsplash

What do you do?

One approach is to find the direction in which the terrain falls most steeply and take a small step in that direction. From the new spot, you repeat the same task. You keep taking small steps in the locally steepest downhill direction until you reach a point where every direction around you leads upward, i.e. you have reached the lowest point.

What I described above is exactly what we call gradient descent.

Now, there are 2 tasks in this analogy:

a. Finding the direction in which the terrain falls most steeply.

b. Taking a small step (but how small?)

How do I do this with a function?

You differentiate the function at that point; the derivative tells you how fast the function is increasing. You then shift the parameters by a small amount in the opposite direction (the size of this shift is controlled by a factor called the learning rate).
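To make the idea concrete, here is a minimal sketch (my own, not from the article) of gradient descent on the one-variable function f(x) = (x − 3)², whose derivative is 2(x − 3):

```python
def gradient_descent_1d(start, learning_rate=0.1, steps=100):
    """Minimize f(x) = (x - 3)**2 by repeatedly stepping against its derivative."""
    x = start
    for _ in range(steps):
        grad = 2 * (x - 3)         # slope of f at the current point
        x -= learning_rate * grad  # small step in the opposite direction
    return x

print(gradient_descent_1d(10.0))  # lands very close to the minimum at x = 3
```

No matter which side of the minimum you start on, the negative of the derivative always points downhill, so the iterates drift toward x = 3.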

So, if you find the point at which the cost function has its minimum value, you will have found the parameters of the best-fit line, right?

Reaching minima using gradient descent

So, for the problem of univariate linear regression, where the cost function is:

J(θ₀, θ₁) = (1/2m) Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²

there are two parameters we need to compute: θ₀ and θ₁.

I start with arbitrary values of these parameters (generally (0, 0)) and then update the values simultaneously using the following equations:

θ₀ := θ₀ − α · (1/m) Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)
θ₁ := θ₁ − α · (1/m) Σ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾
By simultaneously updating the parameters, I mean that I should not update one parameter and then use its new value when computing the update for the other. Both updates must be computed from the current values of the parameters, and only then are both parameters replaced together.
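As a sketch (names and data are mine), one simultaneous update step for the two parameters looks like this:

```python
def step(theta0, theta1, x, y, alpha):
    """One simultaneous gradient-descent update for h(x) = theta0 + theta1 * x."""
    m = len(x)
    errors = [(theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)]
    # Compute BOTH gradients from the current parameter values first...
    grad0 = sum(errors) / m
    grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
    # ...and only then replace both parameters together.
    return theta0 - alpha * grad0, theta1 - alpha * grad1
```

Because both gradients are computed before either parameter changes, the update is simultaneous; updating theta0 first and reusing it inside grad1 would be the mistake described above.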

Here, α is the learning rate. Choosing the value of α is extremely important, as it decides how fast your algorithm converges. A small α can lead to slow learning, while a large α may even make the algorithm diverge.

Normal learning rate (α = 0.1)
Learning rate too small (α = 0.01): learning is slow
Learning rate too large (α = 1): the algorithm diverges

The implementation

Now that the learning process is known, let me show you the Python implementation. First, download the dataset from the Kaggle link above as a CSV file. You don't have to do a lot for that: while saving the dataset using "save as", just change the extension from .txt to .csv. In the same directory, create a Python file (.py) and use the following code as the content:
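The original post embeds the script as an image, so it cannot be reproduced verbatim here. The following is my own minimal sketch of such a script; the two-column CSV layout, the filename, the feature scaling, and the synthetic demo data are my assumptions, and the article's actual code may differ:

```python
import csv

def load_data(path):
    """Read (size, price) pairs from a two-column CSV file."""
    xs, ys = [], []
    with open(path) as f:
        for row in csv.reader(f):
            if len(row) >= 2:
                xs.append(float(row[0]))
                ys.append(float(row[1]))
    return xs, ys

def train(x, y, alpha=0.1, iterations=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent.
    Assumes x has been scaled so that gradient descent converges."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        errors = [(theta0 + theta1 * xi) - yi for xi, yi in zip(x, y)]
        grad0 = sum(errors) / m
        grad1 = sum(e * xi for e, xi in zip(errors, x)) / m
        theta0 -= alpha * grad0   # simultaneous update of both parameters
        theta1 -= alpha * grad1
    return theta0, theta1

if __name__ == "__main__":
    # x, y = load_data("housingprice.csv")   # uncomment to use the real dataset
    # Synthetic demo data (price = 100 + 0.15 * size) so the script runs standalone:
    x = [1000.0, 1500.0, 2000.0, 2500.0]
    y = [250.0, 325.0, 400.0, 475.0]
    # Mean-normalize the feature so a learning rate of 0.1 converges
    mu = sum(x) / len(x)
    sigma = (sum((xi - mu) ** 2 for xi in x) / len(x)) ** 0.5
    xs = [(xi - mu) / sigma for xi in x]
    theta0, theta1 = train(xs, y)
    price_1200 = theta0 + theta1 * (1200 - mu) / sigma
    print(f"Predicted price for a 1200 sq ft house: {price_1200:.2f}")
```

One design note: with raw house sizes in the thousands, α = 0.1 would make the updates blow up, which is why the feature is mean-normalized before training; predictions then apply the same scaling to the input.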

The output of the Python program

Concluding statements

So, this was a short tutorial on univariate linear regression.

Please tell me what you think about it in the comments below.

Till then, happy machine learning.
