Machine Learning Path (II)

Linear Regression — Hypothesis Function


Before entering this section, you should know that machine learning algorithms can be split into two categories: supervised and unsupervised learning (there are other kinds of algorithms too, but think of it this way for now). If you haven’t read the first article of this series, I highly recommend briefly skimming through it.

Again, this article is my personal notes for the Machine Learning course on Coursera taught by Stanford professor Andrew Ng. If you are interested in the field of AI and machine learning, I highly recommend taking the course! (It’s free unless you want the certificate. Well, I do want that certificate because it is very helpful!)

House Pricing Example Revisited

In the previous article, we saw that when we plot the dataset mapping the size of a house to its price (illustrated below), we can derive a magenta line that predicts the price from the size. This kind of problem, where we know the exact dataset and predict a continuous-valued output, is known as a regression problem.

House Price Prediction

So, in this section, I will talk about how we can derive the magenta line using the model called Linear Regression with One Variable (also called Univariate Linear Regression). From the name itself, you can somewhat guess that —

  • Linear, namely, a straight line. In mathematics, it is a first-degree equation (a so-called simple equation). When drawn on a plane with X-Y axes, it is represented as a straight line. For example, y = 3x + 4 is a simple first-degree linear equation; when you draw it on the 2D plane, it is a straight line that passes through the points (0, 4) and (-4/3, 0) (verified in the short snippet after this list).
  • One variable means that the model has only one feature for prediction. For the house pricing example, the only variable that can vary is the size of the house. In other words, different house sizes map to different predicted prices.
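
To make the linear equation concrete, here is a quick TypeScript check (close in spirit to the JS implementation mentioned later) of the example line y = 3x + 4 and its two intercept points:

```typescript
// Evaluate the example line y = 3x + 4 at its two intercepts.
const y = (x: number): number => 3 * x + 4;

console.log(y(0));      // 4 → the line passes through (0, 4)
console.log(y(-4 / 3)); // 0 → the line passes through (-4/3, 0)
```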

Linear Regression with One Variable

Exam Scores Prediction Example

Let us use another example to demonstrate how linear regression works. For instance, suppose I have a dataset of students’ midterm and final exam scores. The problem is:

I want to train a model that helps us predict students’ final exam scores from their midterm scores.

Examining this condition, we have a labeled dataset, which means this is a supervised learning problem. Additionally, it has a single feature: the midterm exam score. So we can now confirm that we can try using linear regression with one variable to train the model and predict the result (the final exam score).

BTW, I have already implemented a JS version of linear regression that solves this problem. You can view it on my CodePen.io or access my Maxwell-Codepens project collection on my GitHub. The source code is included in the repository, so feel free to play around with it ❤. Don’t worry if you don’t know the basics of linear regression or how I implemented it; after all, this series of articles aims to help you understand those processes as simply as possible.

Linear Regression with One Variable

Convention on Mathematical Notation

Before we dive into the details of linear regression, let’s introduce the mathematical conventions first.

  • m denotes the number of training samples. If we have 50 students with their midterm and final scores, then m = 50.
  • n denotes the number of features. In this case, because we are predicting the final exam score based only on the midterm score, n = 1. If we instead predicted using the midterm score, the number of hours a student studies per day, and the student’s gender, then we would have three features, so n = 3.
  • x denotes the input feature. In this example, we refer to the midterm exam scores as the input feature. If we have multiple features, say 3 of them, we can denote each as x_1, x_2, x_3 (x with subscript numbers).
  • y denotes the output value, i.e. the result we want to predict from the input feature. In this example, the final exam scores are our output.
  • We usually refer to a specific sample as the i-th sample. Suppose we want the first student’s exam scores; then i = 1. So, according to the table below, the first midterm score is 80, which is denoted x^1 (x with a superscript 1). Notice that this is not exponentiation; it just tells us that we are referring to the 1st sample’s input feature. Similarly, we can refer to the 1st sample’s output as y^1, which has the value 86. (The notation is summarized in the code sketch after the figure.)

Mathematical Notation
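
To tie the notation together, here is a minimal TypeScript sketch. Only the first sample (x^1 = 80, y^1 = 86) comes from the table above; the remaining scores are made-up placeholders:

```typescript
// Training samples: each pairs a midterm score (input x) with a final score (output y).
// Only the first pair (80, 86) is from the article; the rest are assumed values.
const midterms = [80, 65, 90, 72, 55]; // x — input feature
const finals   = [86, 70, 95, 78, 60]; // y — output value

const m = midterms.length; // m = 5 training samples
const n = 1;               // n = 1, a single feature (the midterm score)

// x^1 and y^1: the 1st sample (arrays are 0-indexed, so we use index 0).
const x1 = midterms[0]; // 80
const y1 = finals[0];   // 86
console.log({ m, n, x1, y1 });
```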

Hypothesis Function

We now know the mathematical representation of the training samples. Basically, we will use something called a hypothesis function to help us predict the final exam score. You can think of it as a function that maps the input feature(s) to the output. So, in the field of machine learning, we use the input dataset to train a model, which results in a hypothesis function that can predict the output.

In this example, we have only one input feature, so in linear regression our hypothesis function is just a straight line in 2D coordinates. Its representation looks like this:

h_θ(x) = θ_0 + θ_1 · x_1

Hypothesis Representation for Single-Feature Linear Regression

where x_1 is the midterm exam score, our only input feature. θ_0 and θ_1 (theta with subscripts 0 and 1) are called the parameters of the hypothesis function.
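
As a concrete sketch, the hypothesis is just a one-line function; the θ values below are made up purely for illustration:

```typescript
// hθ(x) = θ0 + θ1 * x1 — a straight line parameterized by theta0 and theta1.
function hypothesis(theta0: number, theta1: number, x1: number): number {
  return theta0 + theta1 * x1;
}

// Assumed parameters: with θ0 = 10 and θ1 = 0.9, a midterm score of 80
// predicts a final score of 10 + 0.9 * 80 = 82.
console.log(hypothesis(10, 0.9, 80)); // 82
```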

So how does the hypothesis function work in linear regression? Suppose we omit the first term (i.e. set θ_0 = 0) to simplify the hypothesis function into just the product of the input feature and the parameter θ_1:

h_θ(x) = θ_1 · x_1

Assume that θ_0 = 0

Moreover, assume that we already have a dataset of midterm and final exam scores that looks like this:

Dataset of the students’ midterm and final

Now, choosing different values of θ_1 results in different hypotheses. The illustration below shows that when θ_1 equals 1, we get a perfect hypothesis function, because it passes through every point in our sample dataset, i.e. 100% precision. (In practice, it is rare to achieve that kind of precision; this is just an example demonstrating how the choice of parameters shapes the hypothesis.)

Hypothesis Functions
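
Here is a small sketch of that parameter comparison; it assumes a toy dataset where every final score happens to equal the midterm score, so θ_1 = 1 fits every sample exactly:

```typescript
// Toy dataset (assumed): each final score equals the midterm score.
const data: Array<[number, number]> = [[60, 60], [75, 75], [90, 90]];

for (const theta1 of [0.5, 1, 1.5]) {
  // Count the samples that the line hθ(x) = θ1 * x passes through exactly.
  const hits = data.filter(([x, y]) => theta1 * x === y).length;
  console.log(`θ1 = ${theta1}: fits ${hits}/${data.length} samples`);
}
// θ1 = 0.5 and θ1 = 1.5 fit 0/3 samples; θ1 = 1 fits all 3/3.
```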

Machine Learning Process

So, obviously, the main goal of linear regression is to find the hypothesis function (a straight line) that fits our dataset well. This points us in a clear direction; we might ask —

How do we find the parameter(s) θ of the hypothesis function?

The answer to this question is simple — run a machine learning algorithm that trains on our dataset and produces the hypothesis function. Of course, this leads us to another question at the same time —

What is the machine learning algorithm that helps us determine the parameter(s) θ of the hypothesis function?

Machine Learning Process

Before diving into the details of the algorithm, we can now summarize the fundamental machine learning process we have followed since the beginning of the article —

  • Arrange the training dataset
  • Analyse the data and run a suitable machine learning algorithm
  • The algorithm trains the model, which results in a hypothesis function
  • The generated hypothesis function can then predict the output from new input feature(s) (see the sketch below)
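
Putting the four steps together as code, with a hypothetical train() placeholder — the actual learning algorithm and its cost function are the topic of the next article:

```typescript
type Hypothesis = (x1: number) => number;

// Placeholder learner: a real implementation would learn θ0 and θ1 from the
// data (via the cost function covered in the next article). Here the θ values
// are hard-coded purely to illustrate the shape of the process.
function train(xs: number[], ys: number[]): Hypothesis {
  const theta0 = 0;
  const theta1 = 1;
  return (x1) => theta0 + theta1 * x1;
}

// 1. Arrange the training dataset (assumed toy scores)
const midtermScores = [60, 75, 90];
const finalScores   = [60, 75, 90];
// 2–3. Analyse the data and run the learning algorithm to get a hypothesis
const h = train(midtermScores, finalScores);
// 4. Use the hypothesis to predict a new student's final score
console.log(h(82)); // 82 with the hard-coded θ values above
```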

Our final piece is to introduce the algorithm; in the next article, we will also discover something called the Cost Function, which is part of that algorithm. Hopefully, this article gives you an idea of what the machine learning process is and the idea behind the hypothesis function. Anyway, if you are interested in this field, I highly encourage you to take the course on Coursera by Andrew Ng. If you still have questions about what machine learning is, please check out my previous article.