My Machine Learning Diary: Day 14

Junhong Wang
2 min read · Nov 2, 2018

This is day 14 of my machine learning diary (MLD) series.

Day 14

Today I learned what classification is.

Classification

Unlike a regression problem, where the machine needs to predict a continuous value, a classification problem requires a discrete answer. For example, given an email, classify it as spam or not spam. Given a tumor size, classify the tumor as malignant or benign.

Can we still use linear regression?

Tumor Type Classification 1

In the chart above, a straight line is fitted to the samples, with benign labeled 0 and malignant labeled 1. Given a tumor size, it is reasonable to predict the tumor as malignant if the line's output is greater than 0.5, and as benign if it is below the 0.5 threshold. In this particular case, that means any tumor larger than x, the size at which the line crosses 0.5, will be classified as malignant. Well, it seems linear regression works fine so far. But what if there is another malignant sample with an extremely large size?

Tumor Type Classification 2

Now the threshold value is shifted to the right, and the classification doesn't look right anymore: some malignant samples fall below the threshold and get classified as benign. Another problem with using linear regression for classification is that the output can be greater than 1 or smaller than 0, while the labels are only ever 0 or 1.
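The shift described above can be reproduced with a small sketch. The tumor sizes and labels here are made up for illustration, and `fit_line` and `threshold_size` are hypothetical helper names, not something from the course material. It fits a least-squares line with NumPy's `polyfit` and finds where that line crosses 0.5.

```python
import numpy as np

def fit_line(sizes, labels):
    """Least-squares fit of labels ~ slope * size + intercept."""
    slope, intercept = np.polyfit(sizes, labels, deg=1)
    return slope, intercept

def threshold_size(slope, intercept, cutoff=0.5):
    """Tumor size at which the fitted line crosses the cutoff."""
    return (cutoff - intercept) / slope

# Made-up data: sizes in arbitrary units, 0 = benign, 1 = malignant.
sizes = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
labels = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

slope, intercept = fit_line(sizes, labels)
x_before = threshold_size(slope, intercept)

# Add one extremely large malignant sample and fit again.
sizes_out = np.append(sizes, 40.0)
labels_out = np.append(labels, 1.0)
slope_out, intercept_out = fit_line(sizes_out, labels_out)
x_after = threshold_size(slope_out, intercept_out)

print("threshold before outlier:", x_before)
print("threshold after outlier: ", x_after)

# The raw line output is not confined to [0, 1].
print("output at size 1 (first fit): ", slope * 1.0 + intercept)
print("output at size 40 (second fit):", slope_out * 40.0 + intercept_out)
```

With this toy data the threshold moves from 4.5 to roughly 5.7, so the malignant sample of size 5 ends up on the benign side. The last two prints show the line producing a value below 0 and a value above 1.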

The solution

First, we need to tackle the problem of the output range. What we can do is take whatever value we get from θᵀx and plug it into the logistic function (a.k.a. the sigmoid function). The logistic function is defined as follows:

Logistic Function Definition: g(z) = 1 / (1 + e^(-z)), where z = θᵀx

And its graph looks like this:

Logistic Function Graph

What’s so nice about the logistic function is that it maps any real value into the interval (0, 1). If we get 0.7, we can interpret it as “there is a 0.7 chance that the output is 1”. Since 0.7 is above the 0.5 threshold, the prediction in that case would be 1.
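Here is a minimal sketch of that idea in Python. The function names `sigmoid` and `predict`, the parameter values in `theta`, and the 0.5 cutoff as a default argument are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(theta, x, cutoff=0.5):
    """Return (probability that the output is 1, predicted label)."""
    z = sum(t * xi for t, xi in zip(theta, x))  # this is theta^T x
    p = sigmoid(z)
    return p, 1 if p >= cutoff else 0

# Hypothetical parameters: theta[0] is the intercept, theta[1] weights tumor size.
theta = [-4.5, 1.0]

for size in [2.0, 4.5, 7.0]:
    p, label = predict(theta, [1.0, size])  # x[0] = 1.0 is the intercept term
    print(f"size={size}: probability={p:.3f}, prediction={label}")
```

However large or small θᵀx gets, the probability stays between 0 and 1, and an input of exactly 0 maps to 0.5.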

That’s it for today.

Junhong Wang

I'm Junhong. I'm a Software Engineer based in LA. I specialize in full stack web development and writing readable code. junhong.wang