Classification, Sigmoid function

Published in

Machine Learning and Math

5 min read · Aug 16, 2018

Foreword

There are many kinds of problems in machine learning. Broadly, they fall into two categories: classification and regression.

Predicting from an image whether it shows a cat, a dog, or some other animal is a classification problem.
Predicting a house price or a temperature is a regression problem.

Scalar Prediction — Regression

I have a story, Regression, Mapping, Matrices, which analyzes the regression problem. It explains that solving a regression problem comes down to finding a mapping that takes the observed information as input and outputs a predicted target, with the predicted target as close as possible to the real target value.

Compared to that story, this one adds:

how to represent class information in binary classification
the sigmoid function

Class Prediction — Classification

There are many classification problems in everyday life, such as:

Deciding from the content of an email whether to mark it as spam. With this technology, spam disturbs us much less than before. There are only two classes here, spam and not spam, so this is called a binary classification problem.
Classifying news into categories such as political news, tech news, history news, car-related news, entertainment news, etc. If news can be classified automatically, it can easily be distributed to the right category in a news app. This kind of problem is a multi-class problem.
Recognizing the object in an image, where the result can be a cat, dog, car, tree, etc. Once we know what an image contains, we can build a lot of logic on top of it, for example searching images by text. This is also a multi-class problem, and it involves a large number of objects.

Binary Classification

A multi-class problem can be converted into multiple binary classification problems. For news classification, the many categories become many binary questions: is it political news? Is it tech news? And so on. A multi-class problem can also be solved directly, but here we will talk about the binary classification problem, which is easier.

Taking spam email as an example, the first step is to describe the problem in mathematical language:

The input of the classifier is the email content, which can be represented as numbers. For example: 1) the length of the email content; 2) how many links the email contains; 3) how many words from a predefined wordlist, such as [”trash”, “duck”, “pick”, …], appear in the email (you can modify the example wordlist as you like); and so on. If k numbers represent one email, a matrix of shape m×k represents m emails. So we only need a matrix of shape m×k and the meaning of each dimension.
Numbers are also needed to represent whether an email is spam. A number a can represent that the email is spam, and a number b that it is normal. a and b can be any numbers as long as a != b; the choice doesn’t affect the classifier’s performance. By convention we use a = 1 and b = 0, because in computer science 1 means true and 0 means false.
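The three example numbers above could be computed along the following lines. This is only a minimal sketch: the function name email_features, the link pattern, and the way words are split are assumptions made for illustration, and the wordlist is the example one from the text.

```python
import re

# Example wordlist from the text; modify it as you like.
WORDLIST = ["trash", "duck", "pick"]

def email_features(text, wordlist=WORDLIST):
    """Turn one email into k = 3 numbers (hypothetical helper)."""
    length = len(text)                               # 1) length of the content
    links = len(re.findall(r"https?://\S+", text))   # 2) number of links
    words = re.findall(r"[a-z']+", text.lower())     # crude word splitting
    hits = sum(1 for word in words if word in wordlist)  # 3) wordlist words
    return [length, links, hits]

example = "Pick this up now: http://example.com and http://example.org"
features = email_features(example)
print(features)  # [length, number of links, wordlist hits]
```

Stacking the feature lists of m emails row by row gives the m×k matrix described above.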

So, just as in the regression problem, we need two matrices:

matrix X of shape m×k, together with the meaning of each of the k dimensions
matrix Y of shape m×1, which tells us whether each email is spam or not
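As a small illustration of the two matrices, here is a sketch in NumPy. The feature values are made up for the example (m = 4 emails, k = 3 features); only the shapes and the 0/1 labels follow the text.

```python
import numpy as np

# Made-up feature rows for m = 4 emails, k = 3 features each:
# [length, number of links, wordlist hits]
X = np.array([
    [120.0, 0.0, 0.0],
    [ 45.0, 3.0, 2.0],
    [300.0, 1.0, 0.0],
    [ 60.0, 5.0, 4.0],
])

# Labels: 1 = spam, 0 = normal, stored as a column of shape m×1.
Y = np.array([0, 1, 0, 1]).reshape(-1, 1)

print(X.shape)  # (4, 3)
print(Y.shape)  # (4, 1)
```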

Classify mail content as spam or not

The mapping from the story Regression, Mapping, Matrices can also be used here. We have a coefficient matrix w of shape k×1.

The coefficient matrix turns the k numbers that represent an email into a single number that represents the result. The problem with using only the coefficient matrix is that we want a target of 0 or 1, but the result can be anywhere from −∞ to +∞. To bring the result close to 0 or 1, researchers use the sigmoid function, which maps any number z to a new number between 0 and 1. The sigmoid function is defined as:

S(z) = 1 / (1 + e^(−z))

The curve of this function is S-shaped: it approaches 0 as z goes to −∞, passes through 0.5 at z = 0, and approaches 1 as z goes to +∞.
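To see the behaviour of the sigmoid function numerically, here is a minimal NumPy sketch. The function name sigmoid and the sample values of z are chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """S(z) = 1 / (1 + e^(-z)), applied element by element."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
values = sigmoid(z)
print(values)  # increases from near 0 to near 1; the middle value is 0.5
```

Large negative inputs give values close to 0 and large positive inputs give values close to 1, which is what makes the function suitable for a 0-or-1 target.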

With the sigmoid function, the whole mapping becomes: multiply the k numbers of an email by the coefficient matrix to get a number z in the range (−∞, +∞), then pass z to the sigmoid function to get a number between 0 and 1. For m emails, the final result is Y’.

The process can be written with matrix operations:

Z = X×w

Y’ = S(Z)

The two equations can be merged into one:

Y’ = S(X×w)

X is a matrix of shape m×k and w is a matrix of shape k×1, so Y’ is a matrix of shape m×1 holding, for each email, a number between 0 and 1 that indicates whether it is spam.
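The merged equation can be sketched in NumPy as follows. The data and the coefficients are made up by hand, and turning the output into a 0/1 decision with a 0.5 threshold is an assumption that the text does not spell out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, w):
    """Y' = S(X×w): X is m×k, w is k×1, the result is m×1."""
    return sigmoid(X @ w)

X = np.array([[1.0, 0.0, 0.0],
              [0.5, 3.0, 2.0]])        # m = 2 emails, k = 3 features (made up)
w = np.array([[-1.0], [0.5], [1.0]])   # k×1 coefficients, chosen by hand

Y_pred = predict(X, w)
labels = (Y_pred > 0.5).astype(int)    # assumed threshold: above 0.5 means spam

print(Y_pred.shape)  # (2, 1)
print(labels)
```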

As in the story Regression, Mapping, Matrices, comparing the predicted results with the real results tells us how good our coefficients are. The real result is denoted Y, a matrix of shape m×1.

The performance of the coefficients can then be expressed as a cost: the smaller the cost, the better the performance.

Summary

If we use matrix operations to simplify the cost calculation, it can be written as cost = 𝐼×|S(X×w) − Y|, where 𝐼 is a matrix of ones of shape 1×m, so that multiplying by it sums the m absolute differences. S is the sigmoid function, which maps every element of the matrix from the range (−∞, +∞) to the range (0, 1). In the equation, 𝐼, X and Y are given. The simplest way to find the w that makes the cost as small as possible is to try every possibility (in practice, a large set of candidate values) and keep the one with the smallest cost, so that the classifier recognizes spam as well as it can.
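The cost formula and the "try the possibilities" idea can be sketched as below. Since w is continuous, the sketch only samples candidates on a coarse grid; the grid range, the grid spacing, and the data are assumptions made for illustration.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, w):
    """cost = I × |S(X×w) − Y|, with I a 1×m row of ones."""
    ones = np.ones((1, X.shape[0]))
    return (ones @ np.abs(sigmoid(X @ w) - Y)).item()

# Made-up data: m = 4 emails, k = 2 features.
X = np.array([[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]])
Y = np.array([[0], [0], [1], [1]])

# Candidate values for each coefficient (a coarse, assumed grid).
grid = np.linspace(-5.0, 5.0, 21)

best_w, best_cost = None, float("inf")
for candidate in itertools.product(grid, repeat=X.shape[1]):
    w = np.array(candidate).reshape(-1, 1)
    c = cost(X, Y, w)
    if c < best_cost:
        best_w, best_cost = w, c

zero_cost = cost(X, Y, np.zeros((2, 1)))
print(zero_cost)                   # cost when all coefficients are 0
print(best_w.ravel(), best_cost)   # best candidate found on the grid
```

The number of candidates grows as (grid size)^k, so this approach only works for very small k.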

Another way is to differentiate the cost with respect to w; the result tells you how to change w to make the cost a little smaller. You can choose a w at random and improve it step by step in this way. Often you will find the w that makes the cost smallest, but sometimes this fails. This belongs to the topic of optimization, which I can explain later.
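As a rough preview of that idea, here is a sketch that differentiates the cost above with the chain rule and repeatedly moves w a small step against the derivative (plain gradient descent, named here as one possible method). The data, the step size, and the number of steps are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(X, Y, w):
    return np.abs(sigmoid(X @ w) - Y).sum()

def gradient(X, Y, w):
    """Derivative of sum |S(X×w) − Y| with respect to w, shape k×1."""
    s = sigmoid(X @ w)
    # Chain rule: d|u|/du = sign(u), dS/dz = S(1 − S), dz/dw = X.
    return X.T @ (np.sign(s - Y) * s * (1.0 - s))

# Made-up data: m = 4 emails, k = 2 features.
X = np.array([[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]])
Y = np.array([[0.0], [0.0], [1.0], [1.0]])

rng = np.random.default_rng(0)
w = rng.normal(size=(X.shape[1], 1))   # random starting point
start_cost = cost(X, Y, w)

learning_rate = 0.1                     # assumed step size
for _ in range(200):
    w = w - learning_rate * gradient(X, Y, w)

final_cost = cost(X, Y, w)
print(start_cost, final_cost)
```

Where the sigmoid is very close to 0 or 1 the derivative is almost zero, so an unlucky starting point can make progress very slow, which is one way this approach can fail.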