Classification and the Gaussian distribution

Qiang Chen
Machine Learning and Math
Aug 12, 2018

Foreword

There are many ways to get started with machine learning. A popular one is Andrew Ng's Stanford course, which is available on coursera.org. The input to machine learning is data, and data is inherently random, so we can use the language of probability to describe the data and to solve machine learning problems. In this post I use the example of determining a person's gender from their hair length to show how to solve a classification problem with probability.

Gaussian distribution

Men and women can have hair of different lengths for many reasons, but overall women tend to have longer hair than men. How can we state this precisely in mathematical language? We assume that both women's and men's hair lengths follow Gaussian distributions, and that women's mean hair length is larger than men's. Let μ₀ and μ₁ denote the means of the women's and men's distributions respectively, and let 𝜎₀ and 𝜎₁ be their standard deviations.

Figure: the probability density functions of women's and men's hair length.

So the probability density function of women's hair length is

$$ f_0(x) = \frac{1}{\sqrt{2\pi}\,\sigma_0} \exp\left(-\frac{(x-\mu_0)^2}{2\sigma_0^2}\right) $$

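The men's density f₁(x) is written in the same way, with μ₁ and 𝜎₁ in place of μ₀ and 𝜎₀:

$$ f_1(x) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left(-\frac{(x-\mu_1)^2}{2\sigma_1^2}\right) $$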
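A minimal Python sketch can make the two densities concrete. The function name `gaussian_pdf` and the parameter values below are hypothetical and were chosen only for illustration. They are not estimated from the dataset in this post.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Gaussian probability density with mean mu and standard deviation sigma, at x."""
    coeff = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# Hypothetical parameters for illustration only (not fitted to real data).
mu0, sigma0 = 30.0, 10.0  # women: mean and standard deviation
mu1, sigma1 = 5.0, 3.0    # men: mean and standard deviation

x = 20.0
print(f"f0({x}) = {gaussian_pdf(x, mu0, sigma0):.5f}")  # women's density at x
print(f"f1({x}) = {gaussian_pdf(x, mu1, sigma1):.5f}")  # men's density at x
```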

Formalize the classification problem

The classification problem is: given a person's hair length 𝑥₀, predict the person's gender. In the language of probability, we compare P(Y=0|X=𝑥₀) with P(Y=1|X=𝑥₀) and pick the larger one. Here Y=0 means the person is a woman and Y=1 means the person is a man.

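Using the definition of conditional probability (Bayes' rule), we can expand P(Y=0|X=𝑥₀) as:

$$ P(Y=0 \mid X=x_0) = \frac{P(X=x_0 \mid Y=0)\,P(Y=0)}{P(X=x_0)} \propto P(X=x_0 \mid Y=0)\,P(Y=0) $$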

This derivation drops P(X=𝑥₀). That is safe because P(Y=0|X=𝑥₀) and P(Y=1|X=𝑥₀) share the same denominator P(X=𝑥₀), so it does not affect which of them is larger.
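A quick numeric check shows why dropping P(X=𝑥₀) is safe: dividing both scores by the same positive number cannot change which one is larger. The likelihood and prior values below are made up for illustration.

```python
# Hypothetical likelihoods P(X=x0|Y=y) and priors P(Y=y), for illustration only.
lik = {0: 0.02, 1: 0.05}
prior = {0: 0.5, 1: 0.5}

# Unnormalized scores: P(X=x0|Y=y) * P(Y=y)
score = {y: lik[y] * prior[y] for y in (0, 1)}

# P(X=x0) by the law of total probability, then the true posteriors
evidence = sum(score.values())
posterior = {y: score[y] / evidence for y in (0, 1)}

# Dividing by the same positive evidence keeps the ordering unchanged
best_unnormalized = max(score, key=score.get)
best_normalized = max(posterior, key=posterior.get)
print(best_unnormalized, best_normalized, posterior)
```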

P(Y=1|X=𝑥₀) can be expanded in the same way:

$$ P(Y=1 \mid X=x_0) = \frac{P(X=x_0 \mid Y=1)\,P(Y=1)}{P(X=x_0)} \propto P(X=x_0 \mid Y=1)\,P(Y=1) $$

To compare the two, it helps to take their ratio. Dividing the first by the second gives a variable t:

$$ t = \frac{P(X=x_0 \mid Y=0)\,P(Y=0)}{P(X=x_0 \mid Y=1)\,P(Y=1)} $$

If t > 1, we classify the person as a woman; otherwise, we classify the person as a man.

Training process

Using training data of women's and men's hair lengths, we apply maximum likelihood estimation to estimate the parameters (mean and standard deviation) of each Gaussian distribution. With the two fitted Gaussian distributions, we can then calculate t.

P(X=𝑥₀|Y=0) ≈ Δ𝑥 f₀(𝑥₀), where Δ𝑥 is a very small interval width. Multiplying the probability density by Δ𝑥 gives the probability that a woman's hair length falls in a small interval around 𝑥₀. In the same way, P(X=𝑥₀|Y=1) ≈ Δ𝑥 f₁(𝑥₀).
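We can check the approximation P ≈ Δ𝑥 f₀(𝑥₀) numerically. The sketch below computes the exact probability of the interval [𝑥₀ − Δ𝑥/2, 𝑥₀ + Δ𝑥/2] from the Gaussian CDF (written with `math.erf`) and compares it with Δ𝑥 f₀(𝑥₀). Centring the interval on 𝑥₀ is my assumption, and the parameters are hypothetical. The ratio of the two approaches 1 as Δ𝑥 shrinks.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian density at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def gaussian_cdf(x, mu, sigma):
    """Gaussian CDF expressed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


mu, sigma = 30.0, 10.0  # hypothetical women's parameters
x0 = 25.0               # hypothetical hair length

ratios = []
for dx in (1.0, 0.1, 0.01):
    # Exact probability of a small interval centred on x0
    exact = gaussian_cdf(x0 + dx / 2, mu, sigma) - gaussian_cdf(x0 - dx / 2, mu, sigma)
    # Density times interval width
    approx = dx * gaussian_pdf(x0, mu, sigma)
    ratios.append(exact / approx)
    print(f"dx={dx}: exact={exact:.6g}, approx={approx:.6g}")
```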

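Therefore, when we substitute these into t, the Δ𝑥 factors cancel:

$$ t = \frac{\Delta x\, f_0(x_0)\,P(Y=0)}{\Delta x\, f_1(x_0)\,P(Y=1)} = \frac{f_0(x_0)\,P(Y=0)}{f_1(x_0)\,P(Y=1)} $$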
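Here is a minimal sketch of the resulting classifier, assuming the Gaussian parameters and the priors are already known. The function names and the values in `params` are hypothetical and are not fitted to this post's data.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Gaussian density at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def compute_t(x0, mu0, sigma0, mu1, sigma1, p0, p1):
    """t = f0(x0) P(Y=0) / (f1(x0) P(Y=1)); the Δx factors have already cancelled."""
    return gaussian_pdf(x0, mu0, sigma0) * p0 / (gaussian_pdf(x0, mu1, sigma1) * p1)


def classify(x0, params):
    """Woman if t > 1, otherwise man."""
    return "woman" if compute_t(x0, **params) > 1 else "man"


# Hypothetical parameters and equal priors, for illustration only.
params = dict(mu0=30.0, sigma0=10.0, mu1=5.0, sigma1=3.0, p0=0.5, p1=0.5)
for x0 in (3.0, 10.0, 25.0):
    print(x0, classify(x0, params))
```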

I will give an example to demonstrate how to train our classifier.

Figure: the hair length training dataset.

Using maximum likelihood estimation, we can estimate the parameters of the women's Gaussian distribution from this dataset.

The parameters of the men's distribution are calculated from the dataset in the same way.
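For a Gaussian, the maximum likelihood estimates are the sample mean, μ̂ = (1/n) Σ xᵢ, and the standard deviation with a 1/n factor (not 1/(n−1)), σ̂² = (1/n) Σ (xᵢ − μ̂)². The sketch below computes them with Python's `statistics` module, where `statistics.pstdev` uses the 1/n factor. The function name `fit_gaussian_mle` and the hair lengths are hypothetical; this is not the training dataset from this post.

```python
import statistics


def fit_gaussian_mle(samples):
    """MLE for a Gaussian: sample mean and the 1/n (population) standard deviation."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples, mu)
    return mu, sigma


# Hypothetical hair lengths, NOT the dataset from the original post.
women = [20.0, 25.0, 30.0, 35.0, 40.0]
men = [2.0, 4.0, 5.0, 6.0, 8.0]

mu0, sigma0 = fit_gaussian_mle(women)
mu1, sigma1 = fit_gaussian_mle(men)
print(mu0, sigma0, mu1, sigma1)
```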

The remaining probabilities in t are the priors P(Y=0) and P(Y=1), which are also estimated from the training dataset.
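Under the same maximum likelihood approach, a natural estimate for each prior is the fraction of training examples in that class. I am assuming this is how the priors are computed. The function name and the labels below are hypothetical.

```python
def estimate_priors(labels):
    """MLE of P(Y=0) and P(Y=1): the fraction of each class among the labels."""
    n = len(labels)
    p0 = labels.count(0) / n  # fraction of women (Y=0)
    return p0, 1.0 - p0       # P(Y=1) is the complement


labels = [0, 0, 0, 1, 1, 1, 1, 1]  # hypothetical: 3 women, 5 men
p0, p1 = estimate_priors(labels)
print(p0, p1)  # 0.375 0.625
```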

Then, for any 𝑥₀, we can calculate t and determine the gender.

t is a function of the single variable 𝑥₀. To understand it better, I plotted it. The plot shows that when 𝑥₀ is larger than a value between 12.5 and 15, t is always larger than 1, which means the person is classified as a woman.
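The crossing point can also be located numerically instead of being read off the plot. This sketch works with log t, which avoids underflow when one density is tiny, and finds log t = 0 by bisection. The parameters and the search interval [5, 30] are hypothetical. When 𝜎₀ ≠ 𝜎₁, log t is quadratic in 𝑥₀, so there can be a second crossing outside the search interval.

```python
import math


def log_gaussian_pdf(x, mu, sigma):
    """Log of the Gaussian density at x."""
    return -math.log(math.sqrt(2 * math.pi) * sigma) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_t(x0, mu0, sigma0, mu1, sigma1, p0, p1):
    # log t = log f0 + log P(Y=0) - log f1 - log P(Y=1)
    return (log_gaussian_pdf(x0, mu0, sigma0) + math.log(p0)
            - log_gaussian_pdf(x0, mu1, sigma1) - math.log(p1))


def find_crossing(lo, hi, params, tol=1e-9):
    """Bisection for log t = 0 (i.e. t = 1) on [lo, hi]; requires a sign change."""
    f_lo = log_t(lo, **params)
    if f_lo * log_t(hi, **params) > 0:
        raise ValueError("log t has the same sign at both ends")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = log_t(mid, **params)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return (lo + hi) / 2


# Hypothetical parameters, for illustration only.
params = dict(mu0=30.0, sigma0=10.0, mu1=5.0, sigma1=3.0, p0=0.5, p1=0.5)
x_star = find_crossing(5.0, 30.0, params)
print(f"t = 1 at x0 = {x_star:.4f}")
```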

Figure: t as a function of 𝑥₀, with the point where the curve crosses t = 1 marked (y-axis: t; x-axis: 𝑥₀).

More analysis of the problem

Building on this, we can analyze two kinds of error: classifying a woman as a man based on her hair length, and classifying a man as a woman. To control one of these two errors, we can adjust the decision rule based on t. For details, refer to the book Pattern Recognition, which explains these concepts clearly and comprehensively.
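Here is a sketch of one such error analysis. Suppose the classifier reduces to a single cutoff c: "woman if 𝑥₀ > c, otherwise man". Then the probability of classifying a woman as a man is F₀(c) = P(X ≤ c | Y=0), and the probability of classifying a man as a woman is 1 − F₁(c), where F₀ and F₁ are the two Gaussian CDFs. The single-cutoff rule, the function name `error_rates`, and the parameters are assumptions for illustration. Moving c lowers one error at the cost of raising the other.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """Gaussian CDF expressed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def error_rates(c, mu0, sigma0, mu1, sigma1):
    """Error rates for the hypothetical rule 'woman if x0 > c, else man'."""
    woman_as_man = gaussian_cdf(c, mu0, sigma0)        # P(X <= c | Y=0)
    man_as_woman = 1.0 - gaussian_cdf(c, mu1, sigma1)  # P(X > c | Y=1)
    return woman_as_man, man_as_woman


# Hypothetical parameters, for illustration only.
for c in (10.0, 12.0, 14.0):
    w, m = error_rates(c, mu0=30.0, sigma0=10.0, mu1=5.0, sigma1=3.0)
    print(f"c={c}: woman->man {w:.4f}, man->woman {m:.4f}")
```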

Reference

  1. https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
  2. https://book.douban.com/subject/3996242/
  3. https://book.douban.com/subject/1119445/
