Classification and the Gaussian distribution
Foreword
There are many ways to get started with machine learning; a popular one is Andrew Ng's Stanford course, which is available on coursera.org. The input to machine learning is data, and data is random by nature, so the language of probability can be used to describe the data and solve the machine learning problem. As an example, I predict gender from hair length to demonstrate how to solve a classification problem using probability.
Gaussian distribution
Men and women wear their hair at different lengths for many reasons, but on average women have longer hair than men. How can we state this precisely in mathematical language? We can assume that women's hair length and men's hair length each follow a Gaussian distribution, and that the women's mean hair length is larger than the men's. Let μ₀ and μ₁ denote the means of the women's and men's distributions respectively, and 𝜎₀ and 𝜎₁ their standard deviations.
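So the probability density function of women's hair length is
$$f_0(x) = \frac{1}{\sigma_0 \sqrt{2\pi}} \exp\left(-\frac{(x - \mu_0)^2}{2\sigma_0^2}\right)$$
The men's density f₁(𝑥) can be written in the same way, with μ₁ and 𝜎₁ in place of μ₀ and 𝜎₀.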
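As a minimal sketch, the Gaussian density with mean `mu` and standard deviation `sigma` can be evaluated in a few lines of Python. The parameter values below are hypothetical, chosen only for illustration; they are not estimates from this article's data.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Hypothetical parameters in centimetres (assumptions, not fitted values).
MU_WOMEN, SIGMA_WOMEN = 25.0, 8.0
MU_MEN, SIGMA_MEN = 8.0, 4.0

# Density of a 20 cm hair length under each distribution.
f0 = gaussian_pdf(20.0, MU_WOMEN, SIGMA_WOMEN)
f1 = gaussian_pdf(20.0, MU_MEN, SIGMA_MEN)
```

With these made-up parameters, a 20 cm hair length is far more plausible under the women's density than under the men's, which is the intuition the classifier builds on.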
Formalize the classification problem
The classification problem is: how do we predict a person's gender given their hair length 𝑥₀? In the language of probability, we ask which of P(Y=0|X=𝑥₀) and P(Y=1|X=𝑥₀) is larger. Y=0 means the person is a woman and Y=1 means the person is a man.
P(Y=0|X=𝑥₀) can be expanded using the definition of conditional probability (Bayes' rule):
$$P(Y=0 \mid X=x_0) = \frac{P(X=x_0 \mid Y=0)\,P(Y=0)}{P(X=x_0)} \propto P(X=x_0 \mid Y=0)\,P(Y=0)$$
The last step drops P(X=𝑥₀) because, when comparing P(Y=0|X=𝑥₀) and P(Y=1|X=𝑥₀), both have the same denominator P(X=𝑥₀).
P(Y=1|X=𝑥₀) can be expanded in the same way:
$$P(Y=1 \mid X=x_0) \propto P(X=x_0 \mid Y=1)\,P(Y=1)$$
To compare the two, it is convenient to divide one by the other. Define the variable t as the ratio
$$t = \frac{P(X=x_0 \mid Y=0)\,P(Y=0)}{P(X=x_0 \mid Y=1)\,P(Y=1)}$$
If t > 1, we predict that the person is a woman; otherwise we predict a man.
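The following sketch checks numerically that the shared denominator P(X=𝑥₀) really does cancel, so t can be computed without it. The function names and the likelihood and prior values are hypothetical, used only to illustrate the rule.

```python
def posterior_ratio(lik_women, lik_men, prior_women, prior_men):
    """t = [P(x|Y=0) P(Y=0)] / [P(x|Y=1) P(Y=1)]; P(x) is never needed."""
    return (lik_women * prior_women) / (lik_men * prior_men)


def normalized_posteriors(lik_women, lik_men, prior_women, prior_men):
    """Full posteriors, computed here only to confirm that P(x) cancels."""
    evidence = lik_women * prior_women + lik_men * prior_men
    return (lik_women * prior_women / evidence,
            lik_men * prior_men / evidence)


def predict(t):
    """Decision rule from the text: woman if t > 1, otherwise man."""
    return "woman" if t > 1 else "man"


# Hypothetical likelihoods and equal priors (assumptions for illustration).
t = posterior_ratio(0.04, 0.001, 0.5, 0.5)
p_woman, p_man = normalized_posteriors(0.04, 0.001, 0.5, 0.5)
label = predict(t)
```

Dividing the two normalized posteriors gives the same value as t, which is why the comparison only needs the numerators.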
Training process
Given training data for women's and men's hair lengths, maximum likelihood estimation is used to estimate the parameters of the women's Gaussian distribution and of the men's. Then t can be calculated from the two Gaussian distributions.
P(X=𝑥₀|Y=0) = 𝚫𝑥 f₀(𝑥₀), where 𝚫𝑥 is a very small number: multiplying the probability density by 𝚫𝑥 gives the probability that a woman's hair length is around 𝑥₀. In the same way, P(X=𝑥₀|Y=1) = 𝚫𝑥 f₁(𝑥₀).
Therefore 𝚫𝑥 cancels in the ratio, and
$$t = \frac{f_0(x_0)\,P(Y=0)}{f_1(x_0)\,P(Y=1)}$$
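One way to evaluate this ratio in code is to work with log t instead of t. This is my own substitution rather than something the derivation requires: t > 1 is equivalent to log t > 0, and logarithms avoid underflow when a density is tiny. The parameter values and function names are hypothetical.

```python
import math


def log_gaussian_pdf(x, mu, sigma):
    """Natural log of the Gaussian density with mean mu and std sigma."""
    return (-math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))


def log_t(x, mu0, sigma0, mu1, sigma1, prior0=0.5, prior1=0.5):
    """log of t = f0(x) P(Y=0) / (f1(x) P(Y=1)); class 0 is women, class 1 is men."""
    return (log_gaussian_pdf(x, mu0, sigma0) + math.log(prior0)
            - log_gaussian_pdf(x, mu1, sigma1) - math.log(prior1))


# Hypothetical parameters in centimetres (assumptions, not fitted values).
PARAMS = dict(mu0=25.0, sigma0=8.0, mu1=8.0, sigma1=4.0)

long_hair = log_t(30.0, **PARAMS)   # positive: predict woman
short_hair = log_t(5.0, **PARAMS)   # negative: predict man
```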
I will give an example to demonstrate how to train our classifier.
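With maximum likelihood estimation, the parameters of each Gaussian distribution can be solved in closed form. For n observed women's hair lengths 𝑥₁, …, 𝑥ₙ, the estimates are the sample mean and the (biased) sample variance:
$$\hat{\mu}_0 = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}_0^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \hat{\mu}_0)^2$$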
The parameters of the men's hair length distribution, μ₁ and 𝜎₁, can be calculated from the given dataset in the same way.
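As a sketch of this estimation step, the maximum likelihood estimates for a Gaussian are the sample mean and the population standard deviation (dividing by n, not n - 1). The hair lengths below are invented for illustration and are not the article's dataset; `fit_gaussian_mle` is a hypothetical helper name.

```python
import statistics


def fit_gaussian_mle(samples):
    """Return the maximum likelihood (mu, sigma) for a Gaussian."""
    mu = statistics.fmean(samples)
    # pstdev divides by n, which matches the maximum likelihood estimate.
    sigma = statistics.pstdev(samples, mu)
    return mu, sigma


# Invented hair lengths in centimetres (assumptions, not real data).
women_cm = [18.0, 22.0, 25.0, 30.0, 35.0]
men_cm = [3.0, 5.0, 6.0, 8.0, 13.0]

mu0, sigma0 = fit_gaussian_mle(women_cm)
mu1, sigma1 = fit_gaussian_mle(men_cm)
```

Using `statistics.stdev` instead would divide by n - 1, which gives the unbiased variance estimate rather than the maximum likelihood one; for large datasets the difference is negligible.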
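The remaining probabilities are the priors P(Y=0) and P(Y=1), which are typically estimated as the fraction of women and of men in the training data.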
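A common way to obtain the priors P(Y=0) and P(Y=1) is to use the class frequencies in the training set. Whether the original example did this or simply assumed equal priors is not recoverable from the text, so treat the sketch below, including the label list and the helper name, as an assumption.

```python
from collections import Counter


def estimate_priors(labels):
    """Estimate P(Y=k) as the fraction of training examples with label k."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Invented labels: 0 means woman, 1 means man (assumption for illustration).
labels = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
priors = estimate_priors(labels)
```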
Then, for each 𝑥₀, we can calculate t to predict the person's gender.
t is a function of the single variable 𝑥₀. To better understand this function, I plotted it. From the plot, we can see that once 𝑥₀ exceeds a threshold between 12.5 and 15, t is always larger than 1, which means the person is always classified as a woman.
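The crossover point where t = 1 can also be found algebraically instead of being read off a plot. Taking logs of the two Gaussian densities turns log t = 0 into a quadratic in 𝑥₀, so there can be two boundaries when the standard deviations differ. The sketch below solves that quadratic; the parameters are hypothetical and will not reproduce the article's plot.

```python
import math


def decision_boundaries(mu0, sigma0, mu1, sigma1, prior0=0.5, prior1=0.5):
    """Return the sorted real solutions of log t(x) = 0 (that is, t = 1)."""
    # log t(x) = a x^2 + b x + c, from expanding the two log densities.
    a = 1.0 / (2.0 * sigma1 ** 2) - 1.0 / (2.0 * sigma0 ** 2)
    b = mu0 / sigma0 ** 2 - mu1 / sigma1 ** 2
    c = (mu1 ** 2 / (2.0 * sigma1 ** 2) - mu0 ** 2 / (2.0 * sigma0 ** 2)
         + math.log(sigma1 / sigma0) + math.log(prior0 / prior1))
    if abs(a) < 1e-12:
        # Equal variances: the equation is linear with a single boundary.
        return [-c / b] if abs(b) > 1e-12 else []
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return []  # t never crosses 1
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])


# Hypothetical parameters in centimetres (assumptions, not fitted values).
bounds = decision_boundaries(mu0=25.0, sigma0=8.0, mu1=8.0, sigma1=4.0)
```

With these made-up parameters the lower root is negative, which is not a possible hair length, so in practice only the upper root acts as the threshold.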
More analysis of the problem
Based on this, we can analyze two kinds of error: classifying a woman's hair length as a man's, and classifying a man's hair length as a woman's. To control one of the two errors, the decision rule on t can be adjusted, for example by comparing t against a threshold other than 1. For more on this, refer to the book Pattern Recognition, which explains these concepts clearly and comprehensively.
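To make the two error types concrete, here is a rough sketch that assumes the rule reduces to a single cut-off on hair length (predict woman if 𝑥₀ is above it, man otherwise). That is a simplification: with unequal standard deviations the exact region where t > 1 can have two boundaries. The parameters are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def error_rates(threshold, mu0, sigma0, mu1, sigma1):
    """Error rates under the assumed rule: woman if x > threshold, else man."""
    women_as_men = normal_cdf(threshold, mu0, sigma0)        # woman below cut-off
    men_as_women = 1.0 - normal_cdf(threshold, mu1, sigma1)  # man above cut-off
    return women_as_men, men_as_women


# Hypothetical parameters in centimetres (assumptions, not fitted values).
PARAMS = dict(mu0=25.0, sigma0=8.0, mu1=8.0, sigma1=4.0)

for threshold in (12.0, 15.0, 18.0):
    w_as_m, m_as_w = error_rates(threshold, **PARAMS)
    print(f"threshold={threshold:5.1f}  women->men={w_as_m:.3f}  men->women={m_as_w:.3f}")
```

Raising the cut-off makes it rarer to label a man as a woman but more common to label a woman as a man, which is the trade-off that adjusting the decision rule controls.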