# Part 1: Maximum Likelihood Estimation in Simple Terms

I spent many hours trying to understand maximum likelihood estimation (MLE), and this is the simplest explanation I can offer. Let us assume we have 5 countries with populations of 100, 200, 300, 150, and 450 people.

Out of the 100 people in country1, let’s say we have 50 men and 50 women. In country2, let’s say we have 170 men and 30 women. In country3, 180 men and 120 women. In country4, 40 men and 110 women. And in country5, 200 men and 250 women.

If we pick one person at random, the probability of picking a man from country1 is 0.5 and the probability of picking a woman from country1 is 0.5. Likewise, the probability of picking a man from country2 is 17/20 and the probability of picking a woman from country2 is 3/20. The probability of picking a man from country3 is 6/10 and the probability of picking a woman from country3 is 2/5. The probability of picking a man from country4 is 4/15 and the probability of picking a woman from country4 is 11/15. The probability of picking a man from country5 is 4/9 and the probability of picking a woman from country5 is 5/9.
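As a quick check, the fractions above can be recomputed from the head counts. This is a minimal sketch; the names `counts` and `gender_probabilities` are made up for illustration and are not part of any library.

```python
from fractions import Fraction

# (men, women) head counts per country, taken from the text above.
counts = {
    "country1": (50, 50),
    "country2": (170, 30),
    "country3": (180, 120),
    "country4": (40, 110),
    "country5": (200, 250),
}

def gender_probabilities(men, women):
    """Probability of picking a man or a woman when one person is drawn at random."""
    total = men + women
    return {"M": Fraction(men, total), "W": Fraction(women, total)}

probs = {name: gender_probabilities(m, w) for name, (m, w) in counts.items()}

for name, p in probs.items():
    print(name, "P(M) =", p["M"], "P(W) =", p["W"])
```

`Fraction` keeps the values exact and reduces them automatically, so 6/10 is printed as 3/5.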

Given a distribution, the probability that you pick a man is a different question from the probability that a given man comes from a particular distribution. **The latter is a machine learning problem.** When we use MLE, we use the former (how probable the data is under each distribution) to answer the latter.

Now, the question is: if I give you a sample of 5 people (M, M, M, W, W), will you be able to say which country they belong to? **This is a classification problem.** It means you need to decide which distribution, or country, to classify those 5 people into. That’s where we use MLE: to figure out how likely it is that they come from country1, country2, …, country5.

Let’s denote those five people as D = {x1, x2, x3, x4, x5}, where x1 = M, x2 = M, x3 = M, x4 = W and x5 = W. Treating the five draws as independent, the total probability of the sample under any one distribution is p(x1=M) × p(x2=M) × p(x3=M) × p(x4=W) × p(x5=W). For each country this gives:

- Distribution 1: (0.5) × (0.5) × (0.5) × (0.5) × (0.5) = **0.03125**
- Distribution 2: (17/20) × (17/20) × (17/20) × (3/20) × (3/20) = **0.0138**
- Distribution 3: (6/10) × (6/10) × (6/10) × (2/5) × (2/5) = **0.03456**
- Distribution 4: (4/15) × (4/15) × (4/15) × (11/15) × (11/15) = **0.0102**
- Distribution 5: (4/9) × (4/9) × (4/9) × (5/9) × (5/9) = **0.0271**

Out of all these probabilities, the maximum is 0.03456, which means those 5 people are most likely to belong to country3. Now, what is the probability that I draw those 5 samples {x1=M, x2=M, x3=M, x4=W, x5=W} from country3? It is the probability of drawing 3 men and then 2 women from country3, which equals (6/10)³ × (2/5)² = **0.03456**, the same number.
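The comparison above can be reproduced in a few lines. This is only a sketch: it assumes the five draws are independent, and the names `p_man`, `likelihood` and `best` are illustrative.

```python
from math import prod

# P(man) for each country, from the head counts given earlier.
p_man = {
    "country1": 50 / 100,
    "country2": 170 / 200,
    "country3": 180 / 300,
    "country4": 40 / 150,
    "country5": 200 / 450,
}

sample = ["M", "M", "M", "W", "W"]

def likelihood(sample, p_m):
    """Probability of this exact sample if every draw is independent with P(man) = p_m."""
    return prod(p_m if x == "M" else 1 - p_m for x in sample)

likelihoods = {country: likelihood(sample, p) for country, p in p_man.items()}

# The maximum likelihood choice is the country under which the sample is most probable.
best = max(likelihoods, key=likelihoods.get)

for country, value in likelihoods.items():
    print(country, round(value, 5))
print("most likely:", best)
```

Swapping in a different `sample` list is an easy way to see how the winning country changes with the data.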

**Example 2:** Let us assume I tossed a coin 4 times in front of you and you saw 2 heads. Let us also assume that there are 3 different biased coins, and the probability of seeing heads when you toss them is as follows: P(H|C1) = 0.2, P(H|C2) = 0.3, P(H|C3) = 0.1. **Question 1: Will you be able to tell which of those 3 coins I chose and tossed 4 times in front of you?** Let us use MLE here.

For a coin Cx, the total probability of seeing 2 heads and 2 tails is P(H|Cx) × P(H|Cx) × P(T|Cx) × P(T|Cx):

- Coin C1: (0.2) × (0.2) × (0.8) × (0.8) = (0.2)² × (0.8)² = 0.0256
- Coin C2: (0.3) × (0.3) × (0.7) × (0.7) = (0.3)² × (0.7)² = 0.0441
- Coin C3: (0.1) × (0.1) × (0.9) × (0.9) = (0.1)² × (0.9)² = 0.0081

The largest value belongs to C2, hence the coin most likely chosen was coin 2.
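Here is a small sketch of the same calculation. The products above are the probability of one particular order of 2 heads and 2 tails; the sketch also multiplies by the number of possible orderings to show that this factor is the same for every coin and so does not change which coin wins. The names `coins`, `sequence_likelihood` and `best_coin` are illustrative.

```python
from math import comb

# P(heads) for each candidate coin, from the text above.
coins = {"C1": 0.2, "C2": 0.3, "C3": 0.1}
heads, tails = 2, 2

def sequence_likelihood(p_heads, heads, tails):
    """Probability of one specific sequence with the given numbers of heads and tails."""
    return p_heads ** heads * (1 - p_heads) ** tails

seq_lik = {name: sequence_likelihood(p, heads, tails) for name, p in coins.items()}

# Number of distinct orderings of 2 heads among 4 tosses.
n_orderings = comb(heads + tails, heads)

# Probability of 2 heads and 2 tails in any order (binomial probability).
any_order = {name: n_orderings * value for name, value in seq_lik.items()}

best_coin = max(seq_lik, key=seq_lik.get)

for name in coins:
    print(name, round(seq_lik[name], 4), round(any_order[name], 4))
print("most likely coin:", best_coin)
```

Because every coin is scaled by the same constant, either set of numbers can be used to pick the maximum.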

**Question 2: What is the probability that you see 2 heads and 2 tails (in one particular order) when I toss C2 4 times?** Answer: 0.3 × 0.3 × 0.7 × 0.7 = 0.0441. **Question 1 is a machine learning problem, because we are asked to predict the coin from the data. The more data I give you, the more likely you are to predict the correct distribution.** This is how MLE is related to machine learning.
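To get a feel for the claim about more data, here is a rough sketch that computes how often MLE would pick the right coin if the true coin were C2 and it were tossed n times. It is an illustration under that assumption, not a proof, and the helper names (`log_likelihood`, `mle_coin`, `binomial_pmf`, `prob_mle_correct`) are made up for this example. Logarithms are used so that long runs of tosses do not underflow to zero.

```python
from math import exp, lgamma, log

# P(heads) for each candidate coin, from the text above.
coins = {"C1": 0.2, "C2": 0.3, "C3": 0.1}

def log_likelihood(p_heads, heads, n):
    """Log-probability of one specific sequence with `heads` heads in n tosses."""
    return heads * log(p_heads) + (n - heads) * log(1 - p_heads)

def mle_coin(heads, n):
    """Coin whose likelihood is largest for the observed number of heads."""
    return max(coins, key=lambda c: log_likelihood(coins[c], heads, n))

def binomial_pmf(heads, n, p_heads):
    """P(exactly `heads` heads in n tosses), computed in log space for stability."""
    log_comb = lgamma(n + 1) - lgamma(heads + 1) - lgamma(n - heads + 1)
    return exp(log_comb + log_likelihood(p_heads, heads, n))

def prob_mle_correct(true_coin, n):
    """Probability that MLE recovers `true_coin` after n tosses of that coin."""
    p = coins[true_coin]
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if mle_coin(k, n) == true_coin
    )

for n in (4, 100, 1000):
    print(n, "tosses:", round(prob_mle_correct("C2", n), 4))
```

With these three coins the probability of a correct guess rises as n goes from 4 to 100 to 1000, although for small n it does not have to increase at every single step, because the number of heads can only take whole values.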