Newbie’s Guide to ML — Part 10

Moving on with building classifiers, the next classifier we’ll build is the naive Bayes classifier which, instead of Euclidean distance or information gain, uses conditional probability to find out which class a point belongs to. If you’re good with conditional probability you can skip this post and go straight to the next. If not, this post is a quick primer on conditional probability and Bayes’ theorem.


Probability

Probability is the extent to which an event is likely to occur. For example, when you flip a coin there’s a 1/2 chance that you’ll get a heads and there’s a 1/2 chance that you’ll get a tails. Since this is a fair coin, there’s 1 heads and 1 tails and thus 2 possible outcomes. Probability is the ratio of the outcomes you’re looking for to the total number of possible outcomes.

Let’s make this a little more interesting. Say you flip a fair coin 5 times and you’re looking for all heads. What’s the probability of getting 5 heads in a row?

So you could have outcomes like HHTTH, TTTTT, etc. How many such out comes are there? Since this is a fair coin there’s 2 sides to the coin and you’re flipping it 5 times. The number of possible outcomes is 2 to the power 5 which is 32. Of those, you are interested in just 1 outcome HHHHH. So,


Conditional Probability

Here’s an even more fun scenario. Suppose there’s a bag of 10 coins. Out of these 10 coins, 9 are fair coins and 1 is an unfair coin with both heads. What’s the probability of getting 5 heads in a row and it coming from a fair coin?

There’s two parts to this — picking a fair coin and then getting 5 straight heads. First part is picking a fair coin and the second part is getting 5 straight heads.

So what’s the probability of getting a fair coin? That’s 9/10.

What’s the probability of getting five straight heads, now that we’ve picked a fair coin? That’s 1/32, as we’ve previously calculated.

Back to the original question now. What’s the probability of getting 5 heads in a row and it coming from a fair coin? We multiply the two probabilities together. (9/10) x (1/32).

The probability would change if it were an unfair coin because with an unfair coin the probability of 5 heads would be 1 as both the sides are heads.

This is conditional probability. The probability depends on which coin you get. Let’s write this more formally.

We got 5 heads and we got a fair coin. That’s the left side. For this to happen, we first had to pick the fair coin out of the 10 coins. That’s P(fair). Once we got that, we flipped it for 5 straight heads. That’s P(5H | fair).

So the whole equation reads “the probability of getting five heads and a fair coin is equal to the probability of getting five heads given a fair coin multiplied by the probability of getting a fair coin in the first place”. The little vertical line is pronounced as “given”.

Let’s write this more abstractly.


Bayes’ theorem

Let’s reverse the scenario now. What’s the probability of having picked a fair coin if you’ve got 5 heads? Getting 5 heads seems hard. Are you sure you’ve got a fair coin? We’re trying to find P(fair | 5H). Let’s write this down.

If you compare it to the abstract version of the equation, we’ve just reversed everything. That is,

But wait, in set theory isn’t A ∩ B the same as B ∩ A? Yup. The two equations are equal. Let’s write that down.

Let’s move the P(B) to the other side.

This is called Bayes’ theorem. It’s used to calculate the probabilities of outcomes if we know the conditions that lead to that outcome. This forms the basis of Bayesian decision theory. In the next post we’ll look at the naive Bayes classifier


Citations

I borrowed my example from Khan Academy’s probability series. You can watch the series here: