Bayes’ theorem is one of the most fundamental theorems in all of probability. It is simple, elegant, and remarkably useful. It’s so important that an entire machine learning technique, Naive Bayes, is built on it.
While there are existing online explanations of Bayes’ Theorem, in my experience most of them are too abstract. So in this post I will try to explain Bayes’ Theorem as intuitively as possible.
Before starting off, let’s give this theorem a nickname. Whenever I learn a new theorem I come up with a nickname based on its applications; it’s always easier to remember a theorem by what it refers to. For example, the Pythagorean theorem can be thought of as the Distance Theorem. Similarly, we can refer to Bayes’ Theorem as the Evidence Theorem or the Trust Theorem. Take a car alarm: your trust in the alarm updates every time it goes off. If you find that most of the time the alarm is triggered by a basketball, a bicycle, or some other false threat, your trust goes down. In other words, you update your trust by looking at the past evidence behind the event. Bayes’ theorem is the mathematical formula that tells you exactly how much you should trust your evidence.
So let’s understand Bayes’ theorem with a small example.
Here is a problem statement:
1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
What do you think the answer is? Give it a try.
Most people estimate the probability to be between 70% and 80%, which is wildly incorrect.
Let’s reformulate the above problem statement.
100 out of 10,000 women at age forty who participate in routine screening have breast cancer. 80 of every 100 women with breast cancer will get a positive mammography. 950 out of 9,900 women without breast cancer will also get a positive mammography. If 10,000 women in this age group undergo a routine screening, about what fraction of women with positive mammographies will actually have breast cancer?
Now give it a try. Most people will find the correct answer, which is 7.8%. Why? Because humans are better with whole numbers than with decimals and percentages.
The solution is as follows:
Out of 10,000 women, 100 have breast cancer; 80 of those 100 have positive mammographies. From the same 10,000 women, 9,900 will not have breast cancer and of those 9,900 women, 950 will also get positive mammographies. This makes the total number of women with positive mammographies 950+80 or 1,030. Of those 1,030 women with positive mammographies, 80 will have cancer. Expressed as a proportion, this is 80/1,030 or 0.07767 or 7.8%.
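The whole-number reasoning above can be sketched in a few lines of Python (the variable names are my own):

```python
# Whole-number version of the calculation above.
total = 10_000      # women screened
with_cancer = 100   # 1% of 10,000 have breast cancer
true_pos = 80       # 80% of those 100 test positive
false_pos = 950     # ~9.6% of the 9,900 without cancer also test positive

all_pos = true_pos + false_pos              # 1,030 positive mammographies
p_cancer_given_pos = true_pos / all_pos     # fraction of positives with cancer
print(round(p_cancer_given_pos, 4))         # 0.0777, about 7.8%
```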
Bayes’ Theorem derivation using this example
- 1% of women have breast cancer (and therefore 99% do not).
- 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
- 9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).
Put in a table, the probabilities look like this:

| | Cancer (1%) | No cancer (99%) |
| --- | --- | --- |
| Test positive | 80% | 9.6% |
| Test negative | 20% | 90.4% |
How to read it?
- 1% of people have cancer.
- If you already have cancer, you are in the first column. There’s an 80% chance you will test positive. There’s a 20% chance you will test negative.
- If you don’t have cancer, you are in the second column. There’s a 9.6% chance you will test positive, and a 90.4% chance you will test negative.
So the question here is: what’s the chance that a woman really has cancer given a positive result?
- OK, we got a positive result. It means we’re somewhere in the top row of our table.
- The chances of a true positive = chance you have cancer * chance the test caught it = 1% * 80% = 0.008
- The chances of a false positive = chance you don’t have cancer * chance the test caught it anyway = 99% * 9.6% = 0.09504
The chance of an event is the number of ways it can happen divided by the total number of possible outcomes.
Probability = desired event / all possibilities
The chance of getting a real, positive result is 0.008. The chance of getting any type of positive result is the chance of a true positive plus the chance of a false positive (0.008 + 0.09504 = 0.10304).
So, our chance of cancer is 0.008/0.10304 = 0.0776, or about 7.8%.
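The same two-step computation, written directly in terms of probabilities (variable names are my own):

```python
# Probability version of the same calculation.
p_cancer = 0.01               # prior: 1% of women have breast cancer
p_pos_given_cancer = 0.80     # true positive rate
p_pos_given_no_cancer = 0.096 # false positive rate

true_pos = p_cancer * p_pos_given_cancer             # 0.008
false_pos = (1 - p_cancer) * p_pos_given_no_cancer   # 0.09504
posterior = true_pos / (true_pos + false_pos)        # chance of cancer given a positive test
print(round(posterior, 4))                           # 0.0776, about 7.8%
```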
We can turn the process above into an equation, which is Bayes’ Theorem. Here is the equation:

Pr(A|X) = Pr(X|A) * Pr(A) / [Pr(X|A) * Pr(A) + Pr(X|not A) * Pr(not A)]
- Pr(A|X) = Chance of having cancer (A) given a positive test (X). This is what we want to know: How likely is it to have cancer with a positive result? In our case it was 7.8%.
- Pr(X|A) = Chance of a positive test (X) given that you had cancer (A). This is the chance of a true positive, 80% in our case.
- Pr(A) = Chance of having cancer (1%).
- Pr(not A) = Chance of not having cancer (99%).
- Pr(X|not A) = Chance of a positive test (X) given that you didn’t have cancer (~A). This is a false positive, 9.6% in our case.
- Prior: The original proportion of patients with breast cancer is known as the prior probability. Pr(A) in above equation is prior.
- Likelihood: Pr(X|A) in our equation.
- Evidence: the denominator on the RHS; it is the evidence with which we update our posterior belief.
- Posterior: the LHS of the equation above, i.e. Pr(A|X). The final answer: the estimated probability that a woman has breast cancer, given that we know she has a positive result on her mammography.
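Putting the pieces together, the full theorem fits in one small function. This is a minimal sketch; the function and parameter names are my own:

```python
def bayes_posterior(prior, likelihood, false_alarm_rate):
    """Pr(A|X) = Pr(X|A)*Pr(A) / [Pr(X|A)*Pr(A) + Pr(X|not A)*Pr(not A)].

    prior           = Pr(A), e.g. the base rate of cancer
    likelihood      = Pr(X|A), the true positive rate
    false_alarm_rate = Pr(X|not A), the false positive rate
    """
    evidence = likelihood * prior + false_alarm_rate * (1 - prior)
    return likelihood * prior / evidence

# Our mammography example:
print(round(bayes_posterior(0.01, 0.80, 0.096), 4))  # 0.0776
```

Plugging in the car alarm story works the same way: the prior is how often cars are actually broken into, the likelihood is how often the alarm sounds during a real break-in, and the false alarm rate is how often a basketball sets it off.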
If you ever find yourself getting confused about what’s A and what’s X in Bayes’ Theorem, start with p(A|X) on the left side of the equation; that’s the simplest part to interpret. A is the thing we want to know about. X is how we’re observing it; X is the evidence we’re using to make inferences about A.
Thanks for reading. Your valuable suggestions will help me improve my blog writing skills.