A Career in Data Science — Part 4 — Machine Learning — Naive Bayes
Suppose you are tested for a symptom-less disease that 1 in 100 people have. The test you take is 95% accurate, and you test positive. What is the chance that you have the disease? If your answer matches what most people answer, it is 95%.
That is not correct, and as you’ll see, it’s not even close. This post explains the logic behind the problem.
If you have been following my blog, this is my fourth article on Machine Learning, and it covers Naive Bayes.
If you are here for the first time, below is the link to my introductory post. I hope you enjoy the content on my blog and cherish this journey of learning.
Fun Fact:
Alan Turing developed Banburismus, a cryptanalytic process used to help break German Kriegsmarine messages enciphered on the Enigma machine during the Second World War. The process used sequential conditional probability to infer information about the likely settings of the Enigma. However, in the 1940s Bayesian statistics was held in disrepute by statisticians who favored sampling and frequency-based statistics. I learned this from the book “The Theory That Would Not Die” by Sharon McGrayne.
Bayes’ Rule describes the probability of an event based on prior knowledge of conditions that might be related to the event.
For example, if cancer is related to age, then, using Bayes’ theorem, a person’s age can be used to assess the probability that they have cancer more accurately than an assessment made without knowing their age.
Building some intuition for this will help you understand it better:
At the start of this post, I presented a scenario in which you tested positive for a symptom-less disease. The probabilities were given, and you had to work out the chance that you actually have the disease. This is an intuition-level example of Bayes’ Rule, and understanding its logic will help you understand the concept better.
Suppose we test 10,000 patients. Of these, 100 have the disease and 9,900 don’t.
When the 100 patients who have the disease are tested, 95 of them test positive and 5 test negative. When the 9,900 patients who don’t have the disease are tested, 495 of them test positive and the remaining 9,405 test negative.
So there are 590 patients in total with a positive result. Of these, 95 have the disease and 495 don’t. If all you know is that you tested positive, your chance of having the disease is 95/590 ≈ 0.161, or about 16%.
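The counting argument above can be reproduced in a few lines of Python. This is a minimal sketch that assumes, as the example does, that the test is 95% accurate for sick and healthy patients alike; the variable names are illustrative.

```python
# Natural-frequency view of the disease example: count patients instead of
# multiplying probabilities. Numbers come from the scenario above.
population = 10_000
prevalence = 0.01   # 1 in 100 people have the disease
accuracy = 0.95     # assumed: the test is right 95% of the time for both groups

sick = round(population * prevalence)   # 100 patients have the disease
healthy = population - sick             # 9,900 patients do not

true_positives = round(sick * accuracy)             # 95 sick patients test positive
false_positives = round(healthy * (1 - accuracy))   # 495 healthy patients test positive

all_positives = true_positives + false_positives    # 590 positive results in total
chance_of_disease = true_positives / all_positives  # share of positives who are sick

print(f"{true_positives} of {all_positives} positives are sick: {chance_of_disease:.3f}")
```

Working with whole counts rather than percentages is often the easiest way to see why the answer is about 16% and not 95%: the 495 false positives from the large healthy group swamp the 95 true positives.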
We can also approach the problem mathematically.
We know that the probability that a patient has the disease is 1%, so the probability that a patient does not have the disease is 1 - P(disease) = 99%.
Next comes the new information: the test is 95% accurate.
- A patient who has the disease tests positive with probability 95% and negative with probability 5%.
- Similarly, a patient who does not have the disease tests negative with probability 95% and positive with probability 5%.
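One way to check this setup is to simulate it. The sketch below is a hypothetical Monte Carlo check, not part of the original derivation: it draws random patients with a 1% disease rate, gives each one a test that is correct 95% of the time regardless of disease status (the same assumption as above), and measures how many of the positives are actually sick. The function name `simulate` and its parameters are illustrative.

```python
import random


def simulate(n_patients: int, prevalence: float = 0.01,
             accuracy: float = 0.95, seed: int = 0) -> float:
    """Estimate P(disease | positive test) by simulating patients.

    Assumption: the test is correct with probability `accuracy` for both
    sick and healthy patients.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    positives = 0
    sick_and_positive = 0
    for _ in range(n_patients):
        has_disease = rng.random() < prevalence
        test_correct = rng.random() < accuracy
        # A correct test reports the true status; an incorrect one flips it.
        tests_positive = has_disease if test_correct else not has_disease
        if tests_positive:
            positives += 1
            if has_disease:
                sick_and_positive += 1
    return sick_and_positive / positives


estimate = simulate(200_000)
print(f"Simulated P(disease | positive) = {estimate:.3f}")
```

With 200,000 simulated patients the estimate should land close to 0.16; the exact value depends on the random seed.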
By the multiplication rule of conditional probability, the probability that events A and B both occur is P(A ∩ B) = P(A) * P(B | A): the probability of A times the probability of B given that A has occurred. (The events here are not independent, since the test result depends on whether the patient has the disease.)
- The probability of having the disease and testing positive is: 0.01 * 0.95 = 0.0095
- The probability of having the disease and testing negative is: 0.01 * 0.05 = 0.0005
- The probability of not having the disease and testing positive is: 0.99 * 0.05 = 0.0495
- The probability of not having the disease and testing negative is: 0.99 * 0.95 = 0.9405
These are the four possible scenarios, and you can check that their probabilities add up to 1. But we want to know the chance of having the symptom-less disease given that you tested positive.
Once we know the test came back positive, only 2 of these 4 scenarios are still possible: the 2 cases where the patient tested positive (0.0095 with the disease and 0.0495 without it). To turn them into probabilities while keeping their ratio, we normalize them: we divide each by their sum, 0.0095 + 0.0495 = 0.059, so that they add up to 1. The probability of having the disease given a positive test is therefore:
P(Disease | Tested positive) = 0.0095 / (0.0095 + 0.0495) = 0.0095 / 0.059 ≈ 0.161, or about 16%
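The four joint probabilities and the normalization step can be written as a short Python sketch. The dictionary layout and names are illustrative assumptions; the numbers are the ones from the example.

```python
# Joint probabilities P(status and result) = P(status) * P(result | status),
# using the prior and test accuracy from the example.
p_disease = 0.01
p_pos_given_disease = 0.95   # sick patients test positive 95% of the time
p_pos_given_healthy = 0.05   # healthy patients test positive 5% of the time

joint = {
    ("disease", "positive"): p_disease * p_pos_given_disease,                 # 0.0095
    ("disease", "negative"): p_disease * (1 - p_pos_given_disease),           # 0.0005
    ("no disease", "positive"): (1 - p_disease) * p_pos_given_healthy,        # 0.0495
    ("no disease", "negative"): (1 - p_disease) * (1 - p_pos_given_healthy),  # 0.9405
}
# The four scenarios cover every case, so they must add up to 1.
assert abs(sum(joint.values()) - 1.0) < 1e-12

# Keep only the scenarios consistent with a positive test, then normalize.
positive_cases = {status: p for (status, result), p in joint.items()
                  if result == "positive"}
total_positive = sum(positive_cases.values())  # P(positive) = 0.059
posterior = {status: p / total_positive for status, p in positive_cases.items()}

print(f"P(disease | positive) = {posterior['disease']:.3f}")
```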
In a more formal version, Bayes’ Theorem can be written as:
P(A | B) = P(B | A) * P(A) / P(B)
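Applied to the disease example, with A being “has the disease” (D) and B being “tests positive” (+), and with P(+) expanded over the two ways of testing positive, Bayes’ Theorem reproduces the earlier result. This worked derivation is a sketch using the example’s numbers:

```latex
\begin{aligned}
P(D \mid +) &= \frac{P(+ \mid D)\,P(D)}{P(+)} \\
            &= \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \lnot D)\,P(\lnot D)} \\
            &= \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.05 \times 0.99} \\
            &= \frac{0.0095}{0.059} \approx 0.161
\end{aligned}
```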
Now, here’s where the word “Naive” in Naive Bayes comes from.
We shall be making a naive assumption here. Consider the probability of two events happening together, P(A & B), usually written P(A ∩ B). It equals the product P(A) * P(B) only if the two events are independent; if they are not, this does not hold.
In Naive Bayes, however, we assume that the features are independent of each other given the class. As I said, this is a naive and often false assumption, but in practice it works well, and it makes the algorithm very fast.
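To see what the naive assumption buys us, here is a minimal Python sketch of a Naive Bayes calculation with more than one feature. It is a hypothetical extension of the disease example: `test_2` is an invented second test, assumed to be 95% accurate and independent of the first test once we know whether the patient is sick. Under that assumption, the likelihood of several results is just the product of the per-test likelihoods. The names `priors`, `likelihoods`, and `naive_bayes_posterior` are illustrative.

```python
from math import prod

# Hypothetical example: two tests, each assumed to be 95% accurate and
# conditionally independent given whether the patient is sick.
priors = {"disease": 0.01, "no disease": 0.99}

# P(test result | class) for each (hypothetical) test.
likelihoods = {
    "test_1": {"disease": {"positive": 0.95, "negative": 0.05},
               "no disease": {"positive": 0.05, "negative": 0.95}},
    "test_2": {"disease": {"positive": 0.95, "negative": 0.05},
               "no disease": {"positive": 0.05, "negative": 0.95}},
}


def naive_bayes_posterior(observed: dict[str, str]) -> dict[str, float]:
    """Return P(class | observed results) under the naive independence assumption."""
    scores = {}
    for cls, prior in priors.items():
        # Naive step: the joint likelihood is the product of per-test likelihoods.
        scores[cls] = prior * prod(
            likelihoods[test][cls][result] for test, result in observed.items()
        )
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


one_positive = naive_bayes_posterior({"test_1": "positive"})
two_positives = naive_bayes_posterior({"test_1": "positive", "test_2": "positive"})
print(f"one positive:  {one_positive['disease']:.3f}")
print(f"two positives: {two_positives['disease']:.3f}")
```

With one positive result the function returns about 0.161, matching the earlier answer; with two positives it returns roughly 0.785. Each extra feature only multiplies in one more factor, which is why the assumption keeps the computation cheap. If the two tests tended to make the same mistakes, the independence assumption would make that second figure too confident.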
Another formula I will use comes from conditional probability. Here are two ways of writing P(A ∩ B); setting them equal is the basis of Bayes’ Theorem:
P(A | B) * P(B) = P(B | A) * P(A)
The trick we’ll use here is to forget about P(B). Without it, the equation no longer holds as an equality, so let’s modify it by introducing proportionality:
P(A | B) ∝ P(B | A) * P(A)
This works very well because P(B) is the same no matter which value of A we consider, so it cancels out when we normalize. Knowing that the two sides are proportional is all we need.
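A small Python sketch makes the cancellation concrete. It computes the unnormalized scores P(B | A) * P(A) for the disease example and normalizes them; dividing every score by the same constant, such as P(B), leaves the result unchanged. The helper `normalize` is an illustrative name, not a library function.

```python
# A = "disease" or "no disease"; B = "tests positive".
prior = {"disease": 0.01, "no disease": 0.99}        # P(A)
likelihood = {"disease": 0.95, "no disease": 0.05}   # P(B | A)


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale non-negative scores so that they add up to 1."""
    total = sum(scores.values())
    return {key: value / total for key, value in scores.items()}


# Proportional form: P(A | B) is proportional to P(B | A) * P(A).
scores = {a: likelihood[a] * prior[a] for a in prior}
posterior = normalize(scores)

# Dividing every score by P(B) first (the exact Bayes formula) changes nothing,
# because a common constant factor disappears when we normalize.
p_b = sum(scores.values())                           # P(B), about 0.059
posterior_with_p_b = normalize({a: s / p_b for a, s in scores.items()})

print(f"{posterior['disease']:.3f} {posterior_with_p_b['disease']:.3f}")
```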
Now we are ready to put the proportional form to work. Let us return to the same case, where you had to find the chance of having the symptom-less disease given that you tested positive. Using the rule we just derived, we can write:
P(Disease | Tested positive) ∝ P(Tested positive | Disease) * P(Disease) = (95/100) * (1/100) = 0.0095
P(No disease | Tested positive) ∝ P(Tested positive | No disease) * P(No disease) = (5/100) * (99/100) = 0.0495
As in the first method, we normalize by dividing each score by the sum of the two, which gives the desired probability:
P(Disease | Tested positive) = 0.0095 / (0.0095 + 0.0495) ≈ 0.161, or about 16%
Didn’t I tell you? 95% is not even close to the answer, and Bayes’ Rule confirms it!
The next post will be on Support Vector Machines (SVM). I will post the link as soon as it is published.
As always, I’d like to thank my readers for their valuable time and interest. I conclude this post on Naive Bayes with one of my favorite quotes:
“Power resides where men believe it resides. It’s a trick. A shadow on the wall. And a very small man can cast a very large shadow.” ― Lord Varys, Game of Thrones