Fooled by conditioning
It’s a well-known fact that our intuition about probability is often dead wrong! The expected performance of a binary classifier, such as a disease screening, is one such example.
This is the first story in a series about conditioning, Bayes’ formula and the Bayesian interpretation of probability.
A screening example
A screening for a disease gives an answer to the question “Does the patient have the disease?” A “yes” is called a positive and a “no” a negative.
Consider now a specific screening for a disease that occurs in 1% of the population. At first glance, the screening looks rather accurate:
- For patients who actually have the disease, the screening gives a correct result 80% of the time.
- For patients who actually do not have the disease, the screening gives a correct result 95% of the time.
Gut feeling
Now, if you are screened and get a positive result, then what is the probability of you actually having the disease? What is your gut feeling?
If you’re anything like the author of this article, your immediate answer will be 80%. Smarter people than me have been similarly fooled. Fooled, because the true answer turns out to be much lower than most expect!
Some terminology
It turns out that the 1% occurrence rate of the disease given is actually rather crucial for the answer to the question! This is known as the prevalence of the disease. The other two probabilities in the problem have technical terms as well:
- The probability of a correct screening for a patient who actually has the disease is known as the sensitivity. In our example the sensitivity is 80%. (Sensitivity is also referred to as recall.)
- The probability of a correct screening for a patient who actually does not have the disease is known as the specificity. In our example the specificity is 95%.
Note that both of these are actually conditional probabilities. Hence the title of this story.
A representative sample
In order to clearly see why the gut feeling above is wrong, let us consider a representative sample of 10,000 people. Since only 1% of the population actually has the disease, this means that the 10,000 can be divided into two groups:
- 100 people who actually have the disease.
- 9,900 people who actually do not have the disease.
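This split can be sketched in a few lines of Python, using the 1% prevalence from the example:

```python
# A minimal sketch of the representative sample, assuming the 1% prevalence
# given in the example.
population = 10_000
prevalence = 0.01

diseased = int(population * prevalence)  # 100 people who have the disease
healthy = population - diseased          # 9,900 people who do not

print(diseased, healthy)  # 100 9900
```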
False negatives and false positives
Since the sensitivity of the test is 80%, the result is incorrect 20% of the time when a person who actually has the disease is screened, i.e. the result is negative. This is known as a false negative.
Similarly, since the specificity is 95%, the result is incorrect 5% of the time when a person who actually does not have the disease is screened, i.e. the result is positive. This is a false positive.
A false positive is also known as a type I error, and a false negative as a type II error.
- The rate of type I errors (5% in our case) is denoted by the Greek letter alpha.
- The rate of type II errors (20% in our case) is denoted by the Greek letter beta.
The true positives, false positives, true negatives, and false negatives are often summed up in a confusion matrix.
What does this mean for our sample?
Back in our sample, there are 100 people who actually have the disease. Of these, 80 correctly test positive, while the remaining 20 are false negatives (or type II errors).
Of the 9,900 people who actually do not have the disease, 95%, i.e. 9,405 individuals, correctly test negative. The remaining 5%, i.e. 495 individuals, are false positives (or type I errors).
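The four counts above can be collected into a small confusion matrix. Here is a sketch, assuming the group sizes and test characteristics from the example:

```python
# Confusion matrix for the 10,000-person sample, assuming the 80% sensitivity
# and 95% specificity from the example.
diseased, healthy = 100, 9_900
sensitivity, specificity = 0.80, 0.95

true_positives = int(diseased * sensitivity)   # 80 correct positives
false_negatives = diseased - true_positives    # 20 (type II errors)
true_negatives = int(healthy * specificity)    # 9,405 correct negatives
false_positives = healthy - true_negatives     # 495 (type I errors)

# Rows: screening result; columns: actual condition.
confusion_matrix = {
    "positive": {"diseased": true_positives, "healthy": false_positives},
    "negative": {"diseased": false_negatives, "healthy": true_negatives},
}
print(confusion_matrix)
```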
So because the second group is so much larger, due to the low prevalence, the false positives vastly outnumber the true positives. Herein lies the essential problem.
Finally, an answer
Now we are ready to answer the question: Given that the screening is positive, what is the probability of actually having the disease?
First, how many of the 10,000 test positive? The true positives number 80, while there are 495 false positives. So 80 + 495 = 575 in total.
This means that the probability of actually having the disease is 80/575, or about 14%. Much lower than 80%!
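The same answer can be computed two ways: by counting in the sample, as above, and directly from the conditional probabilities. A sketch, using the numbers from the example:

```python
# P(disease | positive) computed two ways, assuming the prevalence,
# sensitivity, and specificity from the example.
prevalence, sensitivity, specificity = 0.01, 0.80, 0.95

# 1) Counting in the 10,000-person sample:
true_positives, false_positives = 80, 495
ppv_counts = true_positives / (true_positives + false_positives)

# 2) Weighting by the probabilities directly:
# P(positive) = P(positive | disease) P(disease) + P(positive | no disease) P(no disease)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv_direct = sensitivity * prevalence / p_positive

print(round(ppv_counts, 3), round(ppv_direct, 3))  # both ≈ 0.139
```

The second computation is exactly the structure that Bayes’ formula makes precise.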
Conclusion
The prevalence of a disease is usually low, which means that the number of false positives is very often much larger than the number of true positives. This implies that a positive screening result has low predictive power with regard to whether the patient actually has the disease.
Next time, we’ll take a deeper look at conditioning, and see how it leads to Bayes’ theorem.