On P-Values

Do Jelly Beans cause Acne?

Nuwan I. Senaratna
On Economics
Oct 23, 2019


“Do Jelly Beans cause Acne?”

We ask some scientists to test this hypothesis. The scientists reply,

“We found no link between JBs and Acne (p > 5%)”

Source: https://xkcd.com/882/

We understand the “no link between JBs and Acne” bit, but what is “p”? And what is the significance of “p > 5%”?

To understand “p”, we have to investigate how scientists use statistics to examine questions like “Do Jelly Beans cause Acne?”. They usually have “tests” to detect if (say) JBs cause Acne.

Tests

A “test” could consist of mixing the JBs with a mysterious chemical and seeing if a red precipitate results.

The outcome of the test has some connection with what you want to know. For example, if we mix JBs with a mysterious chemical and red crystals precipitate, we might conclude that JBs do indeed cause Acne.

On the other hand, if no red crystals appear, we might conclude that JBs do not cause Acne.

Now, most tests are not perfect. So the connection between the outcome of the test and what you want to know might not be deterministic. There might be some noise.

For example, there might be some cases where JBs cause Acne, but the test is negative (i.e. No red crystals). And there might be some cases where the JBs don’t cause Acne, but the test is positive (i.e. Red crystals appear).

In Statistics-speak, the two types of cases are known as false negatives and false positives, respectively.

p-values

And, so, what is the significance of “p > 5%”?

Roughly speaking, when a scientist says that the p-value of a test is greater than something (say 5%), she means that the chance that the test returns positive when it should return negative is greater than 5%. Or in statistics-speak, the chance of a false positive is > 5%.
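
To make this concrete, here is a minimal Python sketch. The exact 5% false positive rate is an assumption chosen for illustration, not something the scientists reported:

```python
import random

random.seed(0)  # for reproducibility

# Hypothetical test on safe JBs: assume it wrongly returns
# positive ("red crystals appear") 5% of the time.
FALSE_POSITIVE_RATE = 0.05

trials = 100_000
positives = sum(random.random() < FALSE_POSITIVE_RATE for _ in range(trials))

print(f"Positive results on safe JBs: {positives / trials:.1%}")
# ~5.0%, i.e. the false positive rate
```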

If we overlook this, we could misinterpret the results of our test.

For example, since the false positive rate for our JB-test is at least 5% (about 1 in 20), if we repeat the test on 20 batches of safe JBs, we can expect at least one positive result, falsely indicating that JBs cause acne.
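
A quick back-of-the-envelope check, assuming the tests are independent and the false positive rate is exactly 5% (both simplifications):

```python
# Probability of at least one false positive across 20
# independent tests on safe JBs, at a 5% false positive rate.
fpr = 0.05
n_tests = 20

p_at_least_one = 1 - (1 - fpr) ** n_tests
expected = fpr * n_tests

print(f"P(at least one false positive): {p_at_least_one:.0%}")  # ~64%
print(f"Expected false positives: {expected:.1f}")              # 1.0
```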

Source: https://xkcd.com/882/

Far too often, statistics reported everywhere, from the media to esteemed academic journals, suffer from this problem.

Source: https://xkcd.com/882/

What about false negatives?

What was the false-negative rate of our JB-test? That is, what was the probability that no red precipitate appears, even though the JBs do indeed cause Acne? Why didn’t the scientists report this?

With most statistical tests, there is an implicit assumption that the negative state is more common than the positive state. For example, that most JBs do not cause Acne. In statistics-speak, this negative state is known as the “null hypothesis”.

For example, suppose that most JBs (say 99%) are safe, and 1% cause acne. Suppose we test 10,000 JBs. 9,900 of these are safe; 100 cause acne.

Since our false positive rate > 5%, of the 9,900 safe JBs, the test will return positive for > 495 samples, and negative for < 9,405.

Now suppose that our false-negative rate is very high. Let’s say that it is 50%. That is, for a JB that does cause Acne, there is a 50% chance that the test returns negative.

That means, for our 100 unsafe JBs, the test will return positive for 50 and negative for 50.

Let’s look at the performance of our test as a whole.

For 9,405 of the safe JBs and 50 of the unsafe JBs, the test was correct. For > 495 safe JBs and 50 unsafe JBs, the test was incorrect. The point to note is that 495 > 50. I.e. even if the false-negative rate is much higher than the false-positive rate, if there are far more negatives than positives, then the absolute number of false positives (495) will be considerably higher than the absolute number of false negatives (50).
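
Here is the same bookkeeping as a short Python sketch, using the illustrative numbers above (treating the rates as exactly 5% and 50%, for simplicity):

```python
# Confusion-matrix counts for the worked example:
# 10,000 JBs, 1% cause acne, 5% false positive rate,
# 50% false negative rate.
total = 10_000
prevalence = 0.01  # share of JBs that cause acne
fpr = 0.05         # false positive rate (safe JBs testing positive)
fnr = 0.50         # false negative rate (unsafe JBs testing negative)

unsafe = int(total * prevalence)  # 100
safe = total - unsafe             # 9,900

false_positives = int(safe * fpr)    # 495 safe JBs flagged
false_negatives = int(unsafe * fnr)  # 50 unsafe JBs missed

print(f"False positives: {false_positives}")  # 495
print(f"False negatives: {false_negatives}")  # 50
```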

Relying on this assumption, scientists often report only the false-positive rate, by way of the p-value.

If the assumption that “the negative state is more common than the positive state” breaks down, so does the conclusion that “the absolute number of false positives will be far greater than the absolute number of false negatives”.

Just as p-values are ignored and misinterpreted, many analyses rely on these assumptions without checking whether they hold.

Staying safe from Statistics

Always ask the following questions:

  • What is the false-negative rate? Many studies avoid reporting this. For example, what is the chance of a JB causing Acne if no red precipitate is observed?
  • What is the actual prevalence of positives and negatives? Are there more safe JBs than Acne-causing JBs? Is the “the negative state is more common than the positive state” assumption reasonable? (See the sketch below.)
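
On the second question, a quick Bayes-style sketch (using the same illustrative numbers as the worked example) shows why prevalence matters so much. When positives are rare, even a positive test result is usually wrong:

```python
# P(JB causes acne | test is positive), assuming 1% prevalence,
# a 5% false positive rate, and a 50% false negative rate.
prevalence = 0.01
fpr = 0.05
fnr = 0.50

p_true_positive = prevalence * (1 - fnr)   # 0.005
p_false_positive = (1 - prevalence) * fpr  # 0.0495

ppv = p_true_positive / (p_true_positive + p_false_positive)
print(f"P(causes acne | positive test): {ppv:.0%}")  # ~9%
```

Here, roughly 9 out of 10 positive results are false alarms, even though the test’s error rates sound modest.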

Usually, humans are far more likely to mislead than statistics. Hence, more importantly, you should also ask:

  • How does a positive outcome impact the scientist?
  • How does a negative outcome impact the scientist?
  • Would one or the other result in higher pay, more awards, or more prestige?
