There’s a secret world you’ve probably never heard of. It lies underneath every claim you’ve ever read, every headline that has blared “science!” at you as you walked past. This world is why you’ve been vaccinated, why you take ibuprofen for pain and inflammation, and even why you can’t get contraceptive pills over the counter in most countries.
I’m talking, of course, about the world of statistics.
Imagine you’re doing a study on weight loss. You’ve got, say, three groups of people: 1, 2, and 3. Group 1 is taking a drug. Group 2 is having a lifestyle intervention. Group 3 is the control group: they’re doing nothing. At the end of the study, you have hundreds, even thousands of numbers. You can compare them pretty easily (have a look at the means and medians), but all this really tells you is that there are some differences. They might be unimportant; they might be just down to chance. What if you had one person in Group 2 who started at 400 kilos and lost 300, but the rest of the people actually gained a couple of kilos? You’d probably have an overall weight loss, but really your lifestyle intervention is working for just one guy.
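To see how one person can drag an average around, here is a minimal Python sketch. The ten weight changes are invented for illustration (one person losing 300 kilos, nine others each gaining 2), and the variable names are just placeholders.

```python
from statistics import mean, median

# Hypothetical weight changes in kilos for ten people in Group 2
# (negative = weight lost). These numbers are made up for illustration.
group_2_changes = [-300] + [2] * 9

average_change = mean(group_2_changes)    # dragged down by one person
typical_change = median(group_2_changes)  # what the person in the middle did

print(f"Mean change:   {average_change:.1f} kg")
print(f"Median change: {typical_change:.1f} kg")
```

The mean suggests the group lost weight, while the median shows the typical person gained a little. Neither number, on its own, says whether the difference is more than chance.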
What statistics basically allow you to do in medicine is compare results from different groups and tell whether the variation is likely due to your experiment or just chance. So you compare your three groups of people again and find that while Group 2, on average, lost weight, it wasn’t statistically significant because it was all due to that one guy and his mammoth effort.
Now, I’ve just used a very important phrase: statistically significant. This is basically the bar we set for research results. If they get above the bar, they are considered to be most likely due to the thing you’re examining — in this case, the drug or the lifestyle intervention. Basically, that the treatment worked.
If they don’t reach the bar, we say that any results we saw could easily be due to chance, and we can’t conclude that the treatment worked.
It’s a pretty important bar.
And, you’ll be surprised to hear, it’s completely arbitrary.
What Is Significant?
When we run a statistical test, we usually come out with what’s known as a probability value, or p-value. This is a number between zero and one that tells us how likely we would be to see a result at least as big as the one in our experiment if there were really no effect and only chance was at work. A high p-value means the difference between groups is easily explained as a fluke; a low p-value means we might be on to something here. By convention, a p-value below 0.05 means your results are called statistically significant.
In other words, 0.05 is the bar I was talking about earlier.
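One way to get a feel for where a p-value comes from is a permutation test. It is only one of many statistical tests, and not necessarily the one a real trial would use, but it is easy to follow: shuffle the group labels every possible way and count how often chance alone produces a gap at least as big as the one observed. The sketch below uses made-up weight changes for six people on the drug and six in the control group; the function name and the data are assumptions for illustration.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means."""
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))

    as_extreme = 0
    total = 0
    # Try every way of relabelling the pooled people into two groups.
    for chosen in combinations(range(len(pooled)), size_a):
        chosen_set = set(chosen)
        relabelled_a = [pooled[i] for i in chosen]
        relabelled_b = [
            pooled[i] for i in range(len(pooled)) if i not in chosen_set
        ]
        difference = abs(mean(relabelled_a) - mean(relabelled_b))
        # Small tolerance so floating-point rounding does not hide ties.
        if difference >= observed - 1e-12:
            as_extreme += 1
        total += 1
    return as_extreme / total


# Made-up weight changes in kilos (negative = weight lost).
drug_changes = [-3.1, -2.4, -4.0, -1.8, -2.9, -3.5]
control_changes = [0.4, -0.6, 1.1, -0.2, 0.7, 0.1]

p_value = permutation_p_value(drug_changes, control_changes)
print(f"p-value: {p_value:.4f}")
print("Below the 0.05 bar?", p_value < 0.05)
```

Checking every relabelling only works for small groups like these. With hundreds of people, the usual shortcut is to try a large random sample of shuffles instead.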
But the problem is that 0.05 is a completely arbitrary number. We could say 0.04, or 0.06, and it wouldn’t really make a difference. Remember: It’s just a measure of how surprising the results would be if chance were the only thing going on.
If I say that something is statistically significant, I’m saying that differences as big as the ones between the groups in my study would be unlikely to turn up by chance alone. There’s probably an effect there.
Let’s say I do my experiment and find a statistically significant difference between Group 1 and Group 3, with Group 1 losing 100 grams more weight over the six months of the experiment. Our p-value is amazingly low, at 0.000001.
Sounds like good news, right?
So we’ve passed the first test: We know the difference we are seeing is likely due to the drug we are giving Group 1. The statistical test says so!
But that’s not the only type of significance.
Statistical significance is about whether a difference is likely to be real rather than a fluke. Clinical significance is about whether we care. Does it matter if we can get some people to lose 100 grams of weight? Is that worth taking a drug for the next few months, years, or even for the rest of their lives?
So, if I say that my new drug is clinically significant, what I’m really saying is that I think the benefits it brings outweigh the side effects. I’m saying it changes your health enough that a doctor might be interested in prescribing it, and you might actually want to use it for your treatment.
In fact, clinical significance is the only one we care about.
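To see how a difference as small as 100 grams can come with such a tiny p-value, here is a rough Python sketch. It uses a simple normal approximation (a z-test) and assumes that people’s weight changes vary with a standard deviation of 5 kilos; that spread, the group sizes, and the function name are all assumptions chosen for illustration, not figures from a real trial.

```python
from math import erfc, sqrt


def two_sided_p_value(difference, standard_deviation, people_per_group):
    """Approximate p-value for a difference in means between two equal groups.

    Uses a normal (z-test) approximation and treats the spread as known.
    """
    standard_error = standard_deviation * sqrt(2 / people_per_group)
    z_score = abs(difference) / standard_error
    # erfc(z / sqrt(2)) is the two-sided tail area of a standard normal curve.
    return erfc(z_score / sqrt(2))


DIFFERENCE_KG = 0.1  # the 100 grams from the example
ASSUMED_SD_KG = 5.0  # assumption made up for this sketch

for people_per_group in (100, 10_000, 120_000):
    p_value = two_sided_p_value(DIFFERENCE_KG, ASSUMED_SD_KG, people_per_group)
    print(f"{people_per_group:>7} people per group: p = {p_value:.7f}")
```

The difference between the groups never changes: it is 100 grams in every row. Only the number of people changes, and with enough of them the p-value shrinks toward zero. A tiny p-value tells you the effect is probably real, not that it is big enough to matter.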
One great example is the difference between men’s and women’s brains. A huge study recently found a statistically significant difference between the activity of men’s brains and women’s in a few key regions. However, they also found that the similarities outweighed the differences, and that ultimately there was no clinically meaningful difference between men’s and women’s brains.
The point is that you can do a study and find statistical differences, but unless you know whether these differences are clinically significant, all you’re doing is playing with numbers. The researchers in this study couldn’t tell a man’s brain from a woman’s unless they knew beforehand which one was which, because the statistically significant differences in activity didn’t translate into something they could actually use in their work.
But when you read a news story about a scientific paper, you’ll never hear about the nuance of clinical significance. Remember all those scary stories about ibuprofen and heart attacks? There actually is a well-known statistically significant link between taking ibuprofen and having a heart attack. The only problem is that, for most of us, the increase in risk is very small — it’s statistically significant, but not clinically significant.
It goes the other way, too: There’s a statistically significant link between moderate drinking and not dying. The only problem is that the difference is small and likely explained by other factors, so there’s no reason to start swigging a glass of wine every day.
Almost every article you’ll ever read on science quotes statistics like they mean something.
All too often, they don’t.
It’s hard to know what matters when it comes to studies. Clinical significance is something that often requires a medical degree and years of training to properly understand.
But there are a few things you can look out for.
If the absolute effect size is small, chances are the clinical significance is limited, unless the outcome is a really serious event (like death). If the outcome that people are talking about is only tangentially related to actual health (for example, the amount of ice cream eaten), there’s a good chance the results don’t actually tell you much about your life.
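Checking the absolute effect size often comes down to a line or two of arithmetic. The Python sketch below takes a headline-style relative risk and turns it into the extra risk per person. The numbers are hypothetical (a baseline risk of 1 in 1,000 per year and a 30 percent relative increase) and do not describe any real drug; the function name is a placeholder too.

```python
def absolute_risk_increase(baseline_risk, relative_risk):
    """Extra risk, in absolute terms, added on top of the baseline risk."""
    return baseline_risk * relative_risk - baseline_risk


# Hypothetical numbers, invented for illustration only.
baseline_risk = 0.001  # 1 in 1,000 people have the event each year
relative_risk = 1.3    # the headline version: "30% higher risk!"

extra_risk = absolute_risk_increase(baseline_risk, relative_risk)
people_per_extra_event = 1 / extra_risk

print(f"Extra risk per person per year: {extra_risk:.4%}")
print(f"People exposed for one extra event: about {people_per_extra_event:.0f}")
```

A “30 percent higher risk” sounds alarming, but under these assumed numbers it works out to roughly three extra events for every 10,000 people each year. Whether that matters depends on how serious the event is, which is exactly the clinical question.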
If you’re ever really worried, go see a doctor. There’s a reason it takes almost a decade to be fully qualified.
Sometimes this stuff just isn’t that easy.