The Case for Correlational Studies

You can infer causation from correlation. Sometimes you must!

Lucia B.
The Startup
6 min read · May 26, 2020


Correlation does not equal causation. With these five magic words, you can dismiss any observational study as irrelevant in an argument. You could even link to the Spurious Correlations webpage, suggesting that your opponent is claiming something as silly as “Per capita margarine consumption influences the divorce rate in Maine.”

It’s true that association between two variables can be the result of lurking variables — if variables A and B are strongly correlated, it could be because variable C causes both. It’s also true that correlation alone can’t demonstrate whether “A causes B” or “B causes A” is more likely. And as in the examples in the link above, even a striking correlation could be due to chance.
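To make the lurking-variable point concrete, here’s a toy simulation in Python with entirely invented numbers: a hidden factor C drives both A and B, so A and B end up strongly correlated even though neither has any effect on the other.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hidden confounder C drives both A and B; neither causes the other.
c = rng.normal(size=n)
a = 2.0 * c + rng.normal(size=n)
b = -1.5 * c + rng.normal(size=n)

# A and B come out strongly (negatively) correlated anyway, about -0.74 here.
print(np.corrcoef(a, b)[0, 1])
```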

Taken together, these are the reasons that we need more stringent evidence to declare that A causes B. If you take a random sample of the population, randomly assign A to some of them, and see whether the people with A get B at a higher rate than the control group that’s otherwise similar in every relevant way, there’s only one difference you can pinpoint to explain the results. Such randomized controlled trials are known as the “gold standard” of causal evidence.
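Extending the hypothetical sketch above shows why randomization earns that title: if A is assigned by coin flip rather than determined by C, the A-B association vanishes unless A actually causes B.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
c = rng.normal(size=n)  # the lurking variable

# Observational world: C pushes people toward A and also raises B,
# while A itself has zero effect on B.
a_obs = (2.0 * c + rng.normal(size=n)) > 0
b = 1.5 * c + rng.normal(size=n)
print("observational gap:", b[a_obs].mean() - b[~a_obs].mean())  # clearly > 0

# Randomized world: A is assigned by coin flip, so it is independent of C.
a_rct = rng.random(n) < 0.5
print("randomized gap:", b[a_rct].mean() - b[~a_rct].mean())  # ~ 0
```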

But when gold standard experiments aren’t feasible, evidence of correlations can still be informative. Here’s how.

Correlations are enough for predictions.

Say an atheist smugly points out that atheists are, on average, more intelligent than religious believers, and he has strong statistical evidence on his side. He can’t use that evidence alone to argue that higher intelligence leads people to reject religion. All he can say is that if you take a typical atheist and a typical religious person, the atheist is probably the more intelligent of the two. There’s nothing incorrect about that statement!
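That “probably” can even be quantified. Here’s a toy calculation, with all numbers invented for illustration: if two groups’ scores are normally distributed with a modest gap in means, you can compute how often a random member of one group outscores a random member of the other, with no causal story required.

```python
from math import sqrt
from statistics import NormalDist

# Invented numbers: two groups, a modest gap in mean score, equal spread.
mean_a, mean_b, sd = 104.0, 100.0, 15.0

# The difference of two independent normals is normal with
# mean (mean_a - mean_b) and standard deviation sqrt(2) * sd.
diff = NormalDist(mu=mean_a - mean_b, sigma=sqrt(2) * sd)

# Probability a random member of group A outscores a random member of group B.
print(1 - diff.cdf(0))  # ~ 0.57
```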

With certain criteria, you can infer causation from correlation.

It seems obvious nowadays that smoking causes disease, but in the mid-20th century, tobacco industry stakeholders argued that it wasn’t such a simple relationship. They claimed smokers got disease at higher rates because people who chose to smoke tended to have anxious “Type A personalities,” which put them at a higher risk for disease.

This explanation sounded plausible — how could anyone find real cause-and-effect? Randomly assign people to smoke and see what happens to their health? If the claims were true, this would be wildly unethical.

So thanks to Austin Bradford Hill, a top medical statistician who helped discover the smoking/cancer link, there’s an oft-cited set of useful criteria for inferring causation — the Bradford-Hill criteria.

  1. Strength — This has two meanings here: how big the effect is (effect size), and how unlikely it would be to arise by chance (statistical significance). A weak effect can be causal, but the stronger the effect, the more likely it is to be causal.
  2. Consistency — Does the A-predicts-B effect still hold across different populations, locations, and times? If so, some common factor C causing both A and B is less likely.
  3. Specificity — Perhaps it’s true that “Type A personalities” are more inclined both to smoking and disease. But if that’s the cause, why would smokers’ rates of lung cancer be outrageously high compared to their increased risks of other diseases and compared to other populations with increased disease risks? The more specific the association, the more likely it is causal.
  4. Temporality — This criterion is considered the most persuasive; it’s hard to claim that B could be causing A if B consistently appears after A. That’s why, when researchers found that youth hospitalized for infections had increased rates of subsequent eating disorders, most of which began shortly after the infections, it raised questions about how immune responses in the body might affect the brain.
  5. Dose-response relationship — Without a causal relationship, the presence of A might simply predict that B is present too, but if A causes B, higher levels of A could lead to predictably higher levels of B (see the sketch after this list).
  6. Plausibility — Once researchers could explain the biological mechanism of cigarette smoke damaging lungs, it was less sensible to claim that smoking didn’t cause lung cancer. However, this is the criterion that Bradford-Hill was “convinced we cannot demand” — after all, “what is biologically plausible depends upon the biological knowledge of the day.” Just because researchers can’t explain how infections spark eating disorders (yet) doesn’t mean the relationship isn’t real; the other criteria are suggestive enough.
  7. Coherence — If A causing B would downright contradict current knowledge, that’s a sign the relationship is probably not causal. For example, some low-carbohydrate diet advocates claim that cutting out carbs causes weight loss by lowering insulin levels, with calories irrelevant. But if that were true, subjects of the “rice diet” would not have lost so much weight and seen such health improvements on an insulin-spiking, carbohydrate-heavy diet.
  8. Experiment — If you can manipulate a variable, what effect do you see? Often researchers will resort to animal studies when human studies would be unethical.
  9. Analogy — Imagine explaining a horse to a sentient extraterrestrial who’d never seen an earthly animal. Now imagine explaining a zebra to the alien after that — much easier, now that you can refer to the horse. (Thanks to Scott Adams for this analogy.) Similarly, if research has revealed a similar causal mechanism before, it can help another mechanism seem more plausible. For example, carbon nanotubes seem to hold incredible potential for inventions, but researchers are trying to determine whether their physical similarities to asbestos, which was also once hailed as a “wonder material,” may provoke the same terrible health effects.
  10. Reversibility — If removing A lowers B, with all else held constant, A was probably causing B. That’s the idea behind elimination diets; if all you do is cut out dairy and your acne goes away, your skin is probably dairy-sensitive. (Bradford-Hill didn’t include this criterion in the original list.)
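The dose-response criterion in particular is easy to check in data. Here’s a minimal sketch with fabricated numbers: if B climbs predictably with the level of A rather than merely with A’s presence, that’s the dose-response signature.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fabricated dose groups (say, units of exposure per day) and a response
# that rises with the dose, plus noise.
doses = np.repeat([0, 10, 20, 40], 250)
response = 0.05 * doses + rng.normal(scale=1.0, size=doses.size)

# Group means climb with dose...
for d in (0, 10, 20, 40):
    print(d, round(response[doses == d].mean(), 2))

# ...and the fitted slope across dose levels is positive: higher A,
# predictably higher B.
slope, intercept = np.polyfit(doses, response, 1)
print("slope per unit dose:", round(slope, 3))
```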

Sometimes experiments aren’t the most suitable evidence for a relationship.

Since correlations have predictive power, a correlation observed outside of experimental settings is sometimes worth noting in its own right.

Say researchers hypothesize that a “growth mindset” (believing that one’s abilities can be developed through effort) is more associated with successful outcomes than a “fixed mindset” (believing that one’s abilities are fixed and innate).

How should they test this — give some people an intervention to develop their growth mindset, then see whether they improve on some measure of achievement? How can an experimental intervention successfully give people a growth mindset? That in itself is worth its own study!

Instead, it could be helpful to look at the real world to see whether this effect exists. Some findings don’t look very promising. A large study of Czech university applicants, for example, found a slight negative relationship between students’ exam scores and whether they had a growth mindset, and intervention experiments in both college students and children found that participants who already had a growth mindset started out doing worse.

According to Scott Alexander, this suggests that whatever “growth mindset → success” effect exists is “swamped” by other effects. If so, what’s the use of claiming that growth mindset is a predictor of success? While correlation doesn’t always equal causation, you’re unlikely to see causation without a correlation.

***

Clearly, correlational studies, even entire fields built on them, are not all junk science. They can provide salient evidence to strengthen or weaken a hypothesis when ideal controlled evidence isn’t available, and the Bradford-Hill criteria can help make the case. Evaluate such studies on their own terms — don’t just wave them away with “correlation does not equal causation”!
