One of the first things those of us working in data science and machine learning have to grapple with is that our claims are predictive, not inferential. Instead of making careful, logical moves from one clear, proven claim to another, we mix together a set of features, build a model, and then present our results as a reasonably probable forecast of the future.
Thoughtful people, however, know that “correlation does not equal causation.” Just because two things can be shown to occur in some sort of reliable sequence does not mean that Thing A necessarily caused Thing B. When I leave for work, I notice that the sun is often rising. But unlike the rooster, I can’t really take credit for that.
If “correlation does not equal causation” became the sum total of our causal reasoning, we would accept only claims issued from the oracular voice of a randomized controlled trial (RCT). But that is an expensive, time-consuming, impractical, and potentially unethical bar to set for most questions. For instance, we now accept that tobacco smoking causes lung cancer, but it would have been fundamentally wrong to take a large set of people and randomly divide them into experimental arms, one of which would smoke a pack a day while the other abstained.
Still, although correlation does not equal causation, it does not need to become an invisible shock collar that keeps us away from the vast majority of causal claims we would like to test.
One of the things I learned from Lucy McGowan at Data Day Texas is that there is a set of nine criteria by which we can make increasingly confident claims that the correlations we observe are causal. Known as Hill's Criteria, after the British epidemiologist Sir Austin Bradford Hill, they describe nine properties of a causal relationship:
Strength. How strong is the association, as measured by, say, the Pearson correlation coefficient? .87, probably! .13, not so much.
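To make the Strength criterion concrete, here is a minimal Python sketch of the Pearson correlation coefficient: the covariance of two variables divided by the product of their spreads, which always lands between -1 and 1. The function name `pearson_r` and the sample numbers are hypothetical, chosen only for illustration; in Python 3.10 and later, the standard library's `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of the products of each pair's deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square roots of the sums of squared deviations.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / (spread_x * spread_y)


# Hypothetical, made-up numbers purely for illustration.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_scores = [52, 58, 61, 70, 74, 79]
print(round(pearson_r(hours_studied, exam_scores), 2))
```

Keep in mind that r captures only linear association, and even a value near 1 says nothing on its own about which variable, if either, is the cause.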
Consistency. Does the correlation show similar strength across different samples? Is the link between smoking and lung cancer similar among Asian-Americans and African-Americans?
Specificity. Is there a single, distinctive source that accounts for the outcomes? Do people's chances of developing incredibly rare forms of cancer increase the closer their homes are to the site of the Chernobyl disaster?
Cartoon by Rodrigo.
Temporality. Does the cause come before the effect, and does the effect change in a reasonable relationship with time? At my son's suggestion, I'm reading Cormac McCarthy's The Road, an elegiac forerunner to zombie mania. My inner experience of unusual awe and despair is particularly strong just after I put down the book but fades as I immerse myself in everyday life.
Biological gradient. In medicine, does increasing the dose of a drug increase its effect?
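A biological gradient is something we can screen for directly in data. The Python sketch below assumes observations arrive as hypothetical `(dose, response)` pairs; it averages the response at each dose level, checks whether those averages never decrease as the dose rises, and fits an ordinary least-squares slope of response on dose. It is only a rough screen, not a formal trend test, and it does nothing to rule out confounding.

```python
from collections import defaultdict


def dose_response_summary(observations):
    """Summarize (dose, response) pairs as a rough biological-gradient check.

    Returns the mean response at each dose level (sorted by dose), whether
    those means never decrease as the dose increases, and the least-squares
    slope of response on dose across all observations.
    """
    by_dose = defaultdict(list)
    for dose, response in observations:
        by_dose[dose].append(response)
    levels = sorted(by_dose)
    if len(levels) < 2:
        raise ValueError("need at least two distinct dose levels")
    means = [sum(by_dose[d]) / len(by_dose[d]) for d in levels]
    # A gradient suggests each higher dose has an equal or larger mean effect.
    monotonic = all(lower <= higher for lower, higher in zip(means, means[1:]))

    # Ordinary least-squares slope: covariance(dose, response) / variance(dose).
    n = len(observations)
    mean_dose = sum(d for d, _ in observations) / n
    mean_resp = sum(r for _, r in observations) / n
    sxy = sum((d - mean_dose) * (r - mean_resp) for d, r in observations)
    sxx = sum((d - mean_dose) ** 2 for d, _ in observations)
    return list(zip(levels, means)), monotonic, sxy / sxx


# Hypothetical observations (dose in mg, measured effect), made up for illustration.
observations = [(0, 1.0), (0, 3.0), (10, 4.0), (10, 6.0), (20, 9.0), (20, 11.0)]
means, monotonic, slope = dose_response_summary(observations)
print(means, monotonic, round(slope, 2))
```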
Plausibility. Does the proposed relationship make any decent sense? For instance, this chart shows a .95 correlation between the marriage rate in Kentucky and the number of people who drowned after falling out of a fishing boat. Mere coincidence? I think so. (Unless readers in Kentucky can offer any insights about local customs involving marriage ceremonies that take place in fishing boats.)
Coherence. Can a coherent, logical argument be made for the proposed cause, one that fits with what we already know? If we accept that inhaling tar causes abnormal immune reactions in the lungs, and we accept that those immune reactions increase cancer risk, we can claim with greater confidence that inhaling the tar in cigarettes is likely to lead to cancer.
Analogy. Have we seen a similar effect from a similar exposure? If seeing The Avengers made you more likely to speed on the drive home, is it reasonable that seeing Thor might have the same effect?
Experiment. Short of a full RCT, with all its time and expense, we can often run other kinds of experiments that provide enough certainty to make a practical decision. A/B tests are a great example. You've got to design them thoughtfully, but for most purposes they can tell us which version of a website is likely to result in a higher conversion rate.
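One common way to read a simple A/B test on conversion rates is a two-proportion z-test. The Python sketch below assumes each variant's results are summarized as a count of conversions out of a number of visitors; the function name `two_proportion_z_test` and the numbers in the usage lines are hypothetical. It relies on the normal approximation, so it is only reasonable with fairly large samples.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B.

    Uses the pooled conversion rate under the null hypothesis that both
    variants share the same true rate. Returns (difference, z, p_value).
    """
    if min(n_a, n_b) <= 0:
        raise ValueError("sample sizes must be positive")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical results: 200 of 5,000 visitors converted on A, 260 of 5,000 on B.
diff, z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"lift={diff:.3f} z={z:.2f} p={p:.4f}")
```

Part of designing the test thoughtfully is fixing the sample size before it starts, randomizing visitors between the variants, and not stopping early the moment the p-value dips below 0.05, since repeatedly peeking inflates the false-positive rate.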
At the end of the day, “correlation is not causation” still stands true and strong. But I also learned that some attributes of a correlation make a causal interpretation more credible, and often credible enough to base practical decisions on.
Originally published February 18, 2018.