Correlation and causation — Part 1

“Correlation does not mean causation.” Or so I keep hearing online. And yes, on the face of it, this is a true statement. But it does open the door to many questions. After all, what IS causation? How do we know one event is the cause of another? Can we ever be sure?
What is causation?
According to the Oxford dictionary, causation is “the action of causing something” and “cause” is:
A person or thing that gives rise to an action, phenomenon, or condition.
If you get the definition of causation from whatis.com, which is the one Google offers in an Answer Box if you type in the question, you will see they go on to differentiate causation from correlation. That is interesting, given that it was not the question to begin with. They do offer, however, a more encompassing definition, saying that causation “is the capacity of one variable to influence another.” I like their definition, chiefly because it is more in line with the scientific view of causation.
Causation through life
You do not need to touch a hot surface more than once to understand the cause-effect relationship there. The stimulus is strong enough and clear enough for you to learn right away that this will hurt you.
In contrast, you might not have learned yet to limit your sun exposure and to use sunscreen! Sunlight, as I know you know, causes sunburn. It also burns you! But the effect is not as painful, not as clear. You don’t learn as fast.
We can learn a few concepts from these real-life examples. For something to be the cause of another, it has to happen first. Additionally, it has to influence the result. If you do not go in the sun, you will not get sunburnt.
Applying everyday knowledge to science
Let us now apply the two concepts we learned to research studies. It may seem odd, but not getting the timing right is a common problem in observational studies.
Getting the timing right
How do you get the timing wrong? The classic example is obesity and diet pop. Let’s go over it.
Imagine you ask 1,000 people what kind of pop they drink because you want to understand the effect of diet pop on their BMI. Your data shows that obese people are much more likely to drink diet pop. Therefore, diet pop causes obesity! Right? Wrong.
I know: that is an easy one. The problem here is Reverse Causation. It is not that diet pop causes obesity, but that obese people buy more diet pop because they are obese. This is a matter of timing: the obesity came first, and the diet pop came after. The conclusion that diet pop causes obesity would be wrong because the investigator got the timing wrong.
That happens more often than we care to measure. It is a major problem in all research that relies on people’s recollection. If you have clogged arteries, for example, you may have more incentive to remember all the egg yolks you ate, as you try to find the reason for it yourself.
That is not to say you could not test the hypothesis above. For one, you could try an experimental design, where people are randomized to drink diet or normal pop. You could also account for timing in an observational manner, although you would need richer, more reliable data than a one-off survey.
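To see how a one-off survey can mislead, here is a minimal Python sketch of the diet pop example. Everything in it is an assumption made for illustration: the function names, the 30% obesity rate, and the drink-choice probabilities are invented, not real data. In this simulated world, obesity is decided first and then shapes the drink choice, and diet pop has no effect on weight at all.

```python
import random

def simulate_survey(n=1000, seed=0):
    """Simulate a one-off survey where obesity drives diet pop choice.

    All probabilities are made-up illustration values, not real data.
    Diet pop has NO effect on obesity in this simulated world.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        obese = rng.random() < 0.30           # obesity is decided first...
        p_diet = 0.60 if obese else 0.20      # ...and then shapes the drink choice
        drinks_diet = rng.random() < p_diet
        people.append((obese, drinks_diet))
    return people

def obesity_rate(people, drinks_diet):
    """Share of obese people among those with the given drink choice."""
    group = [obese for obese, diet in people if diet == drinks_diet]
    return sum(group) / len(group)

people = simulate_survey()
rate_diet = obesity_rate(people, True)
rate_regular = obesity_rate(people, False)
print(f"Obese among diet pop drinkers:    {rate_diet:.0%}")
print(f"Obese among regular pop drinkers: {rate_regular:.0%}")
```

With these invented numbers, the share of obese people should come out markedly higher among diet pop drinkers, even though the simulation never lets diet pop change anyone’s weight. A survey taken at a single point in time cannot tell which came first.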
Determining whether one variable influences the other
Now to the second part. To say one variable has a causal relationship to another, you have to show that it influences the other. In other words, with all else being equal, the presence of the variable of interest changes the outcome or how often it happens. The “all else being equal” part is the pickle here.
It is often impossible to observe the same individual with and without the exposure. That would be the ideal way to keep ALL else equal. You could keep almost all else equal in the sunburn example: use sunscreen one week and you don’t get a sunburn; skip it the next week and you do. But how about a cancer treatment you don’t know works? You only die once.
That means we would like, in an ideal world, to compare the factual (what happened) with the counterfactual (what did not happen). We cannot do that. What is the next best thing?
You compare people who are very, very similar, except for the intervention. That is also VERY hard, so you compare populations that are very, very similar. The trouble here is that they need to be similar both in the characteristics you can observe and measure and in those you cannot, or did not even think about.
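The factual versus counterfactual idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sunburn probabilities (70% without sunscreen, 20% with) are invented, and the two groups are similar simply because every simulated person is drawn the same way. Only a simulation can see both outcomes for the same person; in real data, one of them is always missing.

```python
import random

rng = random.Random(1)

# Each simulated person carries TWO potential outcomes (1 = sunburn, 0 = none):
# one if they skip sunscreen, one if they use it. Probabilities are invented.
people = []
for _ in range(10_000):
    burn_without = 1 if rng.random() < 0.70 else 0
    burn_with = 1 if rng.random() < 0.20 else 0
    people.append((burn_without, burn_with))

# Only a simulation sees both columns, so only here can we get the true effect.
true_effect = sum(w - wo for wo, w in people) / len(people)

# In real life each person reveals one outcome; the other is the counterfactual.
users, non_users = [], []
for i, (burn_without, burn_with) in enumerate(people):
    if i % 2 == 0:
        users.append(burn_with)         # factual outcome for sunscreen users
    else:
        non_users.append(burn_without)  # factual outcome for non-users

# Next best thing: compare the averages of two very similar groups.
group_difference = sum(users) / len(users) - sum(non_users) / len(non_users)

print(f"True average effect (needs both outcomes): {true_effect:+.2f}")
print(f"Difference between similar groups:         {group_difference:+.2f}")
```

The two printed numbers should land close to each other. That only works because the groups here are alike in every respect, measured or not, which is exactly the hard part in a real study.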
Randomization
How do you do that? You randomize people. You “toss a coin” every time you are to include someone in your study and assign them to a group according to the coin toss result. That does NOT guarantee a perfectly balanced distribution of variables, since the groups could still differ by chance alone, but it is the best method we have to infer causation.
In the end, you have an average result in each group, and if the results are sufficiently different, you can say it is because of your intervention. You have established causation.
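Here is a rough Python sketch of why the coin toss matters. It is a toy model, and every name and number in it is hypothetical: a hidden trait called `frail` that nobody measured, a true treatment effect of +1.0 on an outcome score, and a `doctor_choice` rule in which frail patients are more likely to be treated. The sketch compares that rule with a plain coin toss.

```python
import random
from statistics import mean

def run_study(assign, n=20_000, seed=2):
    """Return (hidden-trait gap between groups, estimated treatment effect).

    `assign(frail, rng)` decides who gets treated. The true effect of the
    treatment on the outcome score is +1.0 by construction (made-up value).
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        frail = 1 if rng.random() < 0.40 else 0    # hidden trait, never measured
        is_treated = assign(frail, rng)
        effect = 1.0 if is_treated else 0.0
        outcome = 5.0 - 2.0 * frail + effect + rng.gauss(0, 1)
        (treated if is_treated else control).append((frail, outcome))
    gap = mean([f for f, _ in treated]) - mean([f for f, _ in control])
    estimate = mean([y for _, y in treated]) - mean([y for _, y in control])
    return gap, estimate

def coin_toss(frail, rng):
    """Randomization: the hidden trait plays no role in who gets treated."""
    return rng.random() < 0.5

def doctor_choice(frail, rng):
    """No randomization: frail patients are treated far more often."""
    return rng.random() < (0.8 if frail else 0.2)

random_gap, random_estimate = run_study(coin_toss)
chosen_gap, chosen_estimate = run_study(doctor_choice)

print(f"Coin toss:     frail gap {random_gap:+.2f}, estimated effect {random_estimate:+.2f}")
print(f"Doctor choice: frail gap {chosen_gap:+.2f}, estimated effect {chosen_estimate:+.2f}")
```

Under the coin toss, the share of frail people should be nearly the same in both groups, and the difference in average outcomes should sit close to the built-in +1.0. Under the doctor’s choice, the treated group ends up much frailer, and the same comparison can make a helpful treatment look useless or even harmful.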
But you cannot always run a randomized trial.
There are several questions that do not lend themselves to a randomized trial. If you recall our two examples in the “Why clinical research?” post, you may have already concluded one was amenable to an experiment, and the other was not. If you have not read it yet, I recommend you do.
The question about the leeches would lend itself beautifully to randomization. The hypothesis that cholera spread through contaminated water would not.
John Snow took advantage of what was a Natural Experiment, which is not randomized, strictly speaking. But could we ethically randomize people to drink dirty water? I do not believe so.
Another example is smoking and lung cancer. That is closer to our time, which might make it easier to relate to. There was no way for investigators to randomize people to a smoking or a non-smoking group once they were already convinced that smoking caused lung cancer.
Under those circumstances, researchers HAD to rely on observed data to reach their conclusions. Since we all know for a fact that smoking cigarettes can cause lung cancer, there must be a way to determine that using observational data!
And indeed, there is. You can use a set of criteria to try to understand whether the relationship you found is really a cause-effect relationship. We already discussed one important criterion, time. The others we will explore in Part 2.
