Correlation vs. Causation
Why misinterpreting these two concepts has serious implications
“Correlation does not imply causation.” We have all come across this statement, but what does it actually mean? Correlation and causation are often misunderstood and used interchangeably. I still remember my Probability and Statistics professor in college stressing how important it is to know the difference between the two. These terms have kept finding their way into my life: in research, at work, and recently while taking some data science classes. Understanding both statistical terms matters not just for drawing conclusions but, more importantly, for drawing correct ones.
To understand these concepts, let’s start with the basic definitions.
What is Correlation?
Correlation is a term in statistics that refers to the degree of association between two random variables. It does not tell us the why or the how behind the relationship; it only says that the relationship exists.
For example, if two random variables A and B tend to be observed at the same time, we are implying a correlation between A and B. We are not implying a causal relationship between them, that is, that A causes B or vice versa. We are only saying that when A is observed, so is B: they move together (in the same or in opposite directions) or simply show up at the same time.
There are three types of correlations:
- A positive correlation is when the two variables move in the same direction, i.e., as A increases, B increases, and as A decreases, B decreases.
- A negative correlation is when the two variables move in opposite directions, i.e., an increase in A is accompanied by a decrease in B and vice versa.
- No correlation is when there is no association between the two variables, i.e., a change in A is not accompanied by any consistent change in B, or vice versa.
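The three cases above can be made concrete with a small sketch. It assumes the Pearson correlation coefficient as the measure of association (the article does not name a specific measure), and the helper `pearson` and the toy lists are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient r, a value between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations (the common 1/n factor cancels out in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data, invented for this example.
a = [1, 2, 3, 4, 5]
b_positive = [2, 4, 6, 8, 10]   # rises as a rises
b_negative = [10, 8, 6, 4, 2]   # falls as a rises
b_none = [2, 1, 3, 1, 2]        # no consistent direction

r_positive = pearson(a, b_positive)
r_negative = pearson(a, b_negative)
r_none = pearson(a, b_none)

print(r_positive, r_negative, r_none)
```

With these toy lists, r comes out at about 1 for the positive case, about -1 for the negative case, and about 0 for the last one. Note that r only measures how well a straight line describes the pair of variables; it says nothing about why they move together.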
There seems to exist a positive correlation between US spending on science, space, and technology and suicides by hanging, strangulation, and suffocation. Does that mean one causes the other? Absolutely not!
Remember, Correlation does not imply causation!
What is Causation?
Causation (also known as Causality) indicates that an event affects an outcome.
If we have two variables A and B, causation means that A causes B or B causes A. It implies that they have a cause-and-effect relationship with one another.
For example, we all know “Smoking causes cancer”. Smoking and cancer have a cause-and-effect relationship with one another. The former is causing the latter to happen.
It is important to note that causation is most reliably established through appropriately designed experiments. Experiments allow you to talk about cause and effect; without them, all you usually have is a correlation.
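To see why the design of the experiment matters, here is a minimal simulation sketch. Everything in it is hypothetical: an imaginary treatment that has no effect at all, a hidden trait called `health` that influences both who chooses the treatment and the outcome, and made-up noise levels. The function `simulate` is just a name chosen for this example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(randomized, n=20000, seed=0):
    """Return mean outcome of treated minus mean outcome of untreated."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden trait we never observe
        if randomized:
            # Experiment: a coin flip decides who gets the treatment.
            takes_treatment = rng.random() < 0.5
        else:
            # Observation: healthier people are more likely to opt in.
            takes_treatment = health + rng.gauss(0, 1) > 0
        # The outcome depends on health only; the treatment does nothing.
        outcome = health + rng.gauss(0, 1)
        (treated if takes_treatment else control).append(outcome)
    return mean(treated) - mean(control)

observational_gap = simulate(randomized=False)
randomized_gap = simulate(randomized=True)

print("observational gap:", round(observational_gap, 2))
print("randomized gap:", round(randomized_gap, 2))
```

In the observational run, the treated group looks clearly better off even though the treatment is useless, because the people who chose it were healthier to begin with. Random assignment breaks that link, so the gap shrinks to roughly zero.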
Correlation is not causation
(Causation can only be inferred, never exactly known)
It’s easy to look at correlated data and jump to the conclusion that A causes B. As human beings, our brains are biased toward cause-and-effect relationships. A correlation can sometimes be a coincidence or the effect of a third, unobserved variable. Sometimes the correlation is spurious, such as the link between sales and shaved heads.
Identifying causation takes more work than finding a correlation: it requires experiments and studies, because a causal claim needs to be backed by proper evidence.
Understanding these concepts and the differences between correlation and causation can make a significant impact on how you draw your conclusions.
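The coincidence case is easy to reproduce. The sketch below is only an illustration with random numbers, not real data: it generates one short random series, compares it against 1,000 other unrelated random series, and keeps the strongest correlation it finds. The series length, the number of candidates, and the seed are arbitrary choices.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient r, a value between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(42)

# One short series of pure noise (10 observations).
target = [rng.gauss(0, 1) for _ in range(10)]

# Compare it against many unrelated noise series and keep the strongest match.
best_r = 0.0
for _ in range(1000):
    candidate = [rng.gauss(0, 1) for _ in range(10)]
    r = pearson(target, candidate)
    if abs(r) > abs(best_r):
        best_r = r

print("strongest correlation found by chance:", round(best_r, 2))
```

None of these series has anything to do with the others, yet searching through enough of them turns up a strong-looking correlation. The same thing can happen when you scan a large dataset for pairs of variables that move together.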
Think before you draw your conclusions!
I have made the mistake in the past of jumping to conclusions after finding correlations in my data. I highly recommend taking the time to analyze the underlying factors and verify your steps. And remember that you need well-designed experiments to talk about cause and effect; without them, all you have is a correlation.