Correlation & Causation

Correlation does not prove Causation!

Mala Deep
Mar 27 · 5 min read
The “chicken or egg” paradox was first proposed by philosophers in Ancient Greece to describe the problem of determining cause-and-effect. Photo by Katherine Chase on Unsplash

“Correlation does not prove causation”: I came across this statement during the Udacity-Bertelsmann Technology Scholarship Data Track course in 2019, and it struck me. I had been doing EDA and had drawn my conclusions from correlation alone, treating it as causation. [Yes, I was wrong!]

That one line from the Bertelsmann Data Track course made me realize that my analysis was heading in the wrong direction, so I started to dig deeper to understand the thin line between correlation and causation.

Understanding the phrase “Correlation does not prove causation” and applying the concept in your next data science project will make you doubly confident.

What’s Inside:

  • Understanding the correlation.
  • Calculating correlation.
  • Understanding the causation.
  • Establishing causation.
  • The key differences between correlation and causation

Before jumping into the process of becoming doubly confident, let’s understand what each concept means.

What is Correlation?

Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. Put simply, correlation is a relationship between things. A common objective of analysis is to identify the extent to which one variable relates to another, i.e., to see how a target variable depends on an independent variable.

Correlation: When two or more things appear to be related. Source: eufic.org

A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the difference in the values of the other variable.

How is the correlation measured?

Pearson r correlation: Pearson’s r is the most widely used correlation statistic for measuring the degree of the relationship between linearly related variables. Its value ranges from -1 to +1. There are three possible results of a correlational study:

  • Positive correlation: As one variable increases, the other also increases.
  • Negative correlation: As one variable increases, the other decreases.
  • No correlation: There is no apparent relationship between the two variables.
Source: https://www.simplypsychology.org/correlation.html
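To make the three outcomes above concrete, here is a minimal sketch that computes Pearson’s r by hand in plain Python. The function name pearson_r and the small lists are made up for illustration; they are not taken from any dataset in this post.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of the products of the deviations from each mean.
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the root sums of squared deviations.
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return co_deviation / (spread_x * spread_y)


# Made-up illustrative data.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # goes up as hours go up
falling = [10, 8, 6, 4, 2]   # goes down as hours go up
flat = [3, 1, 4, 1, 3]       # no linear trend against hours

r_pos = pearson_r(hours, rising)    # close to +1: positive correlation
r_neg = pearson_r(hours, falling)   # close to -1: negative correlation
r_none = pearson_r(hours, flat)     # close to 0: no linear correlation
print(r_pos, r_neg, r_none)
```

Note that r only captures linear relationships, so a value near 0 does not rule out a non-linear one.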
Pearson correlation matrix, which shows how the columns are correlated with each other.

If you are familiar with pandas, dataframe.corr() finds the pairwise correlation of all columns in a dataframe. To make the result look better and easier to interpret, you can import the Seaborn library and plot the matrix of Pearson correlation coefficients as a heatmap. To know more about it, read my previous post.
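As a rough sketch of the dataframe.corr() call described above, the snippet below builds a small synthetic dataframe and prints its Pearson correlation matrix. The column names and the way the data is generated are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data: every column here is invented for the example.
rng = np.random.default_rng(0)
n = 200
study_hours = rng.normal(5, 1.5, n)

df = pd.DataFrame({
    "study_hours": study_hours,
    # Built to rise with study_hours (positive correlation).
    "exam_score": 10 * study_hours + rng.normal(0, 5, n),
    # Built to fall as study_hours rises (negative correlation).
    "errors_made": -2 * study_hours + rng.normal(0, 2, n),
    # Generated independently of everything else (no correlation expected).
    "shoe_size": rng.normal(40, 2, n),
})

# DataFrame.corr() uses the Pearson method by default.
corr = df.corr()
print(corr.round(2))
```

If Seaborn and Matplotlib are installed, passing this matrix to seaborn.heatmap(corr, annot=True, vmin=-1, vmax=1) is one way to draw the heatmap.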

The correlation coefficient should not be used to say anything about the cause and effect relationship. By examining the value of ‘r’, we may conclude that two variables are related, but that ‘r’ value does not tell us if one variable was the cause of the change in the other.

So, here comes the need to understand causation.

What is Causation?

Causation, also known as causality or cause and effect, indicates that one event is the result of the occurrence of another event, i.e., there is a causal relationship between the two events.

It tries to answer the question: does one variable impact the other?

Causation: When one thing causes another to happen. Source: eufic.org

How can causation be established?

When data shows a correlation, we may suspect an underlying causal relationship, but there is not necessarily one, and we cannot confidently say that there is a cause-and-effect relation. To establish causation, we can take two further steps after finding a correlation.

  • Controlled study
  • Non-spuriousness

Controlled study

A controlled study is the most effective way of establishing causality between variables. In a controlled study, the sample is split into two groups, a treatment group and a control group, that are comparable in almost every way. The two groups then receive different treatments (the independent variable), and the outcome of interest (the dependent variable) is assessed for each group.
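The idea can be sketched with a small simulation in plain Python. Everything here is hypothetical: the participants, the baseline values, and the TRUE_EFFECT constant are invented so that the estimate can be compared against a known answer.

```python
import random
import statistics

random.seed(42)

# Hypothetical participants, each with a baseline outcome before any treatment.
baseline = [random.gauss(50, 10) for _ in range(1000)]

# Random assignment: shuffling before the split makes the two groups
# comparable on average.
random.shuffle(baseline)
treatment_group = baseline[:500]
control_group = baseline[500:]

# Assumed size of the treatment effect, chosen for this simulation only.
TRUE_EFFECT = 5.0

# Only the treatment group receives the effect; both groups get measurement noise.
treated_outcomes = [b + TRUE_EFFECT + random.gauss(0, 2) for b in treatment_group]
control_outcomes = [b + random.gauss(0, 2) for b in control_group]

# Because the groups differ only in the treatment, the difference in mean
# outcomes estimates the causal effect.
estimated_effect = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"Estimated effect: {estimated_effect:.2f} (true effect: {TRUE_EFFECT})")
```

The estimate will not match the true effect exactly, since random assignment balances the groups only on average; larger groups narrow the gap.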

How do you perform a controlled study? Find more in the article below.

Non-spuriousness

A spurious, or false, relationship exists when what appears to be an association between two variables is actually caused by a third, extraneous variable, i.e., A and B are correlated, but both are caused by C.

Non-spuriousness therefore requires that alternative explanations for the observed relationship between two variables be ruled out, i.e., the analyst has to take on the challenge of ruling out spurious relationships before claiming that one variable causes the other.
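Here is a minimal sketch of a spurious relationship, assuming a made-up scenario in which a hidden variable C drives both A and B. The helper names pearson_r and residuals and all the numbers are illustrative assumptions. Removing the part of A and B that a straight-line fit on C explains is one simple way to check whether the association survives; it is not the only one.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def residuals(y, c):
    """What is left of y after subtracting a least-squares line fitted on c."""
    mc, my = statistics.fmean(c), statistics.fmean(y)
    slope = (sum((ci - mc) * (yi - my) for ci, yi in zip(c, y))
             / sum((ci - mc) ** 2 for ci in c))
    return [yi - (my + slope * (ci - mc)) for ci, yi in zip(c, y)]


random.seed(1)
n = 2000

# C is the hidden third variable (say, daily temperature).
c = [random.gauss(25, 5) for _ in range(n)]
# A and B each depend on C plus their own noise, never on each other.
a = [2.0 * ci + random.gauss(0, 4) for ci in c]   # e.g. ice cream sales
b = [1.5 * ci + random.gauss(0, 4) for ci in c]   # e.g. sunburn cases

r_raw = pearson_r(a, b)                                     # strong correlation
r_controlled = pearson_r(residuals(a, c), residuals(b, c))  # near zero
print(f"A vs B: {r_raw:.2f}, A vs B after removing C: {r_controlled:.2f}")
```

A and B look strongly related until C is accounted for, which is exactly the alternative explanation that non-spuriousness asks us to rule out.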

Examples of Spurious Relationships

Find more about spuriousness and causation in the article below.

Having covered the basics of correlation and causation, we can now look at the difference between them.

So, what’s the difference between correlation and causation?

Correlation and causation are often confused because the human mind likes to find patterns even when they do not exist. Even if there is a stable association between two variables, we cannot assume that one causes the other. And even if the correlation is strong, we cannot jump directly to causation without at least a randomized controlled experiment.

E.g., smoking is correlated with alcoholism, but it does not cause alcoholism.

In this example the two are correlated, but the relationship is not causal.

Correlation doesn’t always imply causation. Source: eufic.org

In practice, however, establishing causation remains much harder than establishing correlation.

Conclusion

Understanding causation is a difficult problem. Looking at a correlation and jumping to bold claims without checking for causation is the wrong approach. Until causation has been clearly identified, we should assume that we are only seeing a correlation. The more confident you become at telling true correlation from causation within your dataset, the better you will be at data science.

If you have any questions or thoughts on the article, feel free to reach out in the comments below, on LinkedIn, or through my website.

Stay tuned for the next data science post.

Written by

Mala Deep

Data Science | Data Visualization | Community Work Focused | Philekoos | https://www.linkedin.com/in/maladeep/

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
