The Difference Between Correlation and Causation
Correlation and causation are two of the most important concepts to understand if you want to create growth.
Ben Yoskovitz, Founding Partner at HighlineBeta, explains the difference between correlation and causation by stating “correlation helps you predict the future, because it gives you an indication of what’s going to happen. Causality lets you change the future.”
Knowing the difference between the two goes a long way in ensuring that your business decisions are based on hard facts and measurable variables.
Making decisions based on assumptions means you run the risk of jeopardizing the success that you’re working hard for. It’s rarely intentional, but before you make your next decision, consider whether your actions are based on assumptions or proven facts.
What pizza and the germ theory of disease have in common
A correlation is a relationship that you observe between two variables that appear to be related.
There’s a classic New York City observation that the price of a ride on the subway tends to fall and rise with the price of a slice of pizza. That’s a correlation. It’s not, however, causation — the prices that pizza vendors choose to set for their pizza do not actually influence the pricing that the MTA sets for the subway.
Until the late 19th century, it was believed by scientists and laypeople alike that bad odors caused disease. The sick and dying tended to smell unpleasant so the two phenomena were correlated.
This belief led public health officials to improve hygienic conditions, for example by removing standing water, so it wasn’t entirely without merit. However, it was only in 1880 that the germ theory of disease became accepted. With this, it became clear that while bad smells and disease often appeared together, both were caused by a third, hitherto unknown variable: the microscopic organisms we know as germs.
Correlations are often mistaken for causation because common sense seems to dictate that one caused the other. After all, bad smells and disease are both unpleasant, and always seem to appear at the same time and in the same places. But you can have a foul odor without a disease. Diseases can strike even in places where standing water isn’t present — like hospitals where the surgeons aren’t washing their hands.
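The third-variable pattern in the germ theory story can be illustrated with a small simulation. This is only a sketch with made-up numbers: the variable names, noise levels, and sample size are assumptions chosen for illustration, not historical data. Smell and disease are each generated from a hidden germ level and never from each other, yet they still come out strongly correlated.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical hidden variable: the germ level in each location.
germs = [rng.gauss(0, 1) for _ in range(5000)]

# Smell and disease each depend on germs plus their own independent noise.
# Neither one appears in the other's formula.
smell = [g + rng.gauss(0, 0.5) for g in germs]
disease = [g + rng.gauss(0, 0.5) for g in germs]

raw = pearson(smell, disease)

# Subtract the germ contribution from both; only independent noise remains.
smell_rest = [s - g for s, g in zip(smell, germs)]
disease_rest = [d - g for d, g in zip(disease, germs)]
adjusted = pearson(smell_rest, disease_rest)

print(f"smell vs disease: {raw:.2f}")
print(f"after accounting for germs: {adjusted:.2f}")
```

Once the hidden variable is accounted for, the apparent link between smell and disease largely disappears, which is the signature of a correlation driven by a common cause.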
To prove causation, you need to find a direct relationship between variables. You need to show that one relies on the other, not just that the two appear to move in concert.
When it comes to your business, it is imperative that you make the distinction between what actions are related and what caused them to happen. Don’t make a mistake that will cost you time and money because you’ve based your decisions on unproven assumptions.
How correlation gets mistaken for causation
Picture this: you’ve just launched a new version of your app.
You’ve asked your team to build all kinds of new social features out of the belief that what your app really needs are better ways for people to connect with their friends.
Thirty days into the new app being out, you check your retention numbers. You split your users into two groups based on their behavior, those who “joined communities” and those who didn’t, and find a stunning phenomenon.
Users who joined at least one community are being retained at a rate far greater than the average user.
Nearly 90% of those who joined communities are still around on Day 1, compared to 50% of those who didn’t. By Day 7, you see 60% retention among community-joiners and about 18% retention among those who didn’t join. This seems like a massive coup.
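A comparison like this one could be computed along the following lines. This is a minimal sketch: the record layout (`joined_community`, `active_day_7`) and the group sizes of 100 users each are assumptions invented to reproduce the Day 7 percentages above, not real product data.

```python
from collections import defaultdict

def retention_by_group(users, day_key):
    """Return {joined_flag: fraction of that group still active on the given day}."""
    totals = defaultdict(int)
    retained = defaultdict(int)
    for user in users:
        group = user["joined_community"]
        totals[group] += 1
        if user[day_key]:
            retained[group] += 1
    return {group: retained[group] / totals[group] for group in totals}

# Hypothetical records, sized to match the Day 7 figures in the text.
users = (
    [{"joined_community": True, "active_day_7": True}] * 60
    + [{"joined_community": True, "active_day_7": False}] * 40
    + [{"joined_community": False, "active_day_7": True}] * 18
    + [{"joined_community": False, "active_day_7": False}] * 82
)

rates = retention_by_group(users, "active_day_7")
print(f"joined: {rates[True]:.0%}, did not join: {rates[False]:.0%}")
```

Note that the groups here are defined by what users chose to do, so the output describes a correlation only.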
But hold on. You don’t actually know if joining communities causes better retention. All you know is that the two are correlated.
There could be a multitude of things behind the increased retention exhibited by users who join one or more communities. You have no idea what other factors are at play, or what other behaviors those users took part in besides joining a community. To find out what’s actually driving retention in this group, you need to examine more variables.
Causation reaffirms certainty
Moz founder Rand Fishkin frames causation this way: “correlation can help you predict what will happen. But finding the ‘cause’ of something means you can change it.”
Unlike correlations, causal relationships don’t happen by accident. Once you lay out the variables, you can control and change them to meet your needs.
Once you find a correlation, you can prove causation by running experiments where you “control the other variables and measure the difference.”
Running experiments to determine causation
A/B testing is one of the best ways to get from correlation to causation. Look at each of your variables, change one and see what happens. If this changes the outcome, you’ve found the variable that makes the difference.
Andrew Chen, who works on Uber’s growth team, puts it this way: “After you’ve found the model that works for you, then the next step is to try and A/B test it. Do something that prioritizes the input variable and increases it, possibly at the expense of something else.” He continues, “see if those users are more successful as a result. If you see a big difference in your success metric, then you’re on to something. If not, then maybe it’s not a very good model.”
When it comes to making a case that joining communities leads to higher retention rates, you have to eliminate all other variables that could influence the outcome. Users could have taken a path besides joining communities that affected retention.
To test whether there’s causation, you’ll have to find a direct link between users joining communities and using your app long-term.
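Start with your onboarding flow. Take the next 1,000 users who sign up and randomly split them into two groups. Half will be forced to join communities when they first sign up and the other half won’t be. Assigning users at random matters: it spreads every other factor that might affect retention evenly across both groups.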
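One way to do the split is a seeded shuffle, sketched below. The function name `assign_groups`, the group labels, and the user ID format are hypothetical; a production system might instead assign each user at signup time.

```python
import random

def assign_groups(user_ids, seed=42):
    """Randomly split users into two equal-sized groups."""
    shuffled = list(user_ids)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {"forced_join": shuffled[:half], "control": shuffled[half:]}

# Hypothetical IDs for the next 1,000 signups.
user_ids = [f"user_{i}" for i in range(1000)]
groups = assign_groups(user_ids)
print(len(groups["forced_join"]), len(groups["control"]))
```

The fixed seed makes the assignment reproducible, so the same user always lands in the same group if the split is recomputed.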
Run the experiment for 30 days and then compare retention rates between the two groups.
If the group that was forced to join communities has a higher retention rate, and the gap is too large to be explained by chance, then you have the evidence you need to support a causal relationship between joining communities and retention.
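To judge whether a gap in retention is bigger than chance alone would produce, one common choice is a two-proportion z-test. The sketch below is one possible approach, not the only one, and the retention counts in it are invented for illustration.

```python
import math

def two_proportion_z_test(retained_a, total_a, retained_b, total_b):
    """Return (z, two-sided p-value) for the difference between two retention rates."""
    p_a = retained_a / total_a
    p_b = retained_b / total_b
    # Pooled rate under the assumption that both groups retain equally well.
    pooled = (retained_a + retained_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical 30-day results: 500 users per group.
z, p_value = two_proportion_z_test(
    retained_a=150, total_a=500,  # forced to join communities
    retained_b=110, total_b=500,  # control group
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value (0.05 is a conventional threshold) suggests the difference is unlikely to be noise. With only a few hundred users per group, small real effects can still go undetected, so a null result does not prove there is no effect.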
You won’t be certain of a relationship until you run these types of experiments.
Your business depends on you understanding the difference
We are always looking for patterns around us, so our default aim is to be able to explain what we see. However, unless causation can be clearly identified, it should be assumed that we’re seeing correlation.
Events that seem to connect based on common sense can’t be seen as causal unless there’s a clear and direct connection. While causation and correlation can exist at the same time, correlation doesn’t mean causation.
The takeaway here is that you must look at the conditions and opportunities facing your business from all angles before making decisions that could affect your long-term gains.
Originally published at amplitude.com on January 19, 2017.