Correlation to causation is paved with human intuition
Correlation vs. Causation
Anyone who has worked with data has been presented with the question of whether the results are due to causation or simply correlation. This is mainly based on the famous statement that “correlation does not imply causation”.
According to Wikipedia:
In statistics, the phrase “correlation does not imply causation” refers to the inability to legitimately deduce a cause-and-effect relationship between two variables solely on the basis of an observed association or correlation between them
We will seek to explain this through examples but first, let’s give some basic definitions:
Correlation: a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone
Causation: the act or agency which produces an effect
So let’s simplify that. Correlation is a relationship in which two variables tend to move together. Causation means that one of those variables moves because of the other.
Proof by Example
Let’s look at the variables of height and weight. We might find that a person’s height and weight have a direct relationship with the waist size and length of their pants (correlation). One could even infer that a person’s pant size selection is a result of their height and weight (causation).
However, let’s say we have a data set that records each person’s height and weight, and we are looking at the total number of miles that individual drives in a year. We may find a correlation between height/weight and the number of miles driven, but we cannot logically say that larger people drive more simply because they are bigger. Simply put, correlation does not imply causation.
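To make the height/weight and mileage example concrete, here is a minimal sketch in Python using simulated data. Everything in it is an assumption made for illustration: the `hidden` factor, the noise levels, and the variable names `size` and `miles` are invented, not taken from any real data set. It shows one way a correlation can appear without causation: a hidden factor drives both variables, and the correlation vanishes once body size is assigned independently of that factor.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden factor that influences both observed variables.
hidden = [rng.gauss(0, 1) for _ in range(n)]

# Simulated "body size" and "miles driven": each depends on the hidden
# factor plus its own noise, but neither depends on the other.
size = [h + rng.gauss(0, 0.5) for h in hidden]
miles = [h + rng.gauss(0, 0.5) for h in hidden]
observed_r = pearson(size, miles)

# Intervention: assign size at random, ignoring the hidden factor.
# Miles driven are unchanged because size never caused them.
assigned_size = [rng.gauss(0, 1) for _ in range(n)]
intervened_r = pearson(assigned_size, miles)

print(f"observed correlation:   {observed_r:.2f}")
print(f"after randomizing size: {intervened_r:.2f}")
```

The first number comes out strongly positive and the second close to zero. The data alone cannot distinguish the two situations; knowing that something else sits behind both variables is what separates them.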
To me, the best way to understand this is to visualize it. Thankfully, some have already gone through the trouble to show the difference through some comical examples. Below are a series of charts showing the “unlikelihood” that correlation leads to causation:
So how do we get from correlation to causation? Well, what is common amongst the graphs above and the conclusion that correlation does not imply causation? The answer is human intuition. A computer looks at the three graphs above and finds some interesting patterns. A human looks at them and finds pure coincidence.
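A rough sketch of why a computer "finds" such patterns: if you compare enough unrelated series, some pair will line up by chance. The Python example below generates purely random series (the counts of 200 series and 10 points each are arbitrary choices for illustration) and searches for the most strongly correlated pair.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable

# 200 series of 10 points each, all independent random noise.
series = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(200)]

# Exhaustively compare every pair and keep the strongest correlation.
best_r = max(abs(pearson(a, b)) for a, b in combinations(series, 2))

print(f"pairs compared: {200 * 199 // 2}")
print(f"strongest correlation found in pure noise: {best_r:.2f}")
```

None of these series has anything to do with any other, yet the search still surfaces a pair that looks tightly related. Deciding that the match is coincidence takes knowledge from outside the data.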
As machine learning becomes more and more popular, many people ask whether it is the solution to finding causation in data. After all, machine learning by definition has the machine “learn” as a human does.
Machine learning can get closer to causation in that it can look at multi-dimensional patterns and dig deeper into the relationships within the data. However, machine learning insights still do not fully imply causation. The answer is found through a balance of human and tech via what we call Approachable AI. Approachable AI involves making machine learning and predictive analytics more usable, explainable, and accessible.
The goal, and the path from correlation to causation, is to combine the advanced capabilities of machine learning, which finds patterns it deems relevant, with the intuition of a human, who can best judge that relevance.
To fully accomplish this, we need the users of the data, those who know it best, to have access to the advanced capabilities of machine learning. Too often we see the complexities of AI forcing the data users to hand things off to a data scientist. This leads to a disconnect: the people directly interacting with the data (the data scientists) are often not the ones with the domain expertise and intuition needed to find causation in the results.
Rather than focus on training the data scientist to have more domain expertise, we believe the most effective route from correlation to causation is actually to simplify the technology in order to put it in the hands of the domain experts. Only then can the advanced patterns be properly paired with human intuition to extract true causation.
Through the Elipsa Analytics Platform, we automate the data science, allowing the data to remain in the hands of the domain experts so they can directly apply their knowledge and intuition. This combination of advanced technology and domain expertise is the true path from correlation to causation.