# The Treacherous Road Between Correlation and Causation


By Dr. Gary L. Deel
Faculty Director, Wallace E. Boston School of Business, American Public University

What if I told you that every time event “A” occurs, event “B” follows? Would you be tempted to think that A causes B?

If so, you’d be forgiven for such an assumption. After all, this way of thinking makes good sense temporally. A cause must obviously precede an effect, so if “A” always happens before “B,” then it’s natural to expect that “A” causes “B.”

But again, this is just an assumption, and often an erroneous one: without knowing the details of “A” and “B,” it is impossible to know whether there is, in fact, a causal relationship between the two events. At American Public University, classes such as SOCI332 Statistics for Social Science teach students how to avoid pitfalls like this one.

For example, suppose that “A” and “B” have to do with baseball games and baseball fans. Imagine that “A” is a baseball fan turning his TV to ESPN, and “B” is the New York Yankees playing a baseball game. The fan turns his TV to ESPN immediately before the game because he wants to watch it. And of course, the game follows immediately thereafter.

But when we revisit the original assumption, that “A” must cause “B” because “A” precedes “B,” it seems plainly absurd in the context of a baseball game on TV. A fan turning his TV to ESPN obviously does not cause the game to be played.

Rather, the opposite is true: the upcoming game is what causes the fan to turn on his TV. So “B” causes “A,” even though “A” happens first.

# Correlation and Causation Are Quite Different

Correlation is not causation. We might observe that two or more things are correlated; that is to say, they frequently occur together, or they vary in step with each other.
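In statistics, “correlated” usually has a precise meaning: Pearson's correlation coefficient, r, which ranges from -1 (the two variables move in exactly opposite directions) through 0 (no linear association) to +1 (they move in perfect lockstep). The Python sketch below computes r from the standard formula; the function name `pearson_r` is our own, chosen for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: positive when both series tend to sit above (or below)
    # their own means at the same time, negative when they move oppositely.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a series never varies")
    # Dividing by the spreads rescales the covariance into the range [-1, 1].
    return cov / math.sqrt(var_x * var_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect lockstep)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfectly opposed)
```

Note what the number does not tell you: r is symmetric, so `pearson_r(a, b)` equals `pearson_r(b, a)`, and it carries no information about which variable, if either, drives the other.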

But this does not mean that a causal relationship exists between them. Sometimes two events are highly correlated but have no direct causal link.

For example, let’s return to our baseball scenario and consider two more events. Suppose “X” is hot dog vendors in Yankee Stadium selling a lot of hot dogs, and “Y” is the Goodyear blimp being spotted flying over New York City. And suppose that “X” and “Y” are shown statistically to be highly correlated; that is, they very often happen together.

Now, would any serious person be tempted to think that hot dog sales affect blimp flight patterns, or vice versa? Of course not: neither event causes the other.

But they do share a common cause that explains their correlation: the baseball game schedule. Both events have a cause, just not each other, which is why correlation and causation are not the same thing.

# Sometimes There Is No Causal Link or Common Causal Origin

Taking this line of thought one step further, sometimes there is no causal link between events and no common causal origin whatsoever. In other words, sometimes things just occur coincidentally, but they are in no way causally related, no matter how far back through the chains of independent causation you look.

Consider Tyler Vigen’s webpage highlighting his book of more than a dozen “spurious correlations,” that is, correlations without any conceivable causal link. Vigen used best-fit statistical modeling tools to identify variables that appear to follow one another in lockstep, despite having absolutely nothing in common.

For example, the second graph on the webpage illustrates a 10-year trend line for the number of people who drown annually by falling into a swimming pool. This is plotted against a trend line over the same period for the number of films Nicolas Cage appeared in.

Now, obviously any reasonable person can quickly conclude that the two events cannot possibly have any causal link to each other. And yet, there they are, side by side, with striking similarity.

But notice that Vigen had to get fairly creative with the axis scales to create the illusion of such a close likeness. On the pool drownings versus Nicolas Cage graph, for example, the axis for drownings runs from 80 to 140, while the axis for Nicolas Cage films runs from 0 to 6.

So upon closer examination, we see that the scales were cherry-picked to achieve the desired effect. If both axes started at 0, the two trend lines would look far less alike.

But it seems we are nonetheless compelled to infer causality in the world around us, whether or not such inferences are appropriate. When we perform a good luck ritual, such as rubbing a lucky rabbit’s foot, and then experience success or prosperity, we assume that our actions had some effect on the workings of the universe, despite clear scientific evidence to the contrary.

# It Is Far Easier to Make Sense of the World When We Assume Causality

But it is far easier to make sense of the world when we assume such causality. The idea that things could be inexplicably or even counterintuitively correlated is unsettling, because it robs us of confidence in our own awareness and foresight. After all, if people are dying in swimming pools at more or less the same rate that Nicolas Cage is making movies, with no explanation whatsoever, then how are we supposed to make sense of anything?

But we shouldn’t throw our hands up in exasperation. There are tools of statistical analysis that allow us to separate correlation from causation.

For example, in SOCI332 Statistics for Social Science, our university’s students learn linear regression modeling, an advanced method that can, under certain circumstances, identify whether a causal (and not just coincidental) relationship exists between two or more variables.

These tools are critical to our efforts to avoid self-deception. We must take the time to ensure that we are not fooling ourselves when we seek to truly understand the world.