Causal Inference (Part 1) — A must read for practitioners

Confounding Variables, Randomized Control Trials

Sravan Vadigepalli
5 min read · Nov 20, 2022

We have all heard the phrase ‘Correlation doesn’t imply causation.’ But then, what is causation? In this series, I offer a practitioner’s approach to measuring causality.

Influence from Mere Association Tendency

The Psychology of Human Misjudgment is a speech Charlie Munger delivered in the mid-1990s. Mr. Munger touches on 25 human biases, and Influence from Mere Association Tendency is one of them. It boils down to an innate bias: we want something to be right even though, deep down, we know it doesn’t truly make sense. We readily correlate things without always looking for causality. We act as if there is a cause and effect, but a good number of times it is pure association.

Why do we conflate the two?

Any observed association is a mixture of correlation and causation. Humans are designed to make most everyday decisions in a split second, and a good chunk of those decisions rest on assumed cause and effect. An example: you had a cold shower in the morning and a headache later in the day. Our simple heuristics would infer a cause-and-effect relationship between the cold shower and the headache. But is that true? Yes, the two are correlated, but is the cold shower the cause of the headache? Maybe, maybe not; we cannot tell until we weed out the confounding variables.

Heuristics Approach

As a runner, the most common thing I hear non-runners say is that running causes injuries. There is no ill intent behind it; it comes from the urge to explain an outcome by correlating past events. In general, we judge outcomes and their probabilities by intuition or gut feeling rather than by scientific experimentation. These mental shortcuts are called heuristics.

Our heuristics help us spot correlations, but to unpack spurious associations we need to account for other variables that potentially affect the outcome. Here, that means a third variable that has an effect on both running and injuries. This additional variable is called a confounding variable.

What are confounding variables?

A confounding variable is an extra factor that distorts or masks the causal connection between the independent variable (Running) and the dependent variable (Injuries). In this simplified example, Mileage is a confounding variable.

Suppose we have two groups of runners and assume everything else is the same between them: age, physical health, aerobic capacity, and so on. The only difference between the two groups is the number of miles they run per week: group A runs 20 miles per week, and group B runs 50. In this example, Mileage is the common cause we need to account for to identify causality. ‘Running causes injuries’ is an association that mixes correlation and causation. If we simply say that running causes injuries, we are describing a correlation, because another variable, Mileage, influences how running relates to injuries. Accounting for this third variable helps us pin down causality.

In the real world, we rarely have just one confounding variable. In the above example we considered Mileage as the only confounding variable, but in reality other factors, such as running shoes and running intensity, can also affect injuries.
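To make the idea concrete, here is a minimal simulation sketch. Everything in it is made up for illustration: the variables x, y and z, the coefficients, and the sample size are my assumptions, not real running data. A third variable z drives both x and y, while x has no direct effect on y at all. A naive regression of y on x still shows a clear slope, and that slope shrinks toward zero once z is included.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical confounder: it influences both the "cause" and the outcome.
z = rng.normal(size=n)
# x depends on the confounder but has NO direct effect on the outcome.
x = 0.8 * z + rng.normal(size=n)
# y depends only on the confounder (plus noise).
y = 1.5 * z + rng.normal(size=n)


def ols_coefs(features, target):
    """Least-squares coefficients; the intercept is the first entry."""
    design = np.column_stack([np.ones(len(target)), *features])
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs


naive_slope = ols_coefs([x], y)[1]        # y ~ x
adjusted_slope = ols_coefs([x, z], y)[1]  # y ~ x + z

print(f"slope of y on x, ignoring z:   {naive_slope:.2f}")
print(f"slope of y on x, adjusting z:  {adjusted_slope:.2f}")
```

The naive slope is pure spurious association here, because the data was generated with no causal link from x to y. Real data rarely comes with a known, fully measured confounder, so treat this as an intuition builder rather than a recipe.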

Cause and Effect — Traditional Approach

The most common method to explain cause and effect is through experimentation, usually A/B Testing.

Visualization from https://vwo.com/

In the above hypothetical example, a retailer wants to measure whether an Orange header (Variation A) performs better than a Green header (Variation B), with all else equal between the two variations. We split the incoming traffic across A and B, so half of the traffic receives Variation A and the other half receives Variation B. In practice, you will see more conservative splits: instead of directing 50% of traffic to a new variation right away, teams often ramp up its share in increments and keep the experiment running until it reaches statistical significance.

In this example, it is clear that the Orange header (Variation A) performs significantly better than Variation B. This is a clear signal for business and product teams to implement Variation A.
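What "significantly better" means can be sketched with a two-proportion z-test, one common way to compare conversion rates between two variations. The visitor and conversion counts below are invented for illustration and are not taken from the visualization above; the function name is also my own.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 5,000 visitors per variation.
z_stat, p_value = two_proportion_z_test(conv_a=260, n_a=5000, conv_b=200, n_b=5000)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
```

With these made-up numbers the p-value falls below the usual 0.05 threshold, so the difference would be called statistically significant. Checking the p-value repeatedly while traffic ramps up inflates false positives, so the stopping rule is best fixed before the experiment starts.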

Need for Causal Inference

You might be thinking: if the A/B testing framework works so well, why do we need Causal Inference? A/B testing works well when you are able to randomize the test and control groups. However, you don’t always have the luxury of a hold-out group.

As an example, during COVID, many retailers launched better omni-channel experiences, where customers can shop online or in-store seamlessly. The most common is Curbside Pickup, where a customer orders online and picks up the order from the comfort of their car. As a business executive, I’d like to understand whether Curbside Pickup results in more sales and engagement. Essentially, the business would like to understand the incremental value of this experience.

In a traditional A/B setting, you do that by providing ‘Curbside Pickup’ to a segment of your customers (Variation A) and holding the others back from using it (Variation B). There are two problems with this approach:

  1. You don’t want to stop people from using your newer initiative just for the purposes of experimentation (you will get an eye roll from your business/product teams).
  2. You still have the option of controlling which stores receive ‘Curbside Pickup’. But if the business intends to bring this initiative to all stores right away for customer convenience (with COVID in mind), waiting for experimentation results is not viable. You guessed it right, you will receive another eye roll.

These are the situations where you need Causal Inference. Without impacting the product roadmap or initiatives, Causal Inference still provides the ability to measure the effectiveness of an intervention. Unlike A/B testing, Causal Inference can be applied after the fact. It brings an incrementality approach to measuring an intervention: how much of the outcome happened because of it.
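As a rough taste of what "after the fact" can look like, here is a small simulated sketch using stratification, one simple adjustment technique (I am not claiming it is the method this series will settle on). All numbers are assumptions: customers choose Curbside Pickup themselves, frequent shoppers are more likely to choose it, and the true incremental spend is set to 10 so we can see which estimate recovers it.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# Hypothetical confounder: was the customer already a frequent shopper?
frequent = rng.random(n) < 0.4
# Self-selection: frequent shoppers adopt curbside pickup more often.
uses_curbside = rng.random(n) < np.where(frequent, 0.7, 0.2)

true_lift = 10.0  # assumed incremental spend caused by curbside pickup
spend = (
    50.0
    + 40.0 * frequent
    + true_lift * uses_curbside
    + rng.normal(scale=15.0, size=n)
)

# Naive estimate: compare users and non-users directly.
naive = spend[uses_curbside].mean() - spend[~uses_curbside].mean()

# Stratified estimate: compare within each shopper segment,
# then weight each segment's difference by its share of customers.
adjusted = 0.0
for level in (False, True):
    stratum = frequent == level
    diff = (
        spend[stratum & uses_curbside].mean()
        - spend[stratum & ~uses_curbside].mean()
    )
    adjusted += diff * stratum.mean()

print(f"naive lift:    {naive:.1f}")
print(f"adjusted lift: {adjusted:.1f}")
```

The naive comparison overstates the lift because curbside users were bigger spenders to begin with, while the stratified estimate lands close to the assumed true lift. This only works because the simulation has a single confounder that we observe perfectly, which is rarely the case with real customers.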

In Part 2, I’ll explore Association, Confounding Variables, and how to measure Causality in depth. Thanks for reading, and clap if you found this article meaningful.

References:

https://sketchplanations.com/correlation-is-not-causation

https://www.statsmedic.com/post/correlation-does-not-mean-causation

https://www.bradyneal.com/causal-inference-course

https://www.sloww.co/psychology-human-misjudgment-charlie-munger/
