Part 2: Simplifying causal inference to connect stakeholders and data scientists

Basic tools to measure causality and assess impact

Robson Tigre
7 min read · Sep 4, 2024
Source of figures here

This is the second in a series of posts aimed at bridging the communication gap between business professionals and data scientists on causal inference.

  • In the first post, we discussed fundamental concepts of causal inference, such as causal effects, counterfactuals, and some types of bias. Before proceeding, I recommend reading part 1 if you haven’t yet.
  • Now, let’s explore some of the key tools that allow us to measure cause-and-effect relationships and assess impact in different contexts.
  • Again, the sections follow a structure of “technical explanation,” “note,” and “intuitive explanation” — easy to understand and illustrated with real-world examples to help non-specialists grasp the concepts.

1. Regression analysis (used in all other techniques discussed in this post)

Technical explanation: Regression is a technique used to estimate the relationship between one or more independent variables (such as an advertisement or promotion) and a dependent variable (like sales or website traffic). The goal is to quantify how each factor influences the outcome, while holding all other variables "constant".

A common example is linear regression, which fits a straight line through the data points so that the sum of the squared vertical distances between the line and the points is minimized (example below).

Source of figure here

Note: Regression alone does not guarantee causality; it only measures associations. To ensure the observed relationship is causal, we need either an experimental design or methods that help isolate the influence of unobserved variables (quasi-experiments). The interpretation of the regression coefficients will be misleading if there is omitted variable bias (see this concept in part 1).

Intuitive explanation: Imagine your team launches several marketing campaigns by email, and you want to understand if they actually increased sales. By using regression analysis, you can measure the relationship between sales and the email open rates while controlling for variables like seasonality or other contemporaneous promotions. However, if you don’t consider other factors (such as customer loyalty or previous engagement), you risk getting biased results about the impact of these campaigns.
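To make this concrete, here is a minimal sketch of the email-campaign example in Python. All variable names and numbers are invented for illustration: we simulate a "seasonality" confounder that drives both open rates and sales, then fit ordinary least squares with and without it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data (all names and effect sizes are illustrative assumptions)
seasonality = rng.normal(0, 1, n)                 # e.g. a holiday-demand index
open_rate = 0.3 * seasonality + rng.normal(0, 1, n)
sales = 2.0 * open_rate + 1.5 * seasonality + rng.normal(0, 1, n)

# OLS controlling for the confounder: design matrix [1, open_rate, seasonality]
X = np.column_stack([np.ones(n), open_rate, seasonality])
beta = np.linalg.lstsq(X, sales, rcond=None)[0]
print(beta[1])        # close to the true effect, 2.0

# Omitting seasonality inflates the coefficient (omitted variable bias)
X_short = np.column_stack([np.ones(n), open_rate])
beta_short = np.linalg.lstsq(X_short, sales, rcond=None)[0]
print(beta_short[1])  # noticeably above 2.0
```

The short regression overstates the campaign's effect because seasonality raises both open rates and sales, which is exactly the bias the note above warns about.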

2. Randomized controlled trials (RCTs)

Technical explanation: Randomized controlled trials, also known as RCTs, randomized experiments, or A/B tests, are the most robust way to determine causality. In RCTs, participants are randomly divided into two or more groups. One group receives the treatment (for example, a new app feature), while the other group does not.

Randomization ensures that, on average, the characteristics of the groups are similar, so any differences observed in outcomes can be attributed to the treatment. This eliminates the influence of external factors that could bias the results.

Note: RCTs can be expensive and difficult to implement, especially in contexts where changes made to one group could cause dissatisfaction or lead a company to forgo potential revenue by maintaining a control group. Moreover, we must keep in mind that experiments can involve imperfect compliance (see part 1), making impact estimation less straightforward.

Intuitive explanation: Suppose a digital platform wants to test a new “one-click purchase” feature. To verify if this feature increases sales, we can randomly select half of the users to have access to this feature, while the other half continues using the traditional checkout. By doing this, you ensure that any difference in purchase behavior between the two groups is attributed to the “one-click purchase” feature, since, apart from that, the groups are essentially the same.
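A sketch of the "one-click purchase" test, with all numbers invented for illustration: random assignment lets us estimate the effect as a simple difference in group means, plus a standard error to gauge precision.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Random assignment makes treated and control comparable on average
treated = rng.integers(0, 2, n).astype(bool)

# Hypothetical purchase amounts: the feature adds 5.0 on average
purchases = rng.normal(100, 20, n) + 5.0 * treated

# Difference in means between treated and control
effect = purchases[treated].mean() - purchases[~treated].mean()

# Standard error of the difference in means
se = np.sqrt(purchases[treated].var(ddof=1) / treated.sum()
             + purchases[~treated].var(ddof=1) / (~treated).sum())
print(f"estimated lift: {effect:.2f} (t = {effect / se:.1f})")
```

Because assignment is random, no regression controls are needed for unbiasedness here, though adding pre-treatment covariates can improve precision.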

3. Regression discontinuity design (the sharp case)

Technical explanation: Regression discontinuity design (RDD) is used when there is a cutoff point that determines who receives an intervention or treatment. The idea is that individuals around this cutoff are very similar in all aspects except for the treatment. By comparing the outcomes of individuals just above and just below the cutoff, we can estimate the local average treatment effect (LATE) at the cutoff.

Source of figure here

Note: The classic sharp RDD only works well when there is a clear and strictly respected cutoff point that designates the treatment, and when the pre-treatment behavior of individuals around the cutoff is similar. The results are limited to the surroundings of this cutoff point, so generalization may be difficult. There is also fuzzy RDD, which works as an application of instrumental variables.

Intuitive explanation: Imagine your digital platform implements, without prior notice, a loyalty program where customers who spent R$ 500 or more in the previous month get free shipping this month. We can compare customers who spent R$ 501 with those who spent R$ 499. On average, they are nearly identical, except that one group received the free shipping benefit. Comparing the behavior of these two groups in the future may reveal the impact of free shipping on future purchases.
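A sharp-RDD sketch of the free-shipping example, with invented numbers: we fit a straight line on each side of the R$ 500 cutoff within a bandwidth and take the jump between the two intercepts at the cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

# Last month's spending (the running variable); cutoff at R$ 500
spend = rng.uniform(0, 1000, n)
free_shipping = spend >= 500

# Next month's purchases: smooth in spending, plus a jump of 30 at the cutoff
next_purch = 50 + 0.1 * spend + 30 * free_shipping + rng.normal(0, 10, n)

# Keep only observations within a bandwidth h of the cutoff
h = 100
left = (spend >= 500 - h) & (spend < 500)
right = (spend >= 500) & (spend < 500 + h)

def fit_at_cutoff(x, y):
    # Intercept of a straight-line fit, evaluated at spend = 500
    X = np.column_stack([np.ones(x.size), x - 500])
    return np.linalg.lstsq(X, y, rcond=None)[0][0]

rdd_effect = (fit_at_cutoff(spend[right], next_purch[right])
              - fit_at_cutoff(spend[left], next_purch[left]))
print(rdd_effect)  # close to the true jump of 30
```

In practice the bandwidth and the shape of the local fit are important modeling choices; dedicated packages choose them in a data-driven way.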

4. Difference-in-Differences (DiD)

Technical explanation: The difference-in-differences method compares changes over time between a group that receives treatment and another that does not. The core assumption is that, in the absence of the intervention, both groups would follow a parallel trajectory over time. The treatment effect is then measured as the difference in changes in outcomes between the two groups.

Note: The validity of the classic DiD is tied to the assumption that both groups followed and would have followed parallel trajectories in the absence of the treatment. If (i) this isn’t true, even with the inclusion of covariates in the model, or if (ii) any contemporaneous factor influences one of the groups asymmetrically, the results may be biased.

Additionally, attention must be paid to spillover effects (when the treatment indirectly affects the control group) and anticipatory effects, where one group may change its behavior in anticipation of the intervention.

Intuitive explanation: Imagine your company launches a new tool to increase employee productivity. The software is implemented in one specific team, while another similar team continues working without this technology. To measure the impact of the new tool, you compare the productivity of the two teams over time, especially before and after the software’s implementation.

However, for the analysis to be valid, it is essential that, without the tool, both teams would have continued to follow a parallel productivity trend (obviously, this hypothesis is not directly testable). Additionally, if one team received extra training during the same period or if there were changes in working conditions, these factors could bias the results.
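A sketch of the productivity example under the parallel-trends assumption, with invented numbers: two teams start at different levels, share a common time trend, and only the treated team gets the tool in the second period.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000  # observations per group per period

# Different baselines, but a common (parallel) trend across periods
base = {"treated": 60.0, "control": 50.0}
trend = 4.0    # common before-to-after change
effect = 8.0   # true impact of the tool on the treated team

def outcomes(group, period):
    y = base[group] + trend * period + rng.normal(0, 5, n)
    if group == "treated" and period == 1:
        y += effect  # treatment only hits the treated group after rollout
    return y

y_t0, y_t1 = outcomes("treated", 0), outcomes("treated", 1)
y_c0, y_c1 = outcomes("control", 0), outcomes("control", 1)

# DiD: (treated after - treated before) - (control after - control before)
did = (y_t1.mean() - y_t0.mean()) - (y_c1.mean() - y_c0.mean())
print(did)  # close to the true effect of 8.0
```

Subtracting the control group's change removes both the baseline gap and the shared trend, isolating the treatment effect; this is exactly where the parallel-trends assumption does its work.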

5. Instrumental variables (applied to experiments with imperfect compliance)

Technical explanation: In experiments with imperfect compliance (see part 1), some participants assigned to the treatment do not receive it (or receive it but do not use it), reducing the effectiveness of randomization.

The instrumental variable technique solves this problem by using a third variable, called an “instrument,” which influences the probability of “using” the treatment but does not directly affect the final outcome. The two-stage least squares (2SLS) method uses this instrument to calculate the real effect of the treatment, correcting distortions caused by imperfect compliance.

Note: The biggest challenge here is finding a “good instrument,” which influences the likelihood of individuals consuming the treatment but does not directly affect the final outcome, except through the treatment. Good instruments are rare, and identifying valid instruments can be very difficult outside of an experimental context — which is why here we focus on IV applied to experiments. In experiments, the randomization of the treatment serves as an instrument.

Intuitive explanation: Imagine a marketplace randomly distributes discount coupons. Many customers who receive the coupon do not use it (imperfect compliance). However, the randomness of being in the treatment group and thus receiving the coupon can be used as an instrumental variable to estimate the real impact of coupon use on sales. This corrects the problem of imperfect compliance and allows you to evaluate the true effect of the discount on sales for those who used the coupon.
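A sketch of the coupon example with invented numbers, using the simplest IV estimator (the Wald ratio, which 2SLS reduces to with a single binary instrument): we divide the effect of random assignment on sales by the effect of assignment on coupon use.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

# Random coupon assignment is the instrument
assigned = rng.integers(0, 2, n).astype(bool)

# Imperfect compliance: unobserved "deal-seekers" are both more likely
# to use the coupon and to buy more anyway
deal_seeker = rng.random(n) < 0.5
used = assigned & (rng.random(n) < np.where(deal_seeker, 0.8, 0.3))

# Sales: coupon use adds 10; deal-seekers spend 15 more regardless
sales = 100 + 10 * used + 15 * deal_seeker + rng.normal(0, 20, n)

# Naive comparison of users vs. non-users is contaminated by deal_seeker
naive = sales[used].mean() - sales[~used].mean()

# Wald / IV estimate: assignment's effect on sales, scaled by
# assignment's effect on coupon usage (the take-up rate)
itt = sales[assigned].mean() - sales[~assigned].mean()
take_up = used[assigned].mean() - used[~assigned].mean()
late = itt / take_up
print(f"naive: {naive:.1f}, IV (LATE): {late:.1f}")  # IV close to 10
```

The naive comparison overstates the effect because coupon users are disproportionately deal-seekers; scaling the intention-to-treat effect by take-up recovers the effect for those who actually used the coupon because of the assignment.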

Now that you are familiar with these tools, we can dive deeper into strategies to measure impact and statistical inference (p-values, confidence intervals, test power, etc.). In the next post, we will expand this toolkit.

Thank you for reading. Follow me for the continuation of this series :)

If you want to see more, follow me here and on LinkedIn, where I share posts on causal inference and career development.

Btw, if you notice any mistakes or have suggestions, I’d love to hear from you. Your feedback and ideas for new topics are always welcome :)

Written by Robson Tigre

Research Economist and Data Scientist at Mercado Livre. Causal inference expert. Cross-areas: marketing, pricing, retention, and accounts