Exploring Causal Inference

Dushyant Mahajan · Published in AI Skunks · 10 min read · Apr 28, 2023

Introduction

Causal inference lies at the heart of data-driven decision-making and is crucial for many applications in machine learning, such as recommendation systems, medical treatment planning, and policy evaluation. While traditional machine learning algorithms excel at identifying patterns and correlations within data, they often fall short when it comes to establishing causal relationships. Understanding causality, as opposed to mere correlation, enables us to make informed predictions about the consequences of interventions and helps avoid drawing erroneous conclusions that can have significant real-world implications.

Image from Markfabiain’s blog

As machine learning models are increasingly being used to inform high-stakes decisions, the importance of incorporating causality into these models cannot be overstated. By developing a better understanding of causal inference, we can ensure that the insights derived from our models are not only accurate but also meaningful and actionable. This, in turn, can lead to more reliable and robust decision-making processes in various domains, such as healthcare, finance, marketing, and public policy.

In this article, we will explore the critical concept of causal inference in machine learning, diving into the distinction between causality and correlation and discussing various causal inference techniques. By the end, you’ll be better equipped to build robust models that capture causal relationships, enabling more informed decision-making across a range of domains.

Understanding Causality and Causal Inference in Machine Learning

1. Causality and its role in machine learning and data-driven decision-making:

Causality refers to the relationship between two variables where a change in one variable directly results in a change in the other. In other words, causality implies that an intervention in one variable can cause a predictable effect in another variable. Establishing causal relationships is essential in machine learning and data-driven decision-making, as it allows us to make reliable predictions about the consequences of actions and interventions. While traditional machine learning techniques are primarily focused on identifying patterns and correlations, understanding causality enables us to build models that can provide meaningful and actionable insights in various domains.

2. The difference between causality and correlation:

Correlation is a statistical measure that describes the degree to which two variables change together. While correlation can indicate a potential causal relationship, it does not imply causation. In other words, just because two variables are correlated does not mean that one causes the other. It is crucial to recognize this distinction, as confusing correlation with causation can lead to erroneous conclusions and misguided actions. Common pitfalls and misconceptions include assuming causality based on correlation alone or overlooking the possibility of hidden confounders that influence both variables.

3. Causal inference techniques:

Causal inference is the process of determining the causal effect of a treatment or intervention on an outcome of interest. Various techniques have been developed to estimate causal relationships, each with its own set of assumptions and limitations. Some of the most widely used methods include:

  1. Randomized Controlled Trials (RCTs): RCTs are considered the gold standard in causal inference. In an RCT, subjects are randomly assigned to either a treatment or control group, ensuring that any differences in outcomes can be attributed to the treatment rather than confounding factors. However, RCTs can be expensive, time-consuming, and sometimes unethical or infeasible.
  2. Propensity Score Matching (PSM): PSM is a statistical technique that aims to estimate the causal effect by comparing treated and control subjects with similar propensity scores, which are estimated probabilities of receiving treatment based on observed characteristics. This method helps to control for confounding variables, although it relies on the assumption that all relevant confounders have been measured.
  3. Instrumental Variables (IV): IV is an approach used when treatment assignment is not random and there may be unmeasured confounders. An instrumental variable is a variable that is correlated with the treatment, affects the outcome only through the treatment, and is independent of any confounders. By using the variation in the treatment induced by the instrument, we can estimate the causal effect of the treatment on the outcome. However, finding a valid instrument can be challenging and requires strong assumptions.

Dataset Introduction:

The UCI Bank Marketing dataset offers valuable insights into a real-world bank’s direct marketing campaign, presenting an ideal opportunity for causal inference. Originally designed for a classification task to predict if a client will open a term deposit account, the dataset contains several variables that can be considered interventions or treatments for increasing the number of clients making term deposits.

The dataset features information on client credit characteristics, contact mode, date and duration of the last contact during the campaign, and variables related to previous marketing campaigns. Additionally, it includes economic indicators such as the employment rate and consumer price index.

In the notebook here, we analyzed the Bank Marketing Dataset to evaluate the impact of marketing decisions, focusing on understanding the causal relationships between various interventions and the likelihood of clients opening term deposit accounts.

Understanding Treatment Variables and Potential Confounders

In causal inference, treatment variables and potential confounders play a crucial role in estimating the causal effect of interventions on the outcome of interest. In this section, we will define these terms and discuss their importance in the context of causal analysis.

1. Treatment Variables:

Treatment variables, also known as interventions, are factors that decision-makers can manipulate to influence the outcome. They are the primary focus of causal analysis, as we aim to understand their causal effect on the outcome. For example, in a marketing campaign, treatment variables might include the mode of communication (e.g., email or phone), the timing of the campaign, or the promotional offers being presented.

2. Potential Confounders:

Potential confounders are variables that can affect both the treatment assignment and the outcome, creating a spurious association between the treatment and the outcome. In order to isolate the true causal effect of the treatment, it is essential to adjust for these confounding variables. For instance, in a marketing campaign, a potential confounder could be the customer’s income, as it may influence both the bank’s decision to target the customer and the customer’s likelihood of responding positively to the campaign.

Identifying Treatment Variables and Potential Confounders in Causal Inference

1. Importance of correctly identifying treatment variables and potential confounders:

Correctly identifying treatment variables (interventions) and potential confounders is crucial in causal inference, as it allows us to isolate the interventions’ causal effect on the outcome of interest. By adjusting for confounding variables, we can eliminate bias in the estimated causal effect, leading to more accurate and reliable results.

2. Identifying treatment variables and potential confounders in the bank dataset:

In the bank dataset, variables associated with the current marketing campaign serve as potential interventions, as the bank can directly control them. From there, we will investigate the effects of ‘contact’ (mode of communication) and ‘campaign’ (number of contacts). ‘contact’ will be encoded as a binary variable (0 for cellular, 1 for telephone).

To identify potential confounders, we follow two considerations: avoiding post-intervention variables and putting ourselves in the shoes of the hypothetical bank employee making the decision. For this dataset, potential confounders include:

Client characteristics (‘age’-’loan’): These affect the client’s decision to invest in a term deposit and may be consulted by the bank employee when contacting the client.

Previous campaigns (‘pdays’, ‘previous’, ‘poutcome’): These indicate the client’s previous receptiveness to the bank’s products and would be part of the client’s record.

Economic indicators (‘emp.var.rate’-‘nr.employed’): These conditions may influence the client’s decision as well as the bank’s practices.

For the ‘contact’ intervention, we also include ‘month’ to account for seasonality effects and ‘campaign’ to consider the number of contacts up until this point. By correctly identifying these variables, we can better estimate the causal effect of the interventions on the outcome, leading to more informed decision-making.
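As a minimal sketch of this encoding step (the column names come from the dataset, but the four rows below are invented for illustration):

```python
import pandas as pd

# Toy frame with the same column names as the bank dataset (rows invented).
df = pd.DataFrame({
    "contact":  ["cellular", "telephone", "cellular", "telephone"],
    "campaign": [1, 3, 2, 5],
    "age":      [34, 51, 42, 29],
    "poutcome": ["success", "nonexistent", "failure", "nonexistent"],
    "y":        ["yes", "no", "yes", "no"],
})

# Encode the 'contact' intervention as binary (0 = cellular, 1 = telephone).
df["contact"] = (df["contact"] == "telephone").astype(int)

# Encode the outcome (1 = client opened a term deposit).
df["y"] = (df["y"] == "yes").astype(int)

# Confounder groups discussed above, trimmed to the columns in this toy frame.
client_cols = ["age"]         # 'age' through 'loan' in the full data
prev_cols = ["poutcome"]      # plus 'pdays' and 'previous'
confounders = client_cols + prev_cols

print(df[["contact", "y"] + confounders])
```

In the full dataset the confounder lists would also include the economic indicators and, for the ‘contact’ intervention, ‘month’ and ‘campaign’.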

Causal Inference Techniques and Their Application

1. Causal Inference Techniques:

Various causal inference techniques can help control for potential confounders and isolate the causal impact of treatment variables, such as contact mode. Some popular techniques include:

Propensity Score Matching: This method involves estimating the propensity scores, which represent the probability of receiving a particular treatment given the observed characteristics (confounders). Then, individuals with similar propensity scores are matched, allowing for a comparison of the treatment outcome between the groups that received different treatments while controlling for confounders.

Inverse Propensity Weighting (IPW): IPW estimates causal effects by weighting observations based on their propensity scores. Each observation is assigned a weight that is inversely proportional to the probability of receiving the treatment they actually received, given their observed characteristics. This technique aims to create a pseudo-population where the treatment assignment is independent of the confounders, allowing for an unbiased causal effect estimation.

Instrumental Variables (IV): IV analysis is a technique that utilizes an external variable (the instrument) that is related to the treatment variable but not directly related to the outcome, except through the treatment. This method allows for consistent estimation of causal effects when unobserved confounding may be present.

2. Analyzing the Effect of Contact Mode on Treatment Outcome:

Applying these techniques to the analysis of contact mode’s effect on the treatment outcome, such as customer conversion rates, can help us understand the effectiveness of different communication methods in driving marketing campaign success.

Propensity Score Matching (PSM): For example, when comparing the effectiveness of phone calls and emails, PSM would involve matching individuals who received phone calls with those who received emails based on their propensity scores. This method allows us to estimate the causal effect of contact mode on the outcome (e.g., conversion rate) by controlling for confounders.
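A minimal sketch of this matching procedure on synthetic data (the single continuous confounder, the effect size, and the logistic propensity model are all invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 4000

# Synthetic data: confounder x (say, a client characteristic) drives both
# treatment assignment and the outcome; the true treatment effect is 0.2.
x = rng.normal(0, 1, n)
t = rng.random(n) < 1 / (1 + np.exp(-x))      # treatment more likely for high x
y = 0.2 * t + 0.5 * x + rng.normal(0, 0.1, n)

# Step 1: estimate propensity scores e(x) = P(t=1 | x) via logistic regression.
X = x.reshape(-1, 1)
e = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the closest score.
treated, controls = np.where(t)[0], np.where(~t)[0]
nearest = controls[np.abs(e[treated][:, None] - e[controls][None, :]).argmin(axis=1)]

# Step 3: the mean outcome difference across matched pairs estimates the
# average treatment effect on the treated (ATT).
att = (y[treated] - y[nearest]).mean()
print(f"estimated ATT: {att:.3f}  (true effect 0.2)")
```

Matching here is one-to-one with replacement; in practice, calipers and balance diagnostics are used to check the quality of the matches.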

Inverse Propensity Weighting (IPW): Using IPW, we can create a weighted pseudo-population where the assignment of contact mode is independent of confounders. This allows us to estimate the causal effect of contact mode on the treatment outcome by comparing weighted averages of the outcomes for different contact modes (e.g., phone calls vs. text messages).

Instrumental Variables (IV): In the case of IV analysis, we could use an external variable, such as the geographic region, as an instrument if it influences the choice of contact mode but is not directly related to the treatment outcome. Through this approach, we can estimate the causal effect of contact mode on the outcome while accounting for potential unobserved confounding.
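The logic of two-stage least squares can be sketched on synthetic data (the instrument, the unobserved confounder, and all effect sizes are invented; the treatment is continuous here for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# Synthetic data: an unobserved confounder u and a binary instrument z
# (e.g. a regional factor that shifts the treatment but not the outcome).
u = rng.normal(0, 1, n)                       # unobserved confounder
z = rng.integers(0, 2, n)                     # instrument
t = 0.8 * z + 0.5 * u + rng.normal(0, 1, n)   # treatment depends on z and u
y = 0.3 * t + 0.7 * u + rng.normal(0, 1, n)   # true causal effect of t is 0.3

def ols_slope(a, b):
    """Slope of b regressed on a (with intercept)."""
    return np.polyfit(a, b, 1)[0]

naive = ols_slope(t, y)                       # biased: u confounds t and y

# Two-stage least squares: regress t on z, then y on the fitted values.
t_hat = np.polyval(np.polyfit(z, t, 1), z)
iv = ols_slope(t_hat, y)                      # equivalently cov(z, y) / cov(z, t)

print(f"naive OLS: {naive:.2f}, IV (2SLS): {iv:.2f}, truth: 0.30")
```

The naive regression is biased upward by the unobserved confounder, while the two-stage estimate recovers the true coefficient, provided the instrument is relevant and affects the outcome only through the treatment.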

By employing these causal inference techniques, we can better understand the causal relationship between different contact modes and their impact on marketing campaign outcomes, informing data-driven decision-making for marketing strategies.

Inverse Propensity Weighting

Inverse Propensity Weighting (IPW) is a popular method used in causal inference to estimate the causal effect of a treatment or intervention on an outcome of interest while accounting for potential confounding variables. The key idea behind IPW is to reweight the observed data to create a pseudo-population in which the treatment assignment is independent of the confounders.

The IPW method involves the following steps:

a. Estimation of Propensity Scores: The propensity score is the conditional probability of receiving a particular treatment given the observed confounders. These scores can be estimated using logistic regression or other suitable models.

b. Calculation of Inverse Propensity Weights: After estimating propensity scores, inverse weights are calculated for each individual in the dataset. Given the observed confounders, these weights are the inverse of the probability of receiving the treatment that was actually received. For treated individuals, the weight is the inverse of the propensity score, whereas, for control individuals, the weight is the inverse of one minus the propensity score.

c. Estimation of Causal Effects: The causal effect of the treatment on the outcome is estimated by comparing the weighted average outcomes of the treated and control groups in the pseudo-population. The difference in these weighted averages corresponds to the estimated causal effect, also known as the average treatment effect (ATE).
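Steps (a)–(c) can be sketched on synthetic data (a single binary confounder with made-up effect sizes, so the propensity score reduces to an empirical frequency per stratum):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

# Synthetic data: binary confounder z affects both treatment and outcome.
z = rng.integers(0, 2, n)
t = rng.random(n) < np.where(z == 1, 0.8, 0.2)     # treatment depends on z
y = 0.1 * t + 0.4 * z + rng.normal(0, 0.1, n)      # true ATE = 0.1

# (a) Propensity scores: here simply the empirical P(t=1 | z) per stratum.
e = np.array([t[z == v].mean() for v in (0, 1)])[z]

# (b) Inverse propensity weights: 1/e for treated, 1/(1-e) for controls.
w = np.where(t, 1 / e, 1 / (1 - e))

# (c) ATE = difference of weighted mean outcomes in the pseudo-population.
ate = np.average(y[t], weights=w[t]) - np.average(y[~t], weights=w[~t])
print(f"naive difference: {y[t].mean() - y[~t].mean():.3f}")
print(f"IPW estimate:     {ate:.3f}  (true ATE 0.1)")
```

Because the confounder is discrete, the stratum frequencies play the role of a fitted propensity model; with continuous confounders you would fit, say, a logistic regression instead.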

IPW has several advantages, such as its ability to handle continuous treatments, missing data, and censoring. However, it is sensitive to the accuracy of the propensity score model and may yield biased estimates if the model is misspecified. In practice, IPW is often combined with other methods, such as matching or regression adjustment, to improve the robustness and efficiency of causal effect estimation.

The IPW model implementation is performed in the notebook here using the Causallib library in Python.

Standardization

Standardization, also known as the G-formula or the adjustment formula, is a method used to estimate the causal effect of a treatment or intervention on an outcome of interest while accounting for potential confounding variables. It involves directly adjusting the observed data by standardizing the outcome variable within strata or levels of the confounding variables.

Here’s an outline of the standardization process:

  1. Stratification: Divide the dataset into strata based on the levels or categories of the confounding variables.
  2. Estimation of Conditional Outcomes: Within each stratum, estimate the average outcome for each treatment group (e.g., treated and control).
  3. Weighted Averaging: Compute the weighted average of the conditional outcomes across strata, using the distribution of the confounding variables in the target population.
  4. Causal Effect Estimation: Calculate the difference in the weighted average outcomes between the treatment groups to estimate the causal effect, also known as the average treatment effect (ATE).
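These steps can be sketched on synthetic data with a single binary confounder (effect sizes made up for illustration, so the strata are just the confounder’s two levels):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000

# Synthetic data: binary confounder z, binary treatment t, true ATE = 0.1.
z = rng.integers(0, 2, n)
t = rng.random(n) < np.where(z == 1, 0.8, 0.2)
y = 0.1 * t + 0.4 * z + rng.normal(0, 0.1, n)

# Steps 1-3: stratify on z, estimate the treatment-control difference in
# each stratum, then average over the marginal distribution of z.
ate = 0.0
for v in (0, 1):
    stratum = z == v
    effect_in_stratum = y[stratum & t].mean() - y[stratum & ~t].mean()
    ate += stratum.mean() * effect_in_stratum      # weight by P(z = v)

# Step 4: the weighted difference is the standardized ATE.
print(f"standardized ATE: {ate:.3f}  (true ATE 0.1)")
```

Note the contrast with IPW: standardization models the outcome within strata, whereas IPW models the treatment assignment; in this simple discrete setting both are consistent for the same ATE.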

Comparison with Non-Causal Analysis

In a non-causal analysis, we might have simply used correlations or regression techniques to understand the relationships between variables, without considering the underlying causal mechanisms. This could lead to misleading conclusions, as we would not account for confounding factors or the effects of interventions.

https://twitter.com/rlmcelreath/status/1436344384069509128

By contrast, causal analysis allows us to estimate the true causal effects of interventions, providing a more accurate and reliable understanding of the relationships between variables. In our example, we were able to estimate the causal effect of marketing decisions on client behavior, which could inform more effective marketing strategies for the bank.

Conclusion

In conclusion, understanding causality is essential in machine learning and data-driven decision-making, as it allows us to draw more accurate and reliable conclusions from our analyses. By using causal inference techniques like IPW and standardization, we can better understand the true relationships between variables and make more informed decisions based on the insights gained from our analyses.

For those interested in learning more about causality and causal inference, I recommend the following resources:

  1. Judea Pearl’s book “The Book of Why: The New Science of Cause and Effect” provides an accessible introduction to the theory of causality and its applications in various domains.
  2. Miguel Hernán and James Robins’ book “Causal Inference: What If” is a comprehensive textbook that covers the concepts and methods of causal inference in depth, including potential outcomes, directed acyclic graphs, and various estimation techniques.

References

  1. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9991894/
  2. https://matheusfacure.github.io/python-causality-handbook/01-Introduction-To-Causality.html
  3. Wenwen Ding, “Causal Inference: Connecting Data and Reality”, The Gradient, 2022.
  4. https://www.yuan-meng.com/posts/causality/
  5. https://github.com/BiomedSciAI/causallib
  6. https://www.pywhy.org/dowhy/v0.9.1/index.html
