Navigating the Intricacies of Causal Inference: My Journey with SCM and Beyond

ChunYu Ko
The whispers of a data analyst
Dec 25, 2023

How Structural Causal Models are Shaping Modern Data Analysis

In recent years, my passion for contemporary methods in Causal Inference has led me to a deep dive into the realms of Structural Causal Models (SCM) and Causal Discovery. While SCM offers a robust way to model and understand causal relationships between variables, it’s the Potential Outcome Framework that often takes center stage in practical applications due to its intuitive nature and ease of implementation. This framework’s widespread adoption across various fields is a testament to its efficacy.

In my survey research, SCM has been an indispensable tool. It allows me to explore relationships between different questions without falling into the trap of unnecessary cross-analyses. Although I have long been familiar with the mainstream approaches to Causal Discovery, including constraint-based, score-based, and hybrid algorithms, it was only recently that I ventured into implementing some of their foundational algorithms.

One of the key milestones in my journey was mastering the PC algorithm, named after its inventors, Peter Spirtes and Clark Glymour. The PC algorithm aims to recover a Directed Acyclic Graph (DAG), or more precisely the equivalence class of DAGs consistent with the data, starting from a fully connected graph: a representation in which all variables are initially assumed to be interrelated.

Phase One: Disentangling the Web of Relationships

  1. Initialization: We start with a given dataset, typically represented as a numpy matrix. The number of variables in the dataset defines the number of nodes in our graph. We then create a fully connected graph: an adjacency matrix in which each edge represents a potential relationship between a pair of variables, excluding self-loops.
  2. Iterative Testing: The process begins by enumerating all pairs of variables, which are adjacent nodes in the graph. We first examine each pair without controlling for any other variables (setting subset_nodes to 0). If a pair of variables, say x and y, are found to be statistically independent, we remove the edge between x and y from the graph.
  3. Conditional Independence Testing: Next, we incrementally increase the number of controlled variables, first controlling for a single variable other than x and y, then two, and so on. If x and y turn out to be independent under any of these conditioning sets, the edge between them is likewise removed. This process continues until no larger conditioning sets can be formed and every remaining pair has been tested for independence.
  4. Results: At the end of this phase, we arrive at an undirected graph (the skeleton) that only contains edges between statistically dependent variable pairs. The variable pairs determined to be independent, along with the conditioning sets that separated them, are stored in a separation set.

The process looks like the following implementation.

import numpy as np
from itertools import combinations

def pc_1(data):
    # Phase one: prune a fully connected graph down to the skeleton.
    nodes = data.shape[1]
    graph = np.ones((nodes, nodes)) - np.eye(nodes)  # fully connected, no self-loops
    pairs = [*combinations(range(nodes), 2)]

    subset_nodes = 0  # size of the conditioning set
    done = False
    separated_set = {}

    while not done and graph.any():
        done = True

        for x, y in pairs:
            if graph[x, y] == 1:
                # Candidate conditioning variables: neighbors of x other than y.
                neighbors = [i for i in range(nodes) if graph[x, i] == 1 and i != y]

                if len(neighbors) >= subset_nodes:
                    done = False

                    for z in combinations(neighbors, subset_nodes):
                        z = list(z)

                        # is_independent is a user-supplied conditional
                        # independence test; one possible sketch follows below.
                        if is_independent(data, x, y, z):
                            graph[x, y] = graph[y, x] = 0
                            separated_set[(x, y)] = z
                            break

        subset_nodes += 1

    return graph, separated_set
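
The pc_1 function above assumes a helper called is_independent, which the post does not define. A minimal sketch is given below, assuming roughly Gaussian data and using a Fisher z-test of partial correlation; the significance level alpha = 0.05 and the use of the precision matrix are my choices, not part of the original code.

from math import erf, log, sqrt

import numpy as np

def is_independent(data, x, y, z, alpha=0.05):
    # Test whether columns x and y are independent given the columns in z,
    # via the partial correlation obtained from the inverse correlation
    # (precision) matrix. A sketch only: production implementations handle
    # degenerate cases more carefully.
    n = data.shape[0]
    corr = np.corrcoef(data[:, [x, y] + z], rowvar=False)
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / sqrt(prec[0, 0] * prec[1, 1])
    r = min(0.999999, max(-0.999999, r))  # guard the log below

    # Fisher z-transform: approximately standard normal under independence.
    z_stat = sqrt(max(n - len(z) - 3, 1)) * 0.5 * log((1 + r) / (1 - r))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z_stat) / sqrt(2))))
    return p_value > alpha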

Phase Two: Crafting a CPDAG from the Unraveled Graph

After establishing the initial structure of the undirected graph in Phase One, Phase Two involves transforming this graph into a CPDAG (Completed Partially Directed Acyclic Graph) in alignment with DAG (Directed Acyclic Graph) principles. This phase is crucial for identifying the directionality of relationships between variables.

Figure: a DAG illustrating collider bias (source: https://www.researchgate.net/figure/Directed-Acyclic-Graph-DAG-illustrating-collider-bias-a-Variables-A-and-B-are_fig3_344614418)

  1. Identifying Colliders: This step examines the separation set from Phase One to check for the presence of colliders. A collider occurs when two variables, say A and B, both influence a third variable C, while A and B have no direct relationship. Concretely, for every pair (A, B) whose edge was removed, any common neighbor C that is not in their separation set is oriented as A → C ← B (a minimal illustration follows this list). Recognizing colliders is important because controlling for C can misleadingly induce a correlation between A and B.
  2. Applying Directional Rules: Once colliders are identified, the direction of the remaining relationships is determined by a set of orientation rules applied to the undirected edges of the graph. Each rule is designed either to prevent the formation of a cycle or to avoid creating a collider that the data do not support.
  3. Finalizing the CPDAG: These rules are applied repeatedly until no more edges can be oriented. The outcome is a CPDAG that approximates a DAG as closely as the data allow, though some edges may remain undirected because the available information cannot determine their direction.
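
For instance, take three variables with skeleton x - z - y, where x and y were separated by the empty set. Since z is not in the separation set of (x, y), z must be a collider. A small, self-contained illustration (the encoding follows the code below, where entries [a, b] and [b, a] both equal to 1 represent an undirected edge a - b):

import numpy as np

# Skeleton x - z - y; x (node 0) and y (node 1) are not adjacent.
skeleton = np.array([[0, 0, 1],
                     [0, 0, 1],
                     [1, 1, 0]], dtype=float)

# x and y were found independent given the empty set,
# so z (node 2) is NOT in their separation set.
separated_set = {(0, 1): []}

# Orient x -> z <- y by deleting the marks pointing back into x and y.
skeleton[2, 0] = 0
skeleton[2, 1] = 0
print(skeleton)  # rows 0 and 1 still point at z: a collider at z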

The orientation rules, commonly known as Meek's rules, are as follows:

  • Rule 1: If there is an arrow i → j, the edge j-k is undirected, and i is not adjacent to k, then j-k is oriented as j → k.
  • Rule 2: If there is a directed chain i → k → j and the edge i-j is undirected, then i-j is oriented as i → j.
  • Rule 3: If there are two chains i-k → j and i-l → j, and k and l are not adjacent, then the edge i-j is oriented as i → j.
  • Rule 4: If there are two chains i-k → l and k → l → j, and k and j are not adjacent, then the edge i-j is oriented as i → j.

In code, the collider orientation step and these four rules look like this:

from copy import deepcopy

def pc_2(graph, separated_set):
    # Phase two: orient the skeleton into a CPDAG.
    graph_original = deepcopy(abs(graph))
    nodes = graph.shape[0]
    columns = list(range(nodes))

    # Step 1: orient colliders (v-structures). For every separated pair (i, j),
    # any common neighbor k outside their separation set becomes i -> k <- j.
    for ij in separated_set.keys():
        i, j = ij

        if graph_original[i, j] == 0 and graph_original[j, i] == 0:
            for k in [x for x in columns if x not in [i, j]]:
                if k not in separated_set[ij]:
                    if graph_original[i, k] == 1 and graph_original[k, i] == 1 \
                            and graph_original[j, k] == 1 and graph_original[k, j] == 1:
                        graph_original[k, i] = 0
                        graph_original[k, j] = 0

    graph_final = deepcopy(graph_original)

    # Step 2: apply the orientation rules until nothing changes.
    while True:
        change_made = False
        pairs = list(combinations(columns, 2))

        for i, j in pairs:
            if graph_final[i, j] == 1 and graph_final[j, i] == 1:  # i - j is undirected

                for k in [x for x in columns if x not in [i, j]]:
                    # Rule 1: k -> i with k and j not adjacent  =>  i -> j.
                    if graph_final[k, i] == 1 and graph_final[i, k] == 0 \
                            and graph_final[k, j] == 0 and graph_final[j, k] == 0:
                        graph_final[j, i] = 0
                        change_made = True

                    # Rule 2: directed chain i -> k -> j  =>  i -> j.
                    if graph_final[i, k] == 1 and graph_final[k, i] == 0 \
                            and graph_final[k, j] == 1 and graph_final[j, k] == 0:
                        graph_final[j, i] = 0
                        change_made = True

                for k in [x for x in columns if x not in [i, j]]:
                    for l in [x for x in columns if x not in [i, j, k]]:
                        # Rule 3: i - k -> j and i - l -> j with k, l not adjacent  =>  i -> j.
                        if graph_final[i, k] == 1 and graph_final[k, i] == 1 \
                                and graph_final[k, j] == 1 and graph_final[j, k] == 0 \
                                and graph_final[i, l] == 1 and graph_final[l, i] == 1 \
                                and graph_final[l, j] == 1 and graph_final[j, l] == 0 \
                                and graph_final[k, l] == 0 and graph_final[l, k] == 0:
                            graph_final[j, i] = 0
                            change_made = True

                        # Rule 4: i - k and k -> l -> j with k, j not adjacent  =>  i -> j.
                        if graph_final[i, k] == 1 and graph_final[k, i] == 1 \
                                and graph_final[k, l] == 1 and graph_final[l, k] == 0 \
                                and graph_final[l, j] == 1 and graph_final[j, l] == 0 \
                                and graph_final[k, j] == 0 and graph_final[j, k] == 0:
                            graph_final[j, i] = 0
                            change_made = True

        if not change_made:
            break

    return graph_final
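
To see both phases end to end, here is a hypothetical run on synthetic data with a known collider structure; the variable names, coefficients, sample size, and seed are mine, purely for illustration:

import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical ground truth: x0 -> x2 <- x1 (a collider at x2).
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = 0.8 * x0 + 0.8 * x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])

skeleton, separated_set = pc_1(data)
cpdag = pc_2(skeleton, separated_set)

print(cpdag)
# Expected, up to sampling noise in the independence tests:
# [[0. 0. 1.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
# i.e. only the directed edges x0 -> x2 and x1 -> x2 remain.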

Conclusion: Embracing the Classic PC Algorithm in Data Science

The PC algorithm stands as a classic and influential method in the realm of causal discovery. In my view, its approach mirrors the quintessential steps a data scientist takes: conducting regression analyses and statistical tests in a methodical manner. The PC algorithm even embodies ideas similar to backward and forward stepwise regression, emphasizing iterative refinement and variable selection.

What truly sets the PC algorithm apart, however, is its use of DAG (Directed Acyclic Graph) principles. This incorporation allows for a more nuanced and accurate discovery of the underlying structure that best represents the data itself. By applying DAG constraints, the PC algorithm navigates complex datasets, systematically identifying and clarifying the causal relationships within.

This ability to discern and model the intrinsic structure of data not only enhances the accuracy of our analyses but also enriches our understanding of the causative dynamics at play. The PC algorithm, thus, is more than just a tool; it’s a testament to the evolving sophistication and depth of data science as a field.

In conclusion, the PC algorithm demonstrates how traditional statistical methods can evolve and adapt, integrating advanced concepts like DAGs to uncover deeper insights. As data scientists, embracing such methodologies not only sharpens our analytical skills but also broadens our perspective in understanding and interpreting the intricate tapestry of data-driven narratives.
