What’s new in CausalNex v0.10?

Paul Beaumont, Data Scientist, Hiep Nguyen, Data Scientist, Philip Pilgerstorfer, Data Scientist, Zain Patel, Software Engineer, QuantumBlack

CausalNex is an open-source Python library that helps data scientists and domain experts co-develop models that go beyond correlation and consider causal relationships. It provides a practical 'what if' toolkit for testing scenarios using Bayesian Networks (BNs), which are interpretable, graphical models.

Since our first release in January 2020, CausalNex has been well received by the community and we are very grateful to everyone who has helped us reach the 1,000 “GitHub stars” milestone. Community input has been crucial in the library’s ongoing development and we are delighted to now release CausalNex v0.10.0 — an update to provide data scientists with an improved experience when building and querying Bayesian Networks.

What’s new?

This release focuses on features that boost model performance and speed up inference. It includes:

  • Advanced discretisation strategies: functionality to help find optimal thresholds when discretising continuous variables
  • Faster inference: calculation of the Markov blanket of a graph to simplify inference without losing any pertinent information, plus an extension of the InferenceEngine's .query() functionality to support multiprocessing and speed up inference time
  • A new tool to simplify fitting probabilities from a Bayesian Network: a scikit-learn compatible class that supports fitting conditional probability distributions (CPDs), discretising features, and making predictions

Given that it's been a while since our last communication update, readers may be unaware that previous releases removed the controversial Boston Housing dataset from all of our tutorials, replacing it with the Diabetes dataset.

This article will dive deeper into each of the new functionalities and provide code snippets that demonstrate their usage.

Advanced discretisation strategies

One of the original design decisions in CausalNex was to use discrete probability distributions instead of supporting continuous variables. At QuantumBlack, we found that modelling continuous features relied too heavily on unrealistic normality assumptions about real-world data, and that discretisation, whilst inherently 'losing' some information, led to better models overall.

For that reason, continuous features, or categorical features with a high number of classes, need to be discretised before a Bayesian Network can be fitted in CausalNex. This does mean that care is required when choosing a discretisation method, as the choice directly impacts the output of the final model.

One new feature of this release applies supervised learning algorithms to find the optimal splitting points for a continuous variable. In our experiments, this supervised approach tends to outperform existing unsupervised methods such as uniform, fixed and quantile. The release supports two new discretisation methods: decision tree and the MDLP algorithm. Essentially, the discretisation thresholds are the split points chosen when optimising a splitting criterion between a particular feature and the target (e.g. Gini impurity in a decision tree).
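
For reference, the existing unsupervised strategies split a feature without looking at the target. The snippet below is an illustrative sketch on synthetic data (the feature values and the num_buckets choice are arbitrary), not part of the walkthrough that follows:

import numpy as np
from causalnex.discretiser import Discretiser
# Illustrative only: quantile discretisation splits a synthetic feature
# into four equally-populated buckets, ignoring any target variable.
feature = np.random.default_rng(2021).normal(size=200)
quantile_discretiser = Discretiser(method="quantile", num_buckets=4)
quantile_discretiser.fit(feature)
buckets = quantile_discretiser.transform(feature)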

Throughout this article, we will use the Diabetes dataset to demonstrate these features.

import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
from causalnex.discretiser import Discretiser
from IPython.display import Image
from causalnex.plots import plot_structure, NODE_STYLE, EDGE_STYLE
from sklearn.datasets import load_diabetes
from sklearn.preprocessing import StandardScaler
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
names = diabetes["feature_names"]
ss = StandardScaler()
X = ss.fit_transform(X)
y = (y - y.mean()) / y.std()
raw_data = pd.DataFrame(X, columns=names)
raw_data["target"] = y
struct_data = raw_data.copy()
data = raw_data.copy()
data.head()

As we can see, all of the features in our dataset are continuous and cannot be fitted to a Bayesian Network as they are. Assuming that we are trying to predict target given the other features, we can discretise those features using a supervised learning approach. The example below leverages the new DecisionTreeSupervisedDiscretiserMethod:

from causalnex.discretiser.discretiser_strategy import (
    DecisionTreeSupervisedDiscretiserMethod,
)
features = list(data.columns.difference(["target"]))
tree_discretiser = DecisionTreeSupervisedDiscretiserMethod(
    mode="single",
    tree_params={"max_depth": 2, "random_state": 2021},
)
tree_discretiser.fit(
    feat_names=features,
    dataframe=data,
    target_continuous=True,
    target="target",
)

Output:

DecisionTreeSupervisedDiscretiserMethod(
    mode='single',
    split_unselected_feat=False,
    tree_params={'max_depth': 2, 'random_state': 2021},
)

The tree_discretiser object has now learned the thresholds for each of the input features in feat_names. We can apply these thresholds to our data using the transform method:

for col in features:
    data[col] = tree_discretiser.transform(data[[col]])
data.head()

We can also see the discretisation thresholds by looking at the attribute map_thresholds:

tree_discretiser.map_thresholds

Output:

{
    'age': array([-1.52877724, 0.15135724, 0.91505471]),
    'sex': array([0.06347591]),
    'bmi': array([-0.45903838, 0.19809295, 1.53501534]),
    'bp': array([-1.01193172, 0.49603105, 1.24398059]),
    's1': array([-0.77064317, 0.12611715, 0.38646692]),
    's2': array([-1.20278984, 0.36409968, 0.37068325]),
    's3': array([-2.11218154, -0.33193551, 0.59688854]),
    's4': array([-1.03200924, -0.28336067, 0.64372227]),
    's5': array([-0.71241763, -0.07908702, 0.5547165 ]),
    's6': array([-0.67577836, 0.71754661, 2.58982706]),
}

After a few simple steps, all features are now categorical and ready to be used in our Bayesian Network. Note that for a small dataset we recommend a low value (roughly 3 or less) for the max_depth parameter, to avoid having too many categories per feature.
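
The MDLP algorithm mentioned above follows the same pattern. The snippet below is a hedged sketch rather than a verified walkthrough: it assumes that MDLPSupervisedDiscretiserMethod exposes the same fit interface and map_thresholds attribute as the decision tree discretiser, and it may require an optional MDLP dependency to be installed:

from causalnex.discretiser.discretiser_strategy import (
    MDLPSupervisedDiscretiserMethod,
)
# Assumed to mirror the decision tree interface; the MDLP backend may
# require an optional dependency to be installed separately.
mdlp_discretiser = MDLPSupervisedDiscretiserMethod()
mdlp_discretiser.fit(
    feat_names=features,
    dataframe=raw_data,
    target_continuous=True,
    target="target",
)
mdlp_thresholds = mdlp_discretiser.map_thresholds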

Faster inference

CausalNex enables users to harness our learned Bayesian Networks to answer pertinent questions of interest. However, inference can sometimes be a slow process, so this update includes two new features intended to alleviate this.

Reducing a graph to its Markov Blanket

The Markov blanket (MB) of a variable is the subset of nodes in the Bayesian Network that contain all the useful information for predicting that variable. In other words, nodes outside a variable’s MB will (given knowledge of the nodes in the MB) have absolutely no influence on the variable of interest.

The concept is particularly useful when we have a large graph and a single variable of interest. Instead of considering the whole graph, we need only consider the Markov blanket subgraph in order to make inference more efficient. To demonstrate the new feature, we continue with the diabetes dataset:

from causalnex.structure.notears import from_pandas
sm = from_pandas(data)
sm.remove_edges_below_threshold(0.3)
sm = sm.get_largest_subgraph()
viz = plot_structure(
    sm,
    graph_attributes={"scale": "0.5"},
    all_node_attributes=NODE_STYLE.WEAK,
    all_edge_attributes=EDGE_STYLE.WEAK,
)
Image(viz.draw(format="png"))

Now, assume that target is our variable of interest. We do not actually need all the nodes in the network, only the MB of target. To obtain it, we simply use the get_markov_blanket function from causalnex:

from causalnex.network import BayesianNetwork
from causalnex.utils.network_utils import get_markov_blanket
bn = BayesianNetwork(sm)
blanket = get_markov_blanket(bn, "target")

blanket is now a BayesianNetwork object that contains the structure of the MB of the original bn network. This means that if we only care about target and nodes having direct impact on target, we only need to worry about nodes contained in blanket.

viz = plot_structure(
    blanket.structure,
    graph_attributes={"scale": "0.5"},
    all_node_attributes=NODE_STYLE.WEAK,
    all_edge_attributes=EDGE_STYLE.WEAK,
)
Image(viz.draw(format="png"))

As a result, our region of interest has been reduced to only seven variables, for which the InferenceEngine can compute marginals more quickly than for the full graph.
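
We can confirm this by inspecting the nodes of the reduced network; the snippet below is a small illustrative check (the exact nodes depend on the structure learned above):

# Inspect which nodes remain in the Markov blanket of `target`;
# in this run the reduced graph contains only seven variables.
print(sorted(blanket.nodes))
print(len(blanket.nodes))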

Accepting and multiprocessing lists of observations

The second feature in this release to boost inference time is support for multiprocessing. We leverage pathos.multiprocessing to perform parallel execution of CausalNex’s InferenceEngine given multiple inputs. In instances where a user wishes to evaluate many observations, the new support for a list of observation dictionaries (as opposed to the single dictionary seen previously) and the ability to compute these in parallel will improve computation time:

discretised_data = data.copy()
discretised_data["target"] = Discretiser(
    method="fixed",
    numeric_split_points=[-0.5, 1],
).transform(discretised_data["target"].values)
target_map = {0: "Low", 1: "Mid", 2: "High"}
discretised_data["target"] = discretised_data["target"].map(target_map)
bn = bn.fit_node_states(discretised_data)
bn = bn.fit_cpds(
    discretised_data,
    method="BayesianEstimator",
    bayes_prior="K2",
)
discretised_data.head()

Now, bn is a fitted Bayesian Network and we can use InferenceEngine to query the marginals with a list of observations. For example, given that we have two new observations and would like to understand how those observations will affect the marginal distributions, we can do the following:

from causalnex.inference import InferenceEngine
ie = InferenceEngine(bn)
observation_1 = {"age": 2, "sex": 1, "s3": 3, "s5": 0, "bmi": 1}
observation_2 = {"age": 1, "sex": 1, "s3": 2, "s5": 0, "bmi": 2}
marginals = ie.query([observation_1, observation_2])
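
Each element of marginals corresponds to one observation and maps every node to its marginal distribution given that observation. The snippet below is illustrative; the exact probabilities depend on the fitted CPDs:

# One result per observation; each maps node names to marginal
# distributions, e.g. {"Low": ..., "Mid": ..., "High": ...} for `target`.
print(marginals[0]["target"])
print(marginals[1]["target"])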

In the case of just two observations, speed is unlikely to be a concern and the overhead of multiprocessing is not worthwhile. Even so, the fact that .query() now accepts a list of dictionaries should make CausalNex easier to use in such instances. However, for a large number of observations, say 100, the new multiprocessing feature becomes beneficial. To trigger multiprocessing, we simply set the parallel parameter of query to True:

pseudo_observation = [observation_1, observation_2] * 50  # generate a hundred observations
import time
start = time.time()
marginals_multi = ie.query(
    pseudo_observation,
    parallel=True,
    num_cores=16,
)
print("Using multiprocessing, the query took {:.1f} seconds to run".format(time.time() - start))
start = time.time()
marginals = ie.query(pseudo_observation)
print("Without multiprocessing, the query took {:.1f} seconds to run".format(time.time() - start))

Output

Using multiprocessing, the query took 4.7 seconds to run
Without multiprocessing, the query took 10.4 seconds to run

As we can see, the time difference is significant for the same task. However, parallel defaults to False, because the overhead cost can exceed the benefit when fewer than roughly 100 observed marginals are requested.

Fitting Bayesian CPDs with scikit-learn’s syntax

Previously, when building a classifier with CausalNex, the standard workflow consisted of the following steps (sketched below):

  1. discretise the data,
  2. fit node states,
  3. fit CPDs, and
  4. make predictions.
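
For reference, those four steps look roughly as follows. This is a hedged sketch reusing the discretised_data and structure sm from the earlier sections, and it assumes the BayesianNetwork.predict(df, node) method shown in the CausalNex tutorials:

# A minimal sketch of the manual four-step workflow, reusing objects
# defined earlier in this article (discretisation is step 1).
manual_bn = BayesianNetwork(sm)
manual_bn = manual_bn.fit_node_states(discretised_data)  # step 2: node states
manual_bn = manual_bn.fit_cpds(  # step 3: CPDs
    discretised_data,
    method="BayesianEstimator",
    bayes_prior="K2",
)
predictions = manual_bn.predict(discretised_data, "target")  # step 4: predictions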

While we encourage users to go through each step to better understand the graph and the causal relationships in the data, we believe that a tool combining all the steps to output predictions directly can come in handy in many situations. As a result, we developed BayesianNetworkClassifier, which helps build models with scikit-learn syntax. BayesianNetworkClassifier inherits from scikit-learn's BaseEstimator and ClassifierMixin and can be used as a standard model in a scikit-learn pipeline.

Let’s build a simple classifier with the diabetes dataset to demonstrate this new feature:

from sklearn.model_selection import train_test_split
from causalnex.network.sklearn import BayesianNetworkClassifier
raw_data["target"] = Discretiser(
    method="fixed",
    numeric_split_points=[-0.25],
).transform(
    # convert target variable to categorical
    raw_data["target"].values
)
label = raw_data["target"]
input_data = raw_data.drop(["target"], axis=1)
# train test split
X_train, X_test, y_train, y_test = train_test_split(
    input_data, label, test_size=0.05, random_state=7
)
# Specify arguments for the model
edge_list = list(sm.edges)
discretiser_alg = {val: "tree" for val in list(raw_data)[:-1]}
discretiser_param = {
    "max_depth": 1,
    "random_state": 2020,
}  # we will discretise all features using this parameter
feature_discretiser = {
    val: discretiser_param for val in list(raw_data)[:-1]
}
# discretising and probability fitting
clf = BayesianNetworkClassifier(
    edge_list,
    discretiser_alg=discretiser_alg,
    discretiser_kwargs=feature_discretiser,
)
clf.fit(X_train, y_train)

Output:

BayesianNetworkClassifier(
    bayesian_kwargs={
        'bayes_prior': 'K2',
        'method': 'BayesianEstimator',
    },
    discretiser_alg={
        'age': 'tree',
        'bmi': 'tree',
        'bp': 'tree',
        's1': 'tree',
        's2': 'tree',
        's3': 'tree',
        's4': 'tree',
        's5': 'tree',
        's6': 'tree',
        'sex': 'tree',
    },
    discretiser_kwargs={
        'age': {
            'max_depth': 1,
            'random_state': 2020,
        },
        'bmi': {
            'max_depth': 1,
            'random_state': 2020,
        },
        'bp': {
            'max_depth': 1,
            ...
        'sex': {
            'max_depth': 1,
            'random_state': 2020,
        },
    },
    list_of_edges=[
        ('age', 's3'),
        ('sex', 'age'),
        ('sex', 'bp'),
        ('sex', 's4'),
        ('sex', 's6'),
        ('sex', 'target'),
        ('bmi', 's4'),
        ('bmi', 'target'),
        ('bp', 'age'),
        ('bp', 'bmi'),
        ('bp', 's3'),
        ('s1', 's5'),
        ('s2', 's1'),
        ('s2', 's5'),
        ('s4', 's2'),
        ('s4', 's5'),
        ('s5', 'target'),
        ('s6', 'age'),
        ('s6', 'bmi'),
        ('s6', 'bp'),
        ('s6', 's4'),
        ('target', 's3'),
    ],
    return_prob=False,
)

Finally, we can make predictions using the CPDs the model has learned:

clf.predict(X_test)

Output:

array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1])

After a few steps, we now have a classifier, clf, which contains the CPDs learned from the training data and can be used to make predictions on new observations with the .predict method.
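
Because the classifier follows scikit-learn conventions, standard evaluation utilities also work directly on its output. For example, an illustrative accuracy check on the held-out test set:

from sklearn.metrics import accuracy_score
# Compare the classifier's predictions against the held-out test labels.
print("Test accuracy: {:.3f}".format(accuracy_score(y_test, clf.predict(X_test))))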

What’s next?

We are very thankful for the community’s input so far and QuantumBlack will continue to develop CausalNex with its users in mind. We encourage all users to contribute by reporting issues and adding new features.

If you have used CausalNex and found the library useful, we would really appreciate it if you starred the project on GitHub. We are very much looking forward to reaching our next milestone of 2,000 GitHub stars.

QuantumBlack, AI by McKinsey

We are the AI arm of McKinsey & Company. We are a global community of technical & business experts, and we thrive on using AI to tackle complex problems.