Introducing CausalNex — Driving Models Which Respect Cause And Effect
CausalNex allows data scientists and domain experts to co-develop models that go beyond correlation and consider causal relationships.
At QuantumBlack, we believe that the ability to identify causal relationships in data is integral to driving more intelligent and effective analytics interventions. Delivering truly transformational performance gains often depends on creating solutions which consider the root cause of an organisation’s challenge.
With this in mind, we have launched our latest open source product, CausalNex. CausalNex allows data scientists and domain experts to co-develop models that go beyond correlation and consider causal relationships. It was conceptualised and prototyped by QuantumBlack’s R&D function and developed by QuantumBlack Labs, the engine powering our business with an ecosystem of products facilitating advanced analytics at scale. CausalNex provides a practical ‘what if’ library for testing scenarios using Bayesian Networks (BNs).
Leveraging BNs for an end-to-end causality and counterfactual analysis often requires the use of at least three separate open source libraries, each with its own interface. This makes an already complicated process all the more complex and hinders the potential power of BNs.
CausalNex has been developed to address this, offering a streamlined end-to-end process to build models which generate more intelligent, effective interventions. Moreover, by allowing data scientists and domain experts to collaborate, CausalNex generates transparency and trust in models it creates and helps drive wider adoption of recommended interventions.
Understanding The Why Behind The Data
Identifying causation from data has long been a major area of research and a plethora of methodologies have been developed over the years. The ideal way to infer causality would be to run Randomised Controlled Trials (RCTs) where the variable of interest is chosen at random for each subject. However, such experiments are impractical for social science, time-intensive, highly complex and carry cost and reputational risks for organisations. Following such a methodology could result in a business providing a sub-standard product experience to one group of customers.
Many organisations therefore rely on running traditional linear models on observational data accumulated through day-to-day business. This remains limiting due to two significant challenges:
- Finding the true causal direction continues to be difficult
- The prevalence of confounding variables, which influence both the target and a driver of the target
These challenges can be addressed by finding intuitive ways to augment modelling using expert knowledge. In particular, Structural Causal Models such as BNs can be deployed to perform these analyses effectively. Unlike traditional linear models, causal models do not make the assumption that all features are independent of each other. For example, if we were modelling calorie intake using height and weight as features, a traditional linear model would assume that one feature had no influence over the other, yet height obviously influences a person’s weight.
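To make the confounding challenge concrete, here is a minimal simulation sketch using only the Python standard library. The variables z, x and y are hypothetical and the data is synthetic: z drives both the feature x and the target y, while x has no effect on y at all. A naive regression of y on x still reports a sizeable effect, which largely disappears once z is accounted for.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def residuals(xs, ys):
    """The part of ys that a straight-line fit on xs does not explain."""
    b = slope(xs, ys)
    mx, my = mean(xs), mean(ys)
    return [(y - my) - b * (x - mx) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Synthetic data (an assumption for illustration): z confounds x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]  # y never uses x

# Naive estimate: regress y on x and ignore z.
naive_effect = slope(x, y)

# Adjusted estimate: remove z's influence from both x and y first,
# then regress what is left (the Frisch-Waugh-Lovell approach).
adjusted_effect = slope(residuals(z, x), residuals(z, y))

print(f"naive effect of x on y:    {naive_effect:.3f}")
print(f"adjusted effect of x on y: {adjusted_effect:.3f}")
```

With these settings the naive slope comes out close to 1 even though the true effect is 0, and the adjusted slope is close to 0. In practice the hard part is knowing that z exists and which way the arrows point, which is where domain expertise comes in.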
Other appealing aspects of BN models include:
- As a graph model they allow us to efficiently incorporate domain expertise
- Their subsequent capacity to assess the impact from changes to underlying features, i.e. counterfactual analysis
BNs have typically been difficult to deploy because their first step, identifying conditional dependence between features, is computationally demanding and mathematically complex. Recent advances in optimisation techniques, such as DAGs with NO TEARS (Zheng et al.), have simplified this initial step. This enables a data scientist to quickly generate an intuitive network graph that represents dependencies between a set of variables, which can then be easily inspected and adjusted by a domain expert. This form of hybrid learning with data and domain expertise helps to encode important domain knowledge in models, ensures the correct causal direction and avoids spurious relationships.
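The key idea in DAGs with NO TEARS is to replace the combinatorial "is this graph acyclic?" check with a smooth function of the weighted adjacency matrix W, namely h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the graph has no cycles. The sketch below evaluates that function in plain Python with a truncated power series. It is an illustration of the constraint only, not CausalNex's implementation, and the two weight matrices are made up.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def acyclicity(w, terms=30):
    """h(W) = tr(exp(W o W)) - d, using a truncated series for exp."""
    d = len(w)
    # Element-wise square keeps every entry non-negative.
    a = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]
    power = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace_sum = float(d)  # trace of the identity (the k = 0 term)
    factorial = 1.0
    for k in range(1, terms):
        power = matmul(power, a)  # now holds a^k
        factorial *= k
        trace_sum += sum(power[i][i] for i in range(d)) / factorial
    return trace_sum - d


# Hypothetical weights: w[i][j] != 0 means an edge from node i to node j.
dag = [
    [0.0, 0.8, 0.3],
    [0.0, 0.0, -0.5],
    [0.0, 0.0, 0.0],
]
# Same graph with an extra edge 2 -> 0, which closes a cycle.
cyclic = [
    [0.0, 0.8, 0.3],
    [0.0, 0.0, -0.5],
    [0.4, 0.0, 0.0],
]

h_dag = acyclicity(dag)
h_cyclic = acyclicity(cyclic)
print(f"h(dag)    = {h_dag:.4f}")
print(f"h(cyclic) = {h_cyclic:.4f}")
```

Because h is differentiable, structure learning can be posed as a continuous optimisation problem with the constraint h(W) = 0, rather than a search over discrete graphs.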
Once the graph has been determined, we can identify the strength of dependency by estimating the conditional probability distributions of the variables. The fitted model now reflects the dependency structure between variables, which can be leveraged for counterfactual analysis, i.e. asking ‘what happens to my target Y if we change feature X?’
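As a rough sketch of these two steps, the example below estimates conditional probabilities by simple counting and then answers a 'what if' question. It assumes a small hypothetical graph with three binary variables (Z → X, Z → Y and X → Y) and made-up record counts. None of this is CausalNex's API; it only illustrates the underlying arithmetic.

```python
from collections import Counter

# Hypothetical observations: (z, x, y) -> number of records.
counts = Counter({
    (0, 0, 0): 36, (0, 0, 1): 4,
    (0, 1, 0): 8,  (0, 1, 1): 2,
    (1, 0, 0): 5,  (1, 0, 1): 5,
    (1, 1, 0): 16, (1, 1, 1): 24,
})
total = sum(counts.values())


def p_z(z):
    """Estimated P(Z = z)."""
    return sum(c for (zi, _, _), c in counts.items() if zi == z) / total


def p_y1_given_xz(x, z):
    """Estimated P(Y = 1 | X = x, Z = z), one row of Y's probability table."""
    return counts[(z, x, 1)] / (counts[(z, x, 0)] + counts[(z, x, 1)])


def p_y1_given_x(x):
    """Observational query: what we see among records where X = x."""
    num = sum(c for (_, xi, yi), c in counts.items() if xi == x and yi == 1)
    den = sum(c for (_, xi, _), c in counts.items() if xi == x)
    return num / den


def p_y1_do_x(x):
    """'What if' query: force X = x for everyone, leave Z's distribution alone."""
    return sum(p_z(z) * p_y1_given_xz(x, z) for z in (0, 1))


print(f"P(Y=1 | X=1)     = {p_y1_given_x(1):.2f}")  # 0.52
print(f"P(Y=1 | do(X=1)) = {p_y1_do_x(1):.2f}")     # 0.40
print(f"P(Y=1 | X=0)     = {p_y1_given_x(0):.2f}")  # 0.18
print(f"P(Y=1 | do(X=0)) = {p_y1_do_x(0):.2f}")     # 0.30
```

In this made-up data, simply comparing records with X = 1 against X = 0 suggests a 34-point difference in Y, while the intervention estimate is 10 points. The gap arises because Z makes X = 1 and Y = 1 more likely at the same time.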
What Is CausalNex?
The CausalNex library enables practitioners to learn structural relationships from data and allows domain experts to verify the accuracy of the learned relationships between variables. You are also able to fit conditional probability distributions and observe the effect of potential interventions.
The CausalNex library:
- Deploys state-of-the-art structure learning methods such as DAGs with NO TEARS to understand conditional dependencies between variables
- Allows domain knowledge to augment model relationships
- Builds predictive models based on structural relationships
- Understands model probability
- Evaluates model quality with standard statistical checks
- Enables visualisation which simplifies how causality is understood
- Analyses the impact of interventions using Do-calculus, which generates probabilistic formulas for the effect of interventions in terms of the observed probabilities
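As an illustration of the kind of formula Do-calculus produces, consider the simplest textbook case. Assume (purely for illustration) a graph in which a single observed variable Z is the only common cause of a feature X and a target Y. The effect of setting X to a value x can then be written entirely in terms of probabilities that are estimable from observational data:

```latex
P(Y = y \mid \mathrm{do}(X = x)) \;=\; \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z)
```

The left-hand side describes an intervention we have not actually carried out. The right-hand side uses only quantities that can be read off a fitted Bayesian Network.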
The Origins of CausalNex
This software marks the culmination of two years of project experience, research and development, and collaboration, both within QuantumBlack and beyond. The initial idea began to take shape in February 2018, following discussions by QuantumBlack’s multidisciplinary team who were grappling with causality across client projects. We soon began investing data science resource into initial R&D on this topic, adding fresh experiences from additional project teams as they encountered issues with identifying causality and proposed workarounds.
The publication of DAGs with NO TEARS at NeurIPS 2018 provided particularly useful insight which accelerated our thinking in this field. In February 2019 QuantumBlack Labs, our technical innovation group, began working on the software, pulling together the hundreds of project hours from across the QuantumBlack team and combining this with the 12 months of R&D from our data scientist colleagues. Over the following year, QuantumBlack Labs would refine and develop the CausalNex library, iterating on a framework that shortens the learning curve and simplifies the implementation and deployment of BNs in complex business use-cases across industries such as automobile, banking and pharmaceuticals. To date, CausalNex has been used across seven McKinsey and QuantumBlack projects.
CausalNex would not exist without a global network of leading researchers who have been generous enough to share their own studies and papers. We hope that open sourcing CausalNex will help contribute to the ongoing discussion around data causality and ultimately help others enrich their own approach to analytics.
We recognise that a number of challenges remain around working with BNs, such as efficiently harnessing continuous data and computational challenges with a high dimensional feature space, and we continue to develop and enhance our approaches in this area. In the meantime, we are tremendously excited to see how the data science community responds to CausalNex and look forward to collaborating in this fascinating sub-field of machine learning.
Are you a machine learning engineer, product manager, data scientist, or designer? Are you looking to work as part of a multidisciplinary team on innovative products and technologies? Then check out our QuantumBlack Labs page for more information.