Knowledge Models and Causal Diagrams

Using TypeDB for causal analysis

Jul 14

A new term is catching on in computer science and artificial intelligence circles: "Causal Science", a set of techniques that helps us better predict future behaviours. The father of causal science is none other than Judea Pearl, the same Judea Pearl who created Bayesian networks. His work on Bayesian networks and causation was so profound that in 2011 Professor Pearl was awarded the highest honours in both computer science and human cognition. He received the Turing Award "for fundamental contributions to artificial intelligence through the development of a calculus for probabilistic and causal reasoning", and in the same year he received the Rumelhart Prize for contributions to the theoretical foundations of human cognition.

When someone wins, in the same year, both an award for contributions to the theoretical foundations of human cognition and the equivalent of the Nobel Prize for computer science, that person is probably on to something, and we should pay attention.

On a more lighthearted note, Pearl may indeed be the real-life version of the science fiction character Hari Seldon, mastermind of the "Seldon plan" in Isaac Asimov's novel "Foundation" (coming soon to Apple TV). In the book, Hari Seldon creates a new branch of mathematics called psychohistory, which Asimov defines as follows: "Psychohistory depends on the idea that, while one cannot foresee the actions of a particular individual, the laws of statistics as applied to large groups of people could predict the general flow of future events." One of my favourite definitions of causal science is "Causality is the study of how things influence one another, how causes lead to effects." Said another way, it is the study of what events need to happen to cause desired outcomes to occur. Sounds like the Seldon plan :).

Is real life emulating science fiction again? Only time will tell, but many big companies are placing big bets on Pearl's causal science approach to artificial intelligence. Netflix, Lyft, Microsoft and Google are all using it, and a host of other companies are seeing the benefits of embedding causal science in their algorithms.

One of the hallmarks of a causal solution is explainability. Many traditional approaches in data science struggle to explain why they made a certain recommendation, and industrial companies are cautious about implementing recommendations that cannot be explained. We are even seeing lawmakers pass Right to Explanation legislation around automated decision-making.

My journey into causal science began about a year ago, and be warned: the learning curve can be steep if you dig into the actual do-calculus mathematics, but paradoxically the general concepts are relatively simple. Fortunately, the complex math has already been wrapped up in causal science packages, like Microsoft's DoWhy Python library, Google's CausalImpact R package and McKinsey's CausalNex. You just need to understand how to call these packages and how to interpret their results.
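To make the idea concrete, here is a minimal, self-contained sketch of the kind of adjustment these packages automate: estimating a treatment effect in the presence of a confounder by stratifying on it (a "backdoor adjustment"). All variable names and numbers are illustrative, not drawn from any real dataset.

```python
import random

random.seed(0)

# Toy data: a confounder Z influences both the "treatment" X (say, a
# discount) and the outcome Y (say, staying with the 3PL).
n = 20000
rows = []
for _ in range(n):
    z = random.random() < 0.5                  # confounder
    x = random.random() < (0.8 if z else 0.2)  # treatment depends on Z
    # True causal effect of X on Y is +0.1; Z adds +0.4 on its own.
    y = random.random() < (0.2 + 0.1 * x + 0.4 * z)
    rows.append((z, x, y))

def mean_y(rs):
    return sum(y for _, _, y in rs) / len(rs)

# Naive comparison: biased, because treated units are mostly z=1.
naive = (mean_y([r for r in rows if r[1]])
         - mean_y([r for r in rows if not r[1]]))

# Backdoor adjustment: compare within strata of Z, then average over Z.
adj = 0.0
for z in (False, True):
    stratum = [r for r in rows if r[0] == z]
    adj += (len(stratum) / n) * (mean_y([r for r in stratum if r[1]])
                                 - mean_y([r for r in stratum if not r[1]]))

print(round(naive, 2), round(adj, 2))  # naive is inflated; adj is close to 0.1
```

The naive difference overstates the effect because the discount and the outcome share a common cause; adjusting for the confounder recovers the true effect. The packages above do this kind of reasoning automatically, given a causal diagram.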

If you want to learn the actual do-calculus mathematics, there are lots of good books and classes to choose from. I enrolled in the Harvard online class, which, shockingly, 60,000 other people also signed up for: HarvardX PH559x, "Causal Diagrams: Draw Your Assumptions Before Your Conclusions".

At Geminos we have been working on a modelling platform to make it easier to build causal solutions for industrial clients. Industrial clients generally do not have large teams of data scientists, and they need AI solutions that can pass the explainability test in automated decision making. These companies need approaches that improve their ability to make the thousands of small and large decisions they face every day. They are calling this digital transformation.

We chose Vaticle's TypeDB to store the underlying causal graph and knowledge graph structures. TypeDB has powerful graph query capabilities and an inference engine that implements "rules" to enable reasoning over the graph. Here is how Vaticle explains the reasoning capabilities of TypeDB:

"TypeDB is capable of reasoning over data via rules defined in the schema. They can be used to automatically infer new facts, based on the existence of patterns in your data. Rules can enable you to dramatically shorten complex queries, perform explainable knowledge discovery, and implement business logic at the database level." — Vaticle

Following a process simplifies what needs to be done to create causal solutions. Our tooling supports this process: it sits on top of TypeDB and allows us to graphically develop the causal and knowledge models without having to be an expert in TypeDB. You can think of the causal model as a map of "how the world works" and the knowledge model as "what is in the world". For example, imagine you run a third-party logistics (3PL) company that delivers packages to customers on behalf of other companies. You know that companies (shippers) leave you (churn) for another 3PL from time to time, and you want to understand why this is happening and what effect a 20% discount on future purchases would have on retaining customers at risk of leaving.

Knowledge Models

To get started, you need to build a knowledge model that captures what we know about our shippers, shipments, returns, claims, discount history and service failures. Parts of the knowledge model (specifically the shipper's perception score, churn probability and shipment score) are computed by the causal model, but more on that later.

The same knowledge model can be used for multiple use cases. For example, if the client wanted a recommendation about which carrier to use for each shipment, we could simply re-use the knowledge model, greatly speeding up how long it takes to develop adjacent solutions. These models are instantiated in TypeDB, as we follow the Vaticle grammar for modelling knowledge graphs. For example, when the user clicks the button at the top right, the following TypeQL, with all the rules, entities, relations and attributes, is generated:

define

# Attribute definitions (e.g. `name sub attribute, value string;`) are omitted from this excerpt for brevity.
shipper sub entity,
    owns name,
    owns address,
    owns customer_relationship_score,
    owns probofchurn,
    owns perception,
    plays shipment:shipper,
    plays staff_changes:shipper,
    plays intervention:shipper,
    plays payment:payee,
    plays news:subject,
    plays discount_history:shipper;
carrier sub entity,
    owns name,
    owns address,
    owns customer_relationship_score,
    plays shipment:carrier,
    plays news:subject;
sales_person sub entity,
    owns name,
    owns address,
    owns customer_relationship_score,
    owns prob_of_leaving,
    plays shipment:sales_person,
    plays staff_changes:sales_person,
    plays intervention:sales_person,
    plays discount_history:sales_person;
discount_history sub relation,
    owns date,
    owns amount,
    relates shipper,
    relates sales_person;
payment sub relation,
    owns payment_due_date,
    owns missed,
    owns amount,
    relates payee;
news sub relation,
    owns headline,
    owns body,
    owns source,
    relates subject;
staff_changes sub relation,
    owns name,
    owns new_position,
    relates sales_person,
    relates shipper;
# A minimal definition for `intervention` is assumed here; only its roles are known from the `plays` statements above.
intervention sub relation,
    relates shipper,
    relates sales_person;
service_issue sub shipment,
    owns service_issue_id,
    owns issue,
    owns description;
claim sub shipment,
    owns claim_id,
    owns description,
    owns claim_amount;
shipment sub relation,
    owns items,
    relates shipper,
    relates carrier,
    relates sales_person;

Causal Models

Back to the problem we are trying to solve: companies (shippers) leave you (churn) for another 3PL from time to time, and you want to understand why this is happening and what effect a 20% discount on future purchases would have on retaining customers at risk of leaving.

You will need a causal model that models these outcomes (a shipper churning, and the effect of a discount on churn probability). When building causal models, it is critical to work with someone who actually understands the business problem, because the causal model represents their best understanding of what causes what to happen in their business.

In the diagrams below, certain events like claims and service issues cause the shipment score to fall, which causes the shipper's perception to fall. In turn, this may cause the shipper to leave the 3PL. How much the scores fall depends on other models that are backed by historical data. A related causal model might ask, "What is the effect of offering a 20% discount to shippers that are at risk of leaving?" The first causal model tells you which shippers to worry about, and the second tells you what the effect of doing something about it would be. Here is an example of the first causal model, about a customer churning:
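The structure of a diagram like this can be captured very simply in code. Below is an illustrative sketch (not Geminos' implementation): the DAG as an adjacency map, plus toy linear weights that propagate the impact of claims and service issues through the shipment score and perception down to churn probability. The edge weights and baselines are placeholders; in a real system they would be learned from historical data.

```python
# Toy causal DAG for the churn example: each node lists (child, weight) edges.
# Weights and baselines are illustrative placeholders, not fitted values.
EDGES = {
    "claims":         [("shipment_score", -0.30)],
    "service_issues": [("shipment_score", -0.20)],
    "shipment_score": [("perception", 0.50)],
    "perception":     [("churn_probability", -0.60)],
}
BASELINE = {"shipment_score": 1.0, "perception": 0.5, "churn_probability": 0.5}

def propagate(inputs):
    """Fill in downstream node values from their parents, in topological order."""
    values = dict(inputs)
    for node in ("shipment_score", "perception", "churn_probability"):
        total = BASELINE[node]
        for parent, edges in EDGES.items():
            for child, weight in edges:
                if child == node:
                    total += weight * values[parent]
        values[node] = min(1.0, max(0.0, total))  # clamp scores to [0, 1]
    return values

calm = propagate({"claims": 0.0, "service_issues": 0.0})
rough = propagate({"claims": 1.0, "service_issues": 1.0})
print(calm["churn_probability"], rough["churn_probability"])
```

Running the same graph with and without claims and service issues shows the mechanism in the diagram: the shipment score drops, perception drops with it, and churn probability rises.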

Within the churn causal model there needs to be a way to actually call the do-calculus engine; this is done in the backend causal reasoner using the Microsoft DoWhy engine.

Readers with sharp eyes will notice that we forked Node-RED to build our modelling tools, so we have all the power of Node-RED and the thousands of nodes already available, ranging from IoT processing nodes to machine learning algorithms, through its low/no-code visual programming environment.

Getting instance data into our knowledge graph is pretty straightforward with Node-RED, as it already has a large and well-supported set of tools for doing ETL. We have implemented a TypeDB client for Node-RED; however, clients are free to use anything they want to perform the ETL. Node-RED even ships with a dashboard toolset that allows simple UIs to be developed quickly, though of course sophisticated UIs can be built with any development toolset, such as Microsoft Power Apps.
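As a sketch of what an ETL step targeting the schema above produces, here is a small, hypothetical Python helper that turns one extracted record into a TypeQL insert query for a shipper. The function name and example values are illustrative; the actual write would go through a TypeDB client session, such as the Node-RED TypeDB node mentioned above.

```python
# Hypothetical ETL helper: build a TypeQL insert statement for one
# shipper record from the knowledge model above. String escaping is
# kept minimal for illustration; a real pipeline should use a client
# library rather than hand-built strings.
def shipper_insert(name: str, address: str, score: float) -> str:
    return (
        "insert $s isa shipper, "
        f'has name "{name}", '
        f'has address "{address}", '
        f"has customer_relationship_score {score};"
    )

query = shipper_insert("Acme Co", "12 Dock St", 0.82)
print(query)
```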

At Geminos we believe that, as causal science becomes more widely used in businesses looking to automate decision making and digitally transform, these businesses will need easy-to-use, powerful causal science toolsets that can run in real time at the speed of business.
