Inference in TypeDB

Joshua Send
Oct 17 · 9 min read

TypeDB is a new breed of database, which uses a rich and logical type system to enable you to tackle complex problems.

One key idea that TypeDB brings to the database landscape is inference. The Wikipedia interpretation of inference is “moving from premises to logical consequences”. This definition is so broad that we can actually identify two distinct types of inference TypeDB does:

  • Rule inference
  • Type inference

With TypeQL, we’ve combined both types of inference into a powerful query language. This forms the basis of a paradigm-shifting database that puts elegant modelling at your fingertips.

Rule Inference

To define rule inference, we should define what a rule is in TypeDB.

A rule is deductive logic embedded in the database.

If a rule represents deductive logic, rule inference is the process of inferring new information according to the deductive logic encoded in rules. This means you can use rules to put domain knowledge of consequences into the database.

define
rule <name>:
when {
<condition>
} then {
<conclusion>
};

This defines what rules and rule inference are conceptually, but what does rule inference look like in practical terms: what are the inputs and outputs?

Given that TypeDB is a database, the obvious source of input is data! Further, all outputs are also data. Rule inference can therefore be seen as the process of deriving new facts from previously known facts, according to the conditions of each rule.

By this definition, we allow inference chains and recursion to appear: by applying a rule to facts previously inferred by any other rule (including itself), complex interaction behaviours can arise naturally.

Example: permissions implemented with rules

A practical and simple use case for rules is for a permission system — most platforms will have a database of users, permissions, and resources, possibly with users belonging to teams.

First, let’s set up a quick schema to model a permission system:

Using TypeDB Studio, we can visualise the schema:

We have two types of access relations: read-access and write-access, which can own (via inheritance) a permission label for convenience at query time. These relations link a resource to a teamor user. Instances of resource can also be owned by a team or user. Finally, a user can be a member of a team.

We might consider the following question:

With what permission, if any, can a user access resources?

In TypeQL, we want to know this:

match 
$x isa user; $r isa resource;
(accessor: $x, resource: $r) isa access, has permission $p;

User Access Rights

A user might have direct or indirect access to a resource, for example via a team or a resource ownership. We can write some logical statements about what a user can access:

1. A user has read access to resources connected with explicit read-access relations. We don’t need rules to deduce anything new, since the user is expected to insert these.

2. A user has write access to resources connected with explicit write-access relations. We don’t need rules for this case either.

3. A user that belongs to a team can borrow the team’s read access to any resource the team has read-access relations for. We need a rule for this, since it is derived information! In rule format:

rule members-borrow-read-access:
when {
(member: $u, team: $t) isa team-membership;
(reader: $t, readable: $r) isa read-access;
} then {
(reader: $u, readable: $r) isa read-access;
};

4. A user that belongs to a team can borrow the team’s write access to any resource the team has write-access to. Similarly to above, in rule format we could write:

rule members-borrow-write-access:
when {
(member: $u, team: $t) isa team-membership;
(writer: $t, writable: $r) isa write-access;
} then {
(writer: $u, writable: $r) isa write-access;
};

Guaranteed Access Rights

We also need to model two invariant effects for access:

1. Any resource owner (team or user) has write access to that resource. Modelling this is extremely easy:

rule owners-have-write-access:
when {
(resource: $r, owner: $owner) isa resource-ownership;
} then {
(writer: $owner, writable: $r) isa write-access;
};

2. Any write access also allows read access. We’ll combine this with some syntactic sugar to make read and write permissions available as permissionattributes on the access relations:

rule permission-read:
when { $access isa read-access; }
then { $access has permission "read"; };
rule permission-write:
when { $access isa write-access; }
then { $access has permission "write"; };
rule permission-write-read:
when { $access isa write-access; }
then { $access has permission "read"; };

The full set of rules is available in one go here.

Rules In Action

Let’s add some toy data, in which a user has no direct resource access but is a member of a team that owns a resource:

If we run our query in TypeDB Studio with reasoning enabled:

We can immediately see that our single user has both read and write permissions to the resource. To obtain this result, we didn’t write any implementation, only rules to derive access permissions, based on resource ownerships and team memberships!

Impact

Without going into any detail about how a rule inference engine can be implemented (you can see a talk on YouTube for that), let’s discuss some of the implications of having handling deductions in the database.

One key benefit is the ability to encode domain logic into the database, in a way that is coherent with your schema. The schema becomes your central source of truth of domain knowledge: the model (represented as types), and consequences (represented as rules) in one place.

Secondly, since rules are declarative and updateable, reflecting changes in your system is straightforward — simply undefine the old rule, and define a new rule that contains the new logic. This saves the mess of having to update intricate application logic.

Relatedly, when building databases with TypeDB and TypeQL, it’s common to be able to remove a huge amount of application code by effectively modelling the schema types, and creating rules. This improves productivity and reduces maintenance across other parts of the stack.

Inferred data is always up to date and correct, respecting the ACID-ity of the transactions happening on the database server. Implementing a performant inference engine in the application layer that has the same guarantees can be very difficult. A rule inference engine in the database can intelligently maintain integrity and performance in transactional workloads.

Finally, as an added bonus, it’s always possible to inspect and explain the reasoning path that led to a particular conclusion. Using explanations, you can gain the benefit of not implementing intricate logic of your domain, while also later tracing the inference steps that your application would have had to handle. It’s even possible to determine all possible reasons an inference was made!

When to use Rule Inference

In most databases, the schema only defines constraints on the layout of the data.

In TypeDB, the schema does more: it both constrains the data layout, according to the types defined, and can include data invariants: data that is derivable from other data points.

If your domain has invariants that have complex interactions, and lead to indirect conclusions over multiple steps of inference, this is a particularly great use for rules. You’ll find that using rules will reduce the complexity of your application significantly, and be easier to update in the future.

Common Use Cases

It may not seem like many domains feature data invariants. However, rules are extremely general constructs. Here’s a quick list of some interesting domains we have seen TypeDB inference used in:

  • meta modelling, data lineage, and data aggregation
  • expert systems, chatbots, and rule-based AI
  • pattern detection: fraud, supply chain analysis, CTI
  • summarisation and data transformation
  • robotics
  • and much more!

Hopefully these serve as an inspiration for the broad applicability of rules!

Type Inference

The second kind of inference, type inference, is a process that is largely transparent to the user of TypeDB. However, it is critical to providing a higher degree of query safety and improving query performance.

A type in TypeDB is a component of your schema, which may be any of an entity, relation, attribute, or role type. These are created via subtyping, and represent the elements of your domain within the database.

Type inference is the process of evaluating the possible types that a variable may represent in a program, before the program is executed.

For example, we can write a query (a sort of program) in our earlier permission system model and search for possible types for each variable:

match (owner: $owner, resource: $r) isa relation;===== after type inference =====
$owner may be [team, user]
$r may be [resource]
relation may be [resource-ownership]

By annotating the allowed type(s) of a variable, we can then use this information to detect contradictions or optimise execution.

Detecting Unsatisfiable Queries

In a classical programming language, we may have a statement that looks like this:

var value = 1.0 + 1.0;

The compiler can determine that value must be a double, because a summation applied to two doubles produces a double.

However, the following can be inferred to be invalid:

var value = 1.0 + "a";

The compiler is likely to throw an error, before executing the program.

This highlights that given sufficient type information, we can catch contradictions before executing the query. Luckily, in TypeQL, queries contain a huge amount of type information!

Let’s take the same permission system schema from before. None of the following queries are satisfiable in the schema:

  • match $p isa permission; $p = 1.0; — analogous to the classic programming example above, we can detect an attribute value type mismatch.
  • match $x isa entity, has permission $a; — only relations can have permissions in our schema.
  • match $r isa resource; $t isa team; (owner: $r, resource: $t) isa resource-ownership; — by swapping the roles, we try to find a resource that is an owner of a team, clearly an error.
  • match (reader: $x, writable: $r) isa access;reader and writable roles never appear in the same concrete relation.

In databases that lack strong typing, these queries will simply execute and fail to find any answers. This signal is incredibly misleading since the queries are semantically nonsense in the domain in the user’s schema. Worse, if finding no answers to the query is an expected outcome, the application can be operating incorrectly without notification.

In short, like strong typing in programming languages, type inference and strong typing in TypeQL eliminate an entire class of difficult-to-detect user errors, increasing productivity and safety.

Query Optimisation

If the query is satisfiable, the type annotations found via type inference constrain the search space that both the query planner and the traversal engine have to consider.

match (owner: $owner, resource: $r) isa relation;===== after type inference =====
$owner may be [team, user]
$r may be [resource]
relation may be [resource-ownership]

Without type inference, we have to assume $owner and $r may be any data instance in the database!

Let’s take a scenario in which there are 100,000 data instances in the database in total: 100 teams and 1000 users; 20,000 resources and 20,000 resource-ownerships, out of 70,000 total relations.

With type inference, the query planner can intelligently choose to start the search from $owner since it has only a total of 100 + 1000 = 1100 instances, and then check for resource-ownership relations on those instances (≈ 21000 operations).

Compare this with possibly searching 70,000 relation instances, and for each of these checking the existence of the owner and resource roles, which is worst-case ≈ 200,000 operations. Not performing type inference results in an order of magnitude more work!

One interesting side effect of type inference’s query optimisation is that it often means you can omit types that are unambiguously typed:

match 
$r isa resource;
(owner: $owner, resource: $r) isa relation;

will execute identically to

match (owner: $owner, resource: $r) isa resource-ownership;

since $r is already inferrable to only be a resource and the relation type can be constrained to the specific resource-ownership type.

Although the examples presented here are simple, TypeDB’s type inference can propagate typing across multiple hops in general, constraining large queries into more manageable search spaces.

Conclusion

TypeDB’s rule and type inference form the basis of a hugely expressive and safe query language. With these features, TypeDB empowers you to create more advanced models and queries. By building abstractions of your data domain, you can think at a higher level, and tackle complex problems more quickly, and more simply.

Vaticle

Creators of TypeDB and TypeQL