The Do’s and Don’ts of Rule and Query Writing

Felicity Mulford
Jul 1, 2020 · 5 min read

RDFox, Datalog and SPARQL

Photo by Joanna Kosinska on Unsplash

Rules offer an expressive way to process and manipulate a knowledge graph. Rules help bring the intelligence layer closer to the data and can also help with query writing. This article aims to provide a list of best practices for getting the most out of your rules and queries. You can also read our introduction to rules here and download our rule guide for more concrete examples.

What is a rule?

A rule is an ‘if-then’ statement. For example:

?x has uncle ?z if ?x has parent ?y and ?y has brother ?z.

We express rules using Datalog. Datalog is a declarative, logic-based programming language that is syntactically a subset of Prolog. Datalog uses the following format:

[?x, :hasUncle, ?z] :- 
[?x, :hasParent, ?y],
[?y, :hasBrother, ?z].

The formula to the left of the :- operator is the rule head (the ‘then’ part) and the formula to the right is the rule body (the ‘if’ part).

Intuitively, a rule says “if [?x, :hasParent, ?y] and [?y, :hasBrother, ?z] both hold, then [?x, :hasUncle, ?z] holds as well”.

What is a query?

A query is a request for data or information from a database. Queries retrieve and model data stored within RDFox. The query language used by RDFox is SPARQL, which is the standard query language for RDF.

Queries can be typed or pasted directly into the shell. For example:

SELECT ?s ?p ?o WHERE { ?s ?p ?o }

This query would select all subject-predicate-object triples within RDFox.

Below are some tips and best practices for writing rules and queries with RDFox.

Do’s

or add rules and facts in an arbitrary order but grouped in a single transaction. This will usually increase the performance of the first reasoning operation.

to improve the performance of a rule. This helps reduce the number of matches in the body that then propagate to the head.

We recommend experimenting with the trade-off between reasoning and query answering time to make queries simpler to write, maintain and answer using rules.
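As a rough illustration of that trade-off, the two SPARQL queries below ask for the same uncle relationships. The first performs the join at query time; the second assumes the :hasUncle rule shown earlier has already been imported and materialised. The prefix IRI is a placeholder assumption, not something defined in this article, and the two queries are meant to be run separately.

```sparql
# Query 1: without the rule, every query that needs uncles repeats the join.
# Note: a pair may appear more than once if it is reachable via several parents.
PREFIX : <https://example.org/>   # placeholder namespace (assumption)
SELECT ?x ?z WHERE {
    ?x :hasParent ?y .
    ?y :hasBrother ?z .
}

# Query 2: with the :hasUncle rule materialised, the query is a single pattern.
PREFIX : <https://example.org/>   # placeholder namespace (assumption)
SELECT ?x ?z WHERE {
    ?x :hasUncle ?z .
}
```

The second form moves the cost of the join to reasoning time, so whether it pays off depends on how often the query is asked and how often the underlying data changes.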

by purpose and import them incrementally when they are required. Rules in RDFox materialise as soon as they are imported which is why we recommend.

Ry. If types exist within your data use these to.

. In the case of rules, incremental retraction/addition means that you don’t have to worry about rebooting the whole system every time you change a rule.

Write a query before you write the rule. The query will let you know how much data will be affected by the rule.
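One way to gauge how much data a rule would affect is to run its body as a query first. The sketch below does this for the body of the :hasUncle rule from earlier; the prefix IRI and the variable name ?bodyMatches are illustrative assumptions.

```sparql
# Count how many times the body of the :hasUncle rule matches the data.
PREFIX : <https://example.org/>   # placeholder namespace (assumption)
SELECT (COUNT(*) AS ?bodyMatches) WHERE {
    ?x :hasParent ?y .
    ?y :hasBrother ?z .
}
```

A very large count is a hint that the rule body may need to be made more selective before the rule is imported.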

Test whether the query you wrote before the rule (see above point) and the rule return the same result.

This can be done in a query like:

SELECT ?person WHERE {
    {
        SELECT ?person (COUNT(?child) AS ?children) WHERE {
            ?person :hasChild ?child
        }
        GROUP BY ?person
    }
    ?person :materialisedNumberOfChildren ?number
    FILTER(?number != ?children)
}

This will return the set of people where the numbers don’t match, so the query should return 0 results.

Don’ts

Don’t leave out the types of the entities used in the rules. This won’t be an issue in most cases but can slow down performance if the total number of possible relations to evaluate is large. Example:

[?customer, :referral, "true"] :- 
[?customer, :has, ?referralLink].

The types of ?customer and ?referralLink aren’t defined in the rule, so RDFox has to check every :has relation. Defining the types helps reduce the number of matches in the body that then propagate to the head. A more appropriate rule would be:

[?customer, :referral, "true"] :- 
[?customer, :has, ?referralLink],
[?customer, a, :CustomerType],
[?referralLink, a, :ReferralLinkType].

Consider the following rule, which marks as similar all pairs of entities whose labels are indistinguishable modulo case sensitivity.

[?first, :similarTo, ?second] :- 
[?first, rdfs:label, ?first_label],
[?second, rdfs:label, ?second_label],
FILTER(LCASE(?first_label) = LCASE(?second_label)).

While correct, the above rule is very inefficient as it forces RDFox to compare the labels of all pairs of entities, which becomes infeasible for even a moderately large number of entities (e.g. on a dataset with 1M entities this will result in 1T comparisons). The issue is that, for each entity that matches the first atom, RDFox must iterate through all entities that match the second atom and apply the filter condition accordingly. RDFox has no way of reducing the number of compared pairs of entities.

An alternative solution is to precompute the values on which the join is performed in a separate rule and use a regular join in a second rule as illustrated next.

[?entity, :lcase_label, ?lcase_label] :- 
[?entity, rdfs:label, ?label],
BIND(LCASE(?label) AS ?lcase_label).

[?first, :similarTo, ?second] :-
[?first, :lcase_label, ?label],
[?second, :lcase_label, ?label].

While more verbose, the above program can be evaluated by RDFox in time proportional to the number of matching pairs (as opposed to all pairs), which, depending on the data, could be the difference between terminating and not. The first rule simply computes the lower-case labels of entities, while the second rule uses a direct join on the computed labels to identify the similar entities. Because RDFox uses full indexing, it can efficiently identify, for each entity that matches the first atom, the entities that match the second atom. Note that this solution uses additional memory to store the relation :lcase_label, which could be a significant but necessary overhead for solving the problem.

(in both rules and queries): you are effectively throwing away answers that you already spent (a lot of) time computing.

If you want to say that two variables of the same type should be equal, just call them the same thing.
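A small sketch of this tip in SPARQL, reusing the :hasParent relation from the earlier examples: both queries look for pairs that share a parent, but the second expresses the equality simply by reusing the variable. The prefix IRI is a placeholder assumption, the queries are meant to be run separately, and the comparison assumes the parents are IRIs rather than literals.

```sparql
# Query 1: two variables tied together afterwards by an equality filter.
PREFIX : <https://example.org/>   # placeholder namespace (assumption)
SELECT ?a ?b WHERE {
    ?a :hasParent ?p1 .
    ?b :hasParent ?p2 .
    FILTER(?p1 = ?p2)
}

# Query 2: the same intent, stated by using one variable in both patterns.
PREFIX : <https://example.org/>   # placeholder namespace (assumption)
SELECT ?a ?b WHERE {
    ?a :hasParent ?p .
    ?b :hasParent ?p .
}
```

In the second form the shared variable is an ordinary join, the same kind of join the :lcase_label example above relies on.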

(a special case of the selective tip but might be worth mentioning specifically).

Do you have any best practices to add to our list? Or situations to avoid? Feel free to get in touch! We will continue to update this article.

For more information head to our website or contact us at info@oxfordsemantic.tech

To request an evaluation license click here.

Team and Resources

The team behind Oxford Semantic Technologies started working on RDFox in 2011 at the Computer Science Department of the University of Oxford with the conviction that flexible and high-performance reasoning was a possibility for data intensive applications without jeopardising the correctness of the results. RDFox is the first market-ready knowledge graph designed from the ground up with reasoning in mind. Oxford Semantic Technologies is a spin out of the University of Oxford and is backed by leading investors including Samsung Venture Investment Corporation (SVIC), Oxford Sciences Innovation (OSI) and Oxford University’s investment arm (OUI). The author is proud to be a member of this team.

Oxford Semantic Technologies

A high performance knowledge graph and semantic reasoning engine

Oxford Semantic Technologies

Oxford Semantic Technologies develop RDFox, the first market-ready high-performance knowledge graph designed from ground up with semantic reasoning in mind. Founded in 2017 as a spin-out of the University of Oxford with a mission to bring cutting-edge research to industry.

Written by Felicity Mulford

Employee at Oxford Semantic Technologies and Ox Mountain. OST have developed RDFox, a high performance knowledge graph and semantic reasoning engine.
