Knowledge Graph Representation: TypeDB or OWL?

Why does TypeDB implement its own ontology language?

Szymon Klarman
Vaticle
Jan 19, 2017

In this blog post, we take a closer look at a few of the key aspects that differentiate the knowledge representation model adopted by the TypeDB knowledge graph platform from the popular Semantic Web formalisms: RDF(S) and OWL. In effect, we are answering the frequently asked question “Why does TypeDB implement its own ontology language instead of using the existing W3C standards?”

This post is aimed at readers who are familiar with the notion of formal semantics and have working experience in modelling ontologies, particularly using RDF(S) and OWL. We want you to get as much as possible out of our writing, and we are happy to discuss this post in the comments section below or via our Community Discord channels. Please get in touch!

Knowledge Graphs: a new frontier for knowledge representation

The emerging paradigm of organising and managing complex, highly interconnected data as so-called knowledge graphs poses a peculiar combination of knowledge and data representation challenges [1]. Knowledge-graph-based applications need to operate efficiently over semantically rich, yet well-structured and constrained graph data. While relational modelling techniques and graph databases (including RDF triple stores) are useful tools to address some of the specific issues, they cannot offer a comprehensive technical and conceptual infrastructure for the entire task. Many turn to the Semantic Web standards instead, with the prominent Web Ontology Language (OWL), as an alleged “silver bullet” for the semantic graph management challenge. However, as powerful as the Semantic Web stack proves to be in the context of linked data publishing on the web, its value as a knowledge graph representation solution for stand-alone, domain-specific applications is less obvious.

With TypeDB — our open-source knowledge graph platform — we bridge concepts from several knowledge and data representation paradigms to specifically address what we see as a shortfall. In this post, we review the central motives and design decisions behind these efforts.

Knowledge representation on the Semantic Web

The Semantic Web (or Web 3.0, as it is sometimes also called) is a W3C initiative, started in the late 1990s, to extend the web’s existing architecture with a layer of formal semantics. This layer is meant to enable machines to share and interpret data globally in an intelligent and meaningful manner. The W3C technology stack introduced towards that goal consists, among other components, of three data and knowledge representation standards: RDF, RDFS and OWL. Let us briefly review these formalisms.

RDF

RDF (Resource Description Framework) is a graph-based data model. It represents information as a labelled, directed multigraph with vertices and labelled edges (multiple edges with different labels between the same nodes are allowed). Vertices consist of IRIs (representing abstract “things”), literals (concrete data values) and blank nodes (dummy “convenience” nodes).

An RDF graph is expressed as a set of <subject, predicate, object> triples, each interpreted as an edge labelled with “predicate” going from the “subject” node to the “object” node. RDF does not support any semantics on its own, other than those carried over from the XML datatype definitions — it’s simply a data model. SPARQL is the language dedicated to querying RDF graphs, which is natively implemented by triple stores, i.e., databases developed specifically for storing and managing RDF data. The Wikidata project offers a prominent example of exposing RDF data via a live SPARQL endpoint run on top of a triple store.
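To make the triple-based view concrete, here is a minimal sketch in plain Python, not tied to any RDF library. The `ex:` names, the sample data and the helper functions `outgoing_edges` and `match` are all hypothetical and exist only for illustration; a real triple store would expose this kind of pattern matching through SPARQL.

```python
from collections import defaultdict

# Hypothetical sample graph: each tuple is a <subject, predicate, object> triple.
# IRIs are written as prefixed names for readability.
triples = {
    ("ex:John", "ex:hasChild", "ex:Mary"),
    ("ex:John", "ex:knows", "ex:Mary"),    # second edge between the same nodes, different label
    ("ex:John", "ex:name", '"John"'),      # the object here is a literal, not an IRI
    ("ex:Mary", "rdf:type", "ex:Person"),
}

def outgoing_edges(graph, subject):
    """Group the objects reachable from `subject` by edge label (predicate)."""
    edges = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            edges[p].add(o)
    return dict(edges)

def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }

# Both edges from John to Mary are kept, because the graph is a multigraph.
edges_john_to_mary = match(triples, s="ex:John", o="ex:Mary")
```

The `match` helper loosely mirrors a single SPARQL triple pattern, with `None` playing the role of a variable.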

RDFS

RDFS (RDF Schema) extends RDF with the most basic ontological constraints and semantics: class and property subtypes, along with property range and domain restrictions. These constructs allow for building very simple type hierarchies over RDF data, which are also represented within RDF graphs. Because of that simplicity, the effective reasoning mechanism over RDFS can be captured entirely by SPARQL (via property paths), thus not requiring additional, computationally expensive inference tools.
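As a rough illustration of how lightweight RDFS reasoning is, the sketch below applies a small subset of the RDFS entailment rules (subclass transitivity, type propagation, domain and range) to a set of triples until nothing new can be derived. Note that this uses naive forward chaining in plain Python rather than SPARQL property paths, it omits rules such as `rdfs:subPropertyOf`, and the `ex:` vocabulary and the function name `rdfs_closure` are assumptions made for the example.

```python
def rdfs_closure(triples):
    """Apply a few RDFS entailment rules repeatedly until a fixpoint is reached."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "rdfs:subClassOf":
                for s2, p2, o2 in inferred:
                    # Transitivity: s subClassOf o, o subClassOf o2 => s subClassOf o2
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdfs:subClassOf", o2))
                    # Type propagation: s2 type s, s subClassOf o => s2 type o
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))
            elif p == "rdfs:domain":
                # s domain o: the subject of every s-edge is an instance of o
                for s2, p2, o2 in inferred:
                    if p2 == s:
                        new.add((s2, "rdf:type", o))
            elif p == "rdfs:range":
                # s range o: the object of every s-edge is an instance of o
                for s2, p2, o2 in inferred:
                    if p2 == s:
                        new.add((o2, "rdf:type", o))
        if new <= inferred:
            return inferred
        inferred |= new

# Hypothetical schema and data, stored together in one graph as RDFS allows.
schema = {
    ("ex:Parent", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:hasChild", "rdfs:domain", "ex:Parent"),
    ("ex:hasChild", "rdfs:range", "ex:Person"),
}
data = {("ex:John", "ex:hasChild", "ex:Mary")}

closure = rdfs_closure(schema | data)
```

From the single data triple, the closure derives that John is a parent, a person and an agent, and that Mary is a person and an agent.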

OWL

OWL (Web Ontology Language) is a family of description logic-based ontology languages, each varying in its expressiveness and computational complexity. OWL adds numerous ontological constructs on top of those introduced by RDFS. OWL ontologies can also be represented in RDF graphs, but to make any meaningful use of their intended semantics in the prototypical use-case scenarios, one needs specialised tools: reasoners (OWL DL, EL), rule engines (OWL RL), and query rewriting systems (OWL QL).

OWL adopts the so-called open-world assumption (OWA), as opposed to the closed-world assumption (CWA) characteristic of relational database systems, meaning that a lack of information is not interpreted as if the information were false. For instance, the OWL constraint “Every parent must have at least one child” is consistent with a dataset containing the single fact “John is a parent”, without any mention of John’s children. The absence of any mention of children does not imply that there are no children; on the contrary, unless specifically told otherwise, we can safely assume John has a child, even if we do not know who that child is. This philosophy is a natural fit for the open-ended web environment, where incompleteness of information can be taken for granted.
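The difference between the two assumptions can be sketched with the “John is a parent” example. The code below is a simplified illustration in plain Python, not how an OWL reasoner or a relational database is implemented; the function names and the blank-node-style placeholder `_:child_of_John` are assumptions made for the example.

```python
# The dataset contains a single fact and no hasChild edges at all.
facts = {("John", "rdf:type", "Parent")}

def parents(facts):
    """Everyone explicitly typed as a Parent."""
    return {s for s, p, o in facts if p == "rdf:type" and o == "Parent"}

def children_of(facts, person):
    """The explicitly recorded children of `person`."""
    return {o for s, p, o in facts if s == person and p == "hasChild"}

def closed_world_violations(facts):
    """CWA: a missing child is read as 'no child', so the constraint is violated."""
    return {x for x in parents(facts) if not children_of(facts, x)}

def open_world_entailments(facts):
    """OWA: the constraint entails that some (unnamed) child exists for each parent."""
    entailed = set()
    for x in sorted(parents(facts)):
        if not children_of(facts, x):
            # Placeholder standing in for an unknown individual.
            entailed.add((x, "hasChild", f"_:child_of_{x}"))
    return entailed

violations = closed_world_violations(facts)   # schema-style reading: data is incomplete
entailments = open_world_entailments(facts)   # ontology-style reading: data is consistent
```

Under the closed-world reading, John is reported as violating the constraint; under the open-world reading, the same data is consistent and yields the conclusion that John has some child.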

While the adoption of the RDF(S) standards for publishing data on the web has seen notable uptake in recent years, the use of OWL has been surprisingly limited [2], [3]. This is true both of the number of applications it has been effectively used for, and of the number of specific ontological constructs that are ever employed in practice. One of the few examples is provided by Ordnance Survey, the national mapping agency for Great Britain, which employs expressive OWL ontologies for structuring geographical and administrative data. Some of the commonly acknowledged reasons behind this phenomenon are exactly those that have encouraged our company to pursue a more suitable knowledge representation solution, as explained in the next part.

Why TypeDB instead of OWL?

Labelled, directed multigraphs happen to be the structures underpinning the RDF data model, so it is relatively straightforward to devise a mapping between RDF graphs and the hypergraphs of TypeDB. However, the real difference appears at the ontology layer, where TypeDB exposes a higher-level knowledge model, allowing developers to represent their application domain in terms of entities, resources, relations and roles, as opposed to OWL’s individuals, literals, properties and classes.
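One way to picture the mapping mentioned above is to treat a relation with several role players as a hyperedge and to reify it into RDF-style triples: one node for the relation instance and one edge per role. The sketch below is an illustration only and does not reflect TypeDB’s internal storage or any official mapping; the `Relation` class, the `parentship` relation and its `parent` and `child` roles are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    """An n-ary relation instance: a hyperedge connecting its role players."""
    id: str
    type: str
    role_players: tuple  # tuple of (role, player) pairs

def relation_to_triples(rel):
    """Reify a hyperedge as triples: a typed relation node plus one edge per role."""
    triples = {(rel.id, "rdf:type", rel.type)}
    for role, player in rel.role_players:
        triples.add((rel.id, role, player))
    return triples

# A single relation instance linking three entities through two roles.
parentship = Relation(
    id="r1",
    type="parentship",
    role_players=(("parent", "John"), ("parent", "Jane"), ("child", "Mary")),
)

reified = relation_to_triples(parentship)
```

A single hyperedge with three role players becomes four binary edges, which is why the mapping is straightforward at the data level even though the modelling vocabulary differs.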

Here are the four main reasons why we believe TypeDB ontologies are a better fit than OWL for modelling knowledge graphs in the context of stand-alone applications:

1) TypeDB combines the Open and Closed World Assumption

By adopting the OWA, OWL makes it very hard to validate the consistency of data and to ensure its proper structure. Yet that is exactly what knowledge graph applications typically require, in much the same way as relational databases require strict schemas to guarantee the quality of their data.

In TypeDB, we carefully combine both styles of reasoning, taking the best of both worlds: ontological-style open-world inference, and schema-like closed-world constraint checking. The long-standing antagonism between open-world “ontological” and closed-world “schema” modelling stems, in our view, not principally from a formal incompatibility between the two approaches. Rather, it is rooted in the extreme philosophical views on the prototypical application scenarios they are ideally suited for: the open-ended, heterogeneous web of data vs. closed, curated, single-viewed data stores. Because we focus on large, domain-specific knowledge graphs, we find both ends of this spectrum too limiting and see a natural need for a mixed, yet still balanced, solution.

2) OWL profiles have an unsatisfactory balance of expressiveness vs complexity

None of the standardised OWL profiles directly matches the typical schema/ontology requirements of knowledge graph applications. In most cases, knowledge graphs require rich constraint patterns to be expressed over the relationships (edges) in the graph, and these are available only to some extent in OWL DL, i.e., the most complex of the decidable OWL profiles. At the same time, there is little demand for the very elaborate class descriptions involving logical operators that this profile broadly supports; the expressiveness of the lightweight OWL QL, OWL RL, or even RDFS is sufficient in this respect.

In theory, the OWL architecture invites the use of arbitrary fragments, chosen as needed on a per-use-case basis. In practice, however, such “cherry picking” is impeded by the nature of the available reasoning tools, which rely on expensive computational techniques designed to cover an entire OWL profile. Just to reason with the two simple constraints “Every parent has a child” and “Every child is a person”, one must employ a full-fledged OWL DL reasoner, a tool that, on average, scales poorly with large data. This commonly pushes Semantic Web practitioners towards using RDF(S) alone, which on its own is too simplistic as an ontology/schema language.

3) TypeDB is dedicated to graph data

Even in its full expressiveness, OWL is not ideally suited for reasoning with complex graph structures. Its formal foundations (logics with the so-called tree-model property), determined largely by computational limitations (predominantly decidability), make it, in fact, a much more natural language for managing tree-shaped data. Consequently, the complexity and expressiveness overhead that one must accept in order to work with OWL in the first place does not pay off in the context of knowledge graphs.

4) OWL has a high entry threshold for non-logicians

As the design of OWL has been driven primarily by research on description logics, the entry threshold for non-logicians (in the sense of being able to comprehend the language and achieve the intended behaviour of OWL-backed systems) is significant. This is another reason why many developers choose to stick to RDF(S).

By ensuring that TypeDB’s knowledge representation formalism remains lightweight and is built bottom-up, following the experiences and needs of developers, we hope to bring more semantic capabilities to a much larger audience than that of OWL.

By committing to an ontology formalism different from the one underpinning the Semantic Web, TypeDB consequently had to be equipped with a new, dedicated query language, TypeQL, which is intended to offer optimal access to information represented in TypeDB knowledge graphs. We will discuss the formal properties of TypeQL in more detail in future posts.

The design of a practical yet well-founded knowledge representation formalism is far from a simple task, and requires careful consideration of numerous issues from formal, knowledge engineering and technological perspectives. There are many trade-offs and hard compromises to be made before a satisfying and stable specification can finally surface. While the work on this front continues at Vaticle, we invite you to check our documentation and provide your feedback.

[1] L. Ehrlinger, W. Wöß: “Towards a definition of knowledge graphs”, SEMANTiCS 2016.

[2] B. Glimm, A. Hogan, M. Krötzsch, A. Polleres: “OWL: Yet to arrive on the Web of Data?”, Linked Data on the Web Workshop (LDOW) 2012.

[3] J. Hendler: “On Beyond OWL: challenges for ontologies on the Web”, OWL: Experiences and Directions Workshop (OWLED) 2015.

With thanks to my fellow editors Nicholas D, Jo Stichbury, Haikal Pribadi, Borislav Iordanov and Precy Kwan for their input.

Szymon Klarman
Vaticle

knowledge representation and reasoning | knowledge graphs | linked open data | AI | logic | https://klarman.me