What is a Semantic Web Knowledge Graph? The main building blocks
We at DataFabric help companies build knowledge graphs on top of their existing data to provide better access through unified interfaces, reusability and data enrichment, and, as a result, better analytics and automation. As with any relatively new and complex technology, it’s important to understand and use the same terms (a.k.a. names) for things.
It’s one way to build knowledge graphs
At DataFabric, we rely on the characteristics of a knowledge graph proposed in [1], which state that a knowledge graph:
- mainly describes real world entities and their interrelations, organized in a graph.
- defines possible classes and relations of entities in a schema.
- allows for potentially interrelating arbitrary entities with each other.
- covers various topical domains.
For more on the existing definitions of a knowledge graph, see [1] and [2].
There are different technologies for building and operating a knowledge graph. At DataFabric we employ Semantic Web standards and technologies, so in our case a dataset that has those characteristics and uses Semantic Web standards is called a Semantic Web Knowledge Graph.
The foundations of these standards are:
- usage of dereferenceable URIs for referring to entities, i.e. URIs that point to resources on the Web,
- usage of RDF for representing the graph,
- usage of RDF Schema and/or OWL for representing the schema of the graph.
Find out more about Semantic Web standards and technologies at https://www.w3.org/standards/semanticweb. For a more informal and popular introduction to the ideas behind the Semantic Web, take a look at [3].
The main building blocks of a Semantic Web KG
RDF organizes knowledge in statements. Each statement is an edge that connects either two entities in a knowledge graph, or an entity and a literal value (e.g. a number or a date).
- entities or instances: the nodes in a graph. Typically, they refer to an entity in the real world, such as a person, a city, etc.
- literals: elementary data values, such as numbers or dates. They can be used, e.g., for expressing the birth date of a person or the population of a city. Literals are typed, e.g. integer, boolean, date and time, etc. Each type is identified by a URI, and usually XML Schema datatypes are used, e.g. integer is referred to by http://www.w3.org/2001/XMLSchema#integer.
- relations: the edges in a graph. They link two entities or an entity and a literal.
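The three building blocks above can be sketched in a few lines of plain Python. This is only an illustration and not how an RDF store works internally: the `http://example.org/` namespace, the property names and the `Statement`, `Literal` and `to_ntriples` helpers are assumptions made up for this example, and the Alphabet/Google pair is just a familiar stand-in for two entities.

```python
from typing import NamedTuple, Union

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace, for illustration only


class Literal(NamedTuple):
    """An elementary data value together with the URI of its datatype."""
    value: str
    datatype: str


class Statement(NamedTuple):
    """One statement: an edge from an entity to another entity or to a literal."""
    subject: str               # URI of an entity
    predicate: str             # URI of a relation
    obj: Union[str, Literal]   # URI of an entity, or a typed literal


def to_ntriples(st: Statement) -> str:
    """Render a statement as one line in N-Triples syntax."""
    if isinstance(st.obj, Literal):
        obj = f'"{st.obj.value}"^^<{st.obj.datatype}>'
    else:
        obj = f"<{st.obj}>"
    return f"<{st.subject}> <{st.predicate}> {obj} ."


graph = [
    # A relation linking two entities.
    Statement(EX + "Alphabet", EX + "hasSubsidiary", EX + "Google"),
    # A relation linking an entity and a typed literal.
    Statement(EX + "Larry_Page", EX + "birthDate",
              Literal("1973-03-26", XSD + "date")),
]

for st in graph:
    print(to_ntriples(st))
```

In practice an RDF library or a triple store would take care of parsing, typing and serialization; the sketch only shows that every statement reduces to a subject, a relation and either an entity or a typed literal.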
The above concepts are typically used in the assertion part of the knowledge graph. The assertion part is complemented by the schema, which defines the types of entities and relations that can be used in a knowledge graph. The schema encompasses:
- classes or types: the categories of entities that exist in a knowledge graph, e.g., Person, Organization, etc. They can form a hierarchy, e.g., Person being a subclass of Thing.
- properties: the categories of relations that exist in a knowledge graph, e.g., birth date, birth place, etc.
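To make the split between schema and assertions more concrete, here is a small sketch in plain Python. It rests on several simplifying assumptions: names are written without URIs, every class has at most one direct superclass (RDF Schema allows several), and the `Company` class and all helper functions are hypothetical and exist only in this example.

```python
# Schema part: a class hierarchy and the set of known properties.
SUBCLASS_OF = {
    "Person": "Thing",
    "Organization": "Thing",
    "Company": "Organization",  # hypothetical extra class
}
PROPERTIES = {"birthDate", "birthPlace", "hasSubsidiary"}

# Assertion part: the class each entity was declared to belong to.
TYPE_OF = {"Larry_Page": "Person", "Google": "Company"}


def superclasses(cls: str) -> list:
    """Return cls followed by all of its ancestors, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def is_instance_of(entity: str, cls: str) -> bool:
    """True if the entity's declared class is cls or one of its subclasses."""
    return cls in superclasses(TYPE_OF[entity])


def undeclared_properties(statements) -> set:
    """Properties used in (subject, property, object) triples but absent from the schema."""
    return {p for _, p, _ in statements} - PROPERTIES


print(superclasses("Company"))
print(is_instance_of("Larry_Page", "Thing"))
```

The hierarchy is what lets a query for all instances of Thing also return Larry_Page, even though he was only declared to be a Person.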
Classes and properties in all drawings are shown without URIs purely for the sake of simplicity, since their URIs are much longer than the others.
There are other, more advanced concepts which may be less widely used, but can still be an important part of a knowledge graph:
- reification statements: subgraphs that assert statements about another statement in a knowledge graph, e.g. the date when the statement was created, its version or its author. They are very useful for maintaining the provenance of a statement, but they are not restricted to that function.
On the drawing above there are two reification statements (in blue) about the statement that Larry Page was born on March 26, 1973. The first one asserts the authors of the statement, and the second asserts the date when it was modified.
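One possible way to express this, sketched below in plain Python, is the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). It is only one of several approaches to reification. The example namespace, the statement identifier, the choice of the Dublin Core terms `creator` and `modified`, and the author and date values are all placeholders assumed for illustration; they are not taken from the drawing.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace

# The statement being described (the literal is kept as a plain string for brevity).
base = (EX + "Larry_Page", EX + "birthDate", "1973-03-26")


def reify(statement, statement_id):
    """Return the triples that describe `statement` as a resource `statement_id`."""
    s, p, o = statement
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]


stmt_id = EX + "statement/1"  # hypothetical identifier
graph = [base] + reify(base, stmt_id)

# Statements about the statement; both values are placeholders.
graph.append((stmt_id, DCT + "creator", EX + "some_author"))
graph.append((stmt_id, DCT + "modified", "2019-01-01"))


def annotations(triples, statement_id):
    """Collect property -> value pairs said about a statement,
    skipping the four reification triples themselves."""
    return {p: o for s, p, o in triples
            if s == statement_id and not p.startswith(RDF)}


print(annotations(graph, stmt_id))
```

The sketch also shows the cost of this approach: one original statement turns into five triples before any provenance is attached.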
- derived statements: statements that are derived from explicitly added statements, based on the rules indicated in the schema (a.k.a. ontology) or in another way. On the drawing, a new statement was asserted (shown with a dashed line) because of a rule indicated in the schema. The rule is given by the statement “isSubsidiaryOf is the inverse property of hasSubsidiary” which, if supported by (and enabled in) a knowledge graph, will assert the new statement.
Derived statements may be explicitly added (materialized) to a knowledge graph by running a rule engine (a.k.a. reasoner), or derived at runtime during the execution of a query.
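The inverse-property rule and the two ways of applying it can be sketched as follows. This is a toy illustration in plain Python, not a real reasoner: names are written without URIs, the Alphabet/Google pair is only an example, and `derive_inverses`, `materialize` and `query` are hypothetical helpers.

```python
# Schema rule: isSubsidiaryOf is the inverse property of hasSubsidiary.
INVERSE_OF = {"hasSubsidiary": "isSubsidiaryOf"}

# Explicitly added statements as (subject, property, object) triples.
explicit = {("Alphabet", "hasSubsidiary", "Google")}


def derive_inverses(statements, inverse_of):
    """Return the statements implied by the inverse rules but not yet present."""
    # An inverse relationship holds in both directions.
    both = dict(inverse_of)
    both.update({v: k for k, v in inverse_of.items()})
    derived = {(o, both[p], s) for s, p, o in statements if p in both}
    return derived - set(statements)


def materialize(statements, inverse_of):
    """Materialization: return the statements together with the derived ones."""
    return set(statements) | derive_inverses(statements, inverse_of)


def query(statements, inverse_of, subject, prop):
    """Runtime variant: derive while answering, leaving the stored set untouched."""
    visible = materialize(statements, inverse_of)
    return sorted(o for s, p, o in visible if s == subject and p == prop)


print(derive_inverses(explicit, INVERSE_OF))
print(query(explicit, INVERSE_OF, "Google", "isSubsidiaryOf"))
```

Materializing trades storage and update effort for faster queries, while deriving at runtime keeps the stored graph small but repeats the work on every query.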
Other articles on the topic
- Challenges of Knowledge Graphs: From Strings to Things — An Introduction
- Reification is a red herring
- What is a Knowledge Graph? Unconnected Data is a Liability
References
- [1] Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods.
- [2] Ehrlinger, L., & Wöß, W. (2016). Towards a definition of knowledge graphs.
- [3] Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web.