DBpedia + SQL = timbr-DBpedia… Querying The DBpedia Open Knowledge Graph With standard SQL

Carlos F. Enguix
10 min readJan 14, 2020

--

1. Introduction To DBpedia and timbr-DBpedia

DBpedia is a crowd-sourced community project that extracts structured content from mainly partially unstructured and semi-structured parts of Wikipedia pages and other structured sources such a Wikimedia. The project maintains links to external sources and reciprocally many sources point to the latest DBpedia Ontology/Open Knowledge Graph (OKG).

The extracted/consolidated information from a holistic viewpoint represents an ontology/open knowledge graph (OKG), which in turn, also represents a knowledge-base containing at the same time class/property metadata and class/property instance data. DBpedia covers a multitude of domains as compared to typical limited domain-specific knowledge-bases and in addition, it includes a multitude of different international chapters/language communities.

On the other hand, timbr DBpedia represents a synergy between DBpedia + SQL. Permits the querying of the DBpedia ontology/Open Knowledge Graph (OKG) via standard SQL including also path expressions, avoiding the definition of complex joins and complex SPARQL queries, via a SPARQL End-point. Anybody that feels comfortable with SQL can reuse its knowledge for querying/accessing the DBpedia OKG. Also, timbr supports creating ontologies in SQL and also supports loading existing ones like in this case, the DBpedia Ontology/OKG, which is written in OWL.

2. timbr DBpedia Semantic Web Stack

timbr DBpedia supports RDF/RDFS/OWL ontologies via its virtual SQL compiler which fulfills the Semantic Web stack:

  • Modeling, managing, querying ontologies via standard SQL
  • Enabling graph traversals in standard SQL via path expressions as typically included in graph databases
  • Permitting semantic inference at run-time via ontology rules
  • Avoiding complex joins and complex SPARQL queries
  • Supporting RDF/RDFS/OWL ontologies mapping them to backend SQL DBMS’s.
  • extending SQL with inheritance (IS-A relationships allowing the definition of large subsumption hierarchies) and inference capabilities
Figure 1. timbr DBpedia Semantic Web Stack

3. timbr DBpedia Architecture

SQL clients and BI tools access via a Virtual SQL engine/query service (timbr supports JDBC/ODBC), the timbr DBpedia ontology that maps ontology artifacts to backend SQL DBMS’s. OKG Classes/Concepts/Properties are mapped to relational tables/columns via its “three-in-one” data perspectives, timbr, etimbr, and dtimbr schemas (see section 5). Properties may include direct properties, inherited properties (within the subsumption hierarchy) and references that are special attributes representing relationships to other classes/concepts (typically represented as URI’s/foreign keys in the virtual SQL engine).

The DBpedia OKG no longer is required to be accessed by complex SPARQL queries/SPARQL end-points. The DBpedia OKG can be queried with standard SQL. timbr DBpedia also includes an Ontology Viewer/Explorer so end-users can intuitively review the scope of a given sub-part of the OKG. In this technical report, we focus on accessing the DBpedia OKG via SQL clients (in this case MySQL and Amazon Redshift). At the end of this report, we briefly mention the timbr DBpedia Ontology/OKG Explorer and the timbr DBpedia Data Exploration Beta component.

Figure 2. timbr DBpedia Architecture

4. timbr DBpedia SYSTEM TABLES

We briefly mention timbr-DBpedia system tables as the set of tables supporting the definition and querying of the DBpedia OKG via standard SQL. Can be considered as the underlying “system catalog” tables. Among them we can cite:

  • timbr.SYS_CONCEPTS: contains unique timbr DBpedia ontology class/concepts
  • timbr.SYS_INHERITANCE: contains base/derived class/concepts
  • timbr.SYS_ONTOLOGY: contains the set of classes/concepts with the respective properties, primary keys, inherited primary keys, parent class/concepts, inheritance level and so on.
  • Timbr.SYS_RELATIONSHIPS: includes a total of 8 columns, among them class/concept, the related class/concept, foreign key name associated, etc.
Figure 3. timbr DBpedia SYSTEM TABLES database diagram

When working with SYSTEM TABLES, MySQL is the activated DBMS. For the DBPedia ontology/OKG, the default DBMS is Amazon Redshift. Nevertheless, any relational back-end can be configured to store the timbr ontology SYSTEM TABLES. With respect to the DBpedia Ontology/OKG, any SQL backend is supported. Here we have worked with Amazon Redshift and for the Data Exploration Beta component, we have worked with a Spark back-end.

5. Three-in-one: timbr, etimbr, dtimbr Ontology Schemas

In timbr DBpedia there are three different viewpoints/perspectives of the ontology representation:

  • timbr schema: implicit representation of the ontology. Includes explicit ontology/relational mapping.
  • etimbr schema: exhaustive representation. Includes the timbr schema ontology artifacts plus derived class/concept properties. In other words, parent class/concepts include derived properties from derived classes/concepts within the subsumption hierarchy (for instance the “thing” class/concept will include all the DBpedia ontology/knowledge graph properties from where all class/concepts inherit entity_id & entity_type properties)
  • dtimbr schema: dereferenced representation that includes the etimbr schema plus graph/hop traversals, avoiding unnecessary joins.

5.1 timbr DBpedia Schema Queries

This particular DBpedia schema representation treats classes/concepts/properties as tables/columns where every class/concept have as a minimum, the following attributes (inherited from the parent ancestor ontology artifact “thing”):

  • entity_id: unique URI/primary key
  • entity_type: self-explanatory, it is the type of ontology artifact

Please note that timbr DBpedia permits multiple inheritance, therefore, entity_type may contain several entries. Also please note that the DBpedia ontology/Knowledge Graph may contain many class/concept property null values.

Query Example:

Retrieve implicit book main information. In this query, we check property/column values to not include “NULL” values (here we consider the “Open World Assumption”). When a column includes a URI, generally it refers either to a primary key or to a foreign key relationship for the column.

One can note that the first query row refers to an invalid publication date. The rest seems to be valid date entries.

5.2 etimbr DBpedia Schema Queries

etimbr DBpedia Schema queries include timbr properties plus derived class/concept properties. Parent/ancestors class/concepts include derived properties from derived classes/concepts within the subsumption hierarchy. Please note that the parent class/concept of all classes/concepts is “thing”, therefore the “thing” class/concept will contain all the set of properties in the e-timbr schema.

Query Example:

The following query executes a UNION of “literarygenre” in “thing” class/concept and “genre” in “genre” concept:

Query Example:

All Ontology/OKG classes/concepts have as ancestor “thing” concept, therefore, we can get the whole set of derived attributes via etimbr.thing schema. In this case, we obtain simple statistics returning the following:

  • number of classes/concept instances in timbr DBpedia ontology/OKG
  • number of distinct class/concept types
  • number of distinct entity_labels (property only accessible via etimbr schema (derived property))

Query Example:

Now we obtain simple statistics by querying etimbr.thing (includes all instance types given that all classes/concepts have as ancestor concept thing):

  • grouping by class/concept type
  • number of class/concept instances (person, place, company)

This query presents an interesting viewpoint that needs to be further explained. If we define the following query:

SELECT count(1) FROM timbr.person”

we retrieve 4339681 “person” concept instances because the timbr inference rules include instances that are mapped to derived classes/concepts of “person” as well, as compared to the query depicted above specifically filtered person instances mapped directly to the “person” concept (2829415)

5.3 dtimbr DBpedia Schema Queries

dtimbr DBpedia schema: includes the etimbr schema plus graph traversals. Relationships to other class/concepts are modeled as properties, therefore avoiding complex joins and enabling path query expressions (typical in graph databases).

Query:

This query retrieves the book URI/ID, book title, author name, and thing literary genre where book title is about science fiction, or literary genre is about science fiction or author is in a selected list of best-seller science fiction authors:

Query:

dtimbr query retrieves the number of person’s types grouped by continent and country (a total of three hops) and entity_type:

6. timbr DBpedia Ontology Explorer

timbr DBpedia models the ontology/OKG with concepts, properties and references mapped to:

  • Concepts: ontology concepts are mapped to relational tables
  • Properties: concept properties are mapped to columns in relational tables and may be of type typical SQL data types such as varchar, integer, timestamp, etc. depending on the underlying back-end DBMS (i.e. Amazon Redshift, Apache Spark, etc.). Properties include inherited properties (from parent/ancestor concepts), direct properties (defined for a given concept)
  • References: represent relationships between ontology concepts and are mapped to SQL foreign keys in relational tables. The SQL data type of a reference is varchar representing an ontology URI.

timbr DBpedia supports complex subsumption hierarchies via IS-A relationships (up-to 8 levels allowing multiple inheritance), properties and references to other concepts. The Ontology Explorer contains the following parts:

  • Ontology Tree: situated on the left-hand side displays the subsumption hierarchy of a given concept
  • Ontology Graph Viewer/Explorer: on the center of the screen, displays the entire ontology sub-part/view
  • Ontology Concept Details: displays a summary of information for the selected concept, including the direct ancestor, the hierarchical level, number of total properties, direct and inherited references, inherited and direct properties
  • Ontology Main Graph Control: allows different options for controlling the output of the ontology explorer such as including references/relationships, properties and so on
Figure 4. timbr DBpedia Ontology Explorer

In the ontology tree, we can see the “book” concept hierarchy, which is in level 4. In the Ontology graph Viewer/Explorer (center of the screen) we have included property/references displaying author, writer, genre, and literarygenre. On the right-hand side, in the ontology concept details, we can see that concept “book” has a total of 126 properties, 6 direct references, 69 inherited references, 15 direct properties, and 111 inherited properties.

7. timbr DBpedia Data Exploration Beta

We can access the new Beta Ontology/Data exploration menu item by selecting Exploration/Data Exploration beta, which firstly it displays a dialog box that demands us to enter the required back-end DBMS. In this case, we can select either DBpedia Amazon Redshift or DBpedia Spark. After selecting DBpedia Spark we have to choose the desired Ontology/OKG class/concept. A set of concept properties is required to fill-in and the required filters and results to be retrieved. For the time being, this Beta does not allow to define a filter that allows path expressions. Nevertheless, we are able to do so via the Concept details Window section, step-by-step incrementally. According to timbr.ai, this feature will be incorporated later in the Beta program.

Figure 5. timbr DBpedia Data Exploration Beta Query Composer

On the right-hand side of the screen below (figure 6), we can find the “book” concept details. A whole set of “book” concept properties is displayed, indicating where we can traverse the reference/foreign key, for instance, by clicking on a small “yellow plane figure” with the word “quick” below the “author” reference/foreign key. On the screen below, just in the middle, it is displayed a set of books with their respective author nodes.

Figure 6. timbr DBpedia Data Exploration Beta Component: Book. Person(Author) Concept/Ontology Viewer

Below (figure 7), we can identify the Data Exploration Beta parts:

  • Data Exploration Concept Viewer: situated on the left-hand side of the screen, displays concept involved in the OKG
  • Data Exploration Ontology/OKG Viewer: situated on the middle of the screen, we can identify several books with some relationships exploded such as book author and author details such as birthplace and death place
  • Data Exploration Concept Details Viewer: on the right-hand side of the screen, displays concept/properties details, giving the possibility of graph traversal with the related/references to other concepts.
Figure 7. timbr DBpedia Data Exploration Beta Component: Book. Person(Author), agent, etc. Concept/Ontology Viewer

On the captured screen below (figure 8) of the Data Exploration Beta timbr DBpedia component, we have “exploded” different “book”, ”author” (person) concepts, which have been added incrementally by clicking on the “path traverse” button. A total of 11 related concepts are involved, including book, person, place, country, continent, language, agent, ethnicgroup, populatedplace, award and so on.

Figure 8. timbr DBpedia Data Exploration Beta Component: Book. Person(Author), Place, Country, Continent, Language Concept/Ontology Viewer

8. Conclusions

With timbr DBpedia, we are able to analyze/traverse the DBpedia Ontology/OKG with standard SQL that includes the set-up of path expressions avoiding cumbersome joins and avoiding complex SPARQL queries/SPARQL end-points. Anybody that has experience working with standard SQL will feel comfortable issuing timbr DBpedia queries. timbr DBpedia is scalable as much as the used back-end. In this technical report, we have been working with timbr DBpedia Amazon Redshift for the OKG, for system tables we have been querying a MySQL backend and for the timbr DBpedia Data Exploration Beta we have been working with DBpedia Spark.

Acknowledgments

I like to thank first of all Amit Weitzner, timbr.ai CEO, for giving me the chance to test drive timbr DBpedia and for always offering the timbr DBpedia latest release. I discovered “serendipitously” timbr.ai searching on Google for research related to Enterprise Knowledge Graphs. I plan to write a research report comparing different approaches, solutions, tools, and software with respect to the definition and construction of Enterprise Knowledge Graphs.

I would also like to thank Dr. Sebastian Hellmann, the Director of the DBpedia initiative for providing information about the latest DBpedia research papers and statistics figures.

--

--

Carlos F. Enguix

ACM-SIGMOD Member | Semantic-Web | Open KGs | Personal KGs | Integrating KGs & LLMs | https://carlosfenguix.website/cv