Conceptual Relational Data Virtualization, using Existing Open Standards

Exploiting Data Virtualization using Virtuoso Relational Views over Structured Data represented as SQL Tables or RDF Graphs

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog
12 min readApr 27, 2018

--

Situation Analysis

Today’s world is driven by applications that leverage the functionality of an RDBMS for complex read-and-write interactions with structured data. These interactions simultaneously support the needs of multiple user profiles.

Remember: “Data” is how we express “Observation” in reusable form. The "Observation" being expressed is one or more entity relationship types (relations).

Tabular Expressions of Relations

The current ubiquity of RDBMS technology is the product of mass adoption of open standards such as the SQL Query Language for declarative data interactions and Persistent Stored Modules (SQL/PSM for short) for implementing Stored Procedures.

Comparison of SQL, SPARQL, and Cypher Query Languages from Stackoverflow Trends

Unfortunately, due to world-views framed by SQL RDBMS vendor marketing, the notion of an RDBMS as delivered by a software application remains a source of mass confusion, primarily across the following axis:

  • Data Representation — via Entity Relationship Types (Relations) represented in Tabular form (and easily depicted as Tables)
  • Declarative Data Interaction —using the Structured Query Language (SQL) for Data Definition and Manipulation
Generic DBMS Relation Schema

This has led to an RDBMS software application that supports the SQL Query Language for declarative operations on data represented as Tabular Relations — also correctly referred to as an SQL-compliant RDBMS application — being colloquially referred to simply as an RDBMS — coincidentally obscuring the fact that there are other kinds of RDBMS, other query languages, and other data representations.

SQL RDBMS Schema for Customer Relation

Graphical Expressions of Relations

In recent times, there has been a upsurge of RDBMS software applications that are not confined strictly to representing relations as Tables. These new products support finer-grained relations which can be represented and depicted as Entity Relationship Graphs, while also supporting alternative query languages (such as SPARQL, Cypher, GraphQL, and others).

In similar vein to their SQL-compliant RDBMS counterparts, vendors of this genre of RDBMS product have tended toward the colloquial moniker “Graph Databases,” which has been marketed to imply non-relational — which is technically illogical, once the nature of data is properly understood — often through the unsurprisingly misunderstood label of “NoSQL” which itself is better expressed as “Not only SQL.” i.e., SQL isn’t the only Query Language for operating on Relations managed by an RDBMS.

Open standards associated with this genre of RDBMS product include —

RDF DBMS Schema for a Customer Class

Relational Data Virtualization

Having established that data is represented using Entity Relationship Types (Relations) which can take both Tabular (common across SQL-compliant RDBMS applications) and Entity Relationship Graphical (common across “NoSQL” RDBMS applications) forms, it should be clear that it is possible to have one RDBMS software application that supports all of —

  • Data Representation — Tabular and/or Graphical Relations
  • Declarative Data Interaction — SQL or SPARQL
  • Rules Declaration — SQL (as in SQL Views) or SPARQL (as in RDF Views, Custom Reasoning & Inference Rules, and Attribute-based Access Controls [ABAC] for Data Security)

This is exactly what’s delivered by Virtuoso — our multi-model RDBMS application that supports both the SQL query language (an open standard from ANSI and ISO) for operating on data represented as relational Tables and the SPARQL query language (an open standard from the W3C) for operating on RDF Graphs.

In a nutshell, Virtuoso provides a uniquely powerful solution for creating and interacting with data without introducing impedance due to preference of structured data representation model (relational Tables or RDF Graphs) or declarative data interaction (SQL or SPARQL query languages).

Virtuoso Relational Data Virtualization Example

This section provides a simple guide, including links to live examples, that demonstrates how to generate Graphical relations (entity relationship types represented as sentences, grouped by a sentence predicate) from Tabular relations (entity relationships represented as records in a table, grouped by a table name functioning as the predicate).

Relational Data Virtualization built using existing open standards (HTTP, SQL, SPARQL, RDF, R2RML, ODBC, JDBC, etc.)

In this example, we will be using the “Northwind” sample schema installable into any Virtuoso instance (download the Virtuoso Demo Database VAD, demo_dav.vad).

Steps to Generate R2RML Script(s)

  1. Attach ODBC-accessible Data Sources to Virtuoso. (If your preferred data source has no ODBC driver but does have a JDBC driver, just add an ODBC-to-JDBC Bridge Driver like our Lite Edition or Enterprise Edition to the mix.)
  2. Use the SQL Tables (RDBMS Tabular Relations) to RDF Sentence/Statement Relations Wizard (more commonly known as the RDB2RDF Wizard) to select SQL Tables from which RDF Views are to be generated. (This step is discussed elsewhere.)
  3. Rather than executing the script that is generated, copy and paste the generated R2RML sentences to a text/plain or text/turtle document, like northwind-8890-tables-r2rml.ttl (downloadable).

Steps to Generate RDF View(s)

  1. Use SPARQL 1.1 INSERT orLOAD, or the Virtuoso Sponger with no-sponge option set, to load the content of northwind-8890-tables-r2rml.ttl into a target named graph (e.g., urn:northwind:8890:tables:r2rml:mappings). For example:
    LOAD <northwind-tables-r2rml.ttl> INTO <urn:northwind:8890:tables:r2rml:mappings>
  2. Run Virtuoso’s built-in R2RML Processor, DB.DBA.R2RML_MAKE_QM_FROM_G, as in:
    EXEC ('SPARQL '|| DB.DBA.R2RML_MAKE_QM_FROM_G ('urn:northwind:8890:tables:r2rml:mappings'));
  3. Use one of Virtuoso’s URI-Rewrite Stored Procedures (DB.DBA.URLREWRITE_CREATE_REGEX_RULE) to create a Rewrite Rule. This amounts to leveraging the power of SPARQL to produce a specific Entity Description Document for the Entity identified in the HTTP URI lookup request.
DB.DBA.URLREWRITE_CREATE_REGEX_RULE
(
'demo_rule2',
1,
'(/[^#]*)',
vector('path'),
1,
'/sparql?query=DESCRIBE+%%3Chttp%%3A//^{URIQADefaultHost}^%U%%23this%%3E+FROM+%%3Chttp%%3A//^{URIQADefaultHost}^/Demo%%23%%3E&format=%U',
vector('path', '*accept*'),
null,
'(text/rdf.n3)|(application/rdf.xml)|(text/n3)|(application/json)',
2,
null
);

The re-write rule above applies a regular expression to the PATH component of each HTTP Request received by Virtuoso, e.g.,

http://demo.openlinksw.com/Demo/suppliers/SupplierID/12#this

When the regular expression is matched, Virtuoso will generate and execute a SPARQL DESCRIBE Query. The solution document will be returned using the content-type negotiated by the HTTP request initiator (i.e., a Web Browser or other User Agent).

Understanding RDF View Generation Output

Quad Mappings (or Quad Maps), each a collection of RDF sentences, are the end product of passing an R2RML document’s content through the Virtuoso R2RML processor. These mappings are basically a collection of data transformation rules that provide RDF Views to the core Virtuoso engine when processing SPARQL Queries. These may apply to local data in Virtuoso’s core RDF_QUAD table, external data in third-party SPARQL-accessible triple- or quad-stores, local data in any of Virtuoso’s SQL tables, or (when using Virtuoso Enterprise Edition) external data in ODBC-accessible SQL RDBMS applications.

Note: This applies to both local and remote data. All data managed by a Virtuoso RDBMS is represented internally as relational Tables, implicitly contradicting the notion that SQL and SPARQL are mutually exclusive while also disproving the notion that Data Virtualization across Relational Models isn’t performant or scalable.

Exploring Information about RDF Views generated by a Virtuoso instance using SPARQL

Here are some sample queries scoped to Quad Mappings associated with Named Graphs holding Virtual RDF sentences targeting the "Demo"."Demo"."Suppliers" relational Table in the Northwind Demo Database.

Basic SPARQL Query [Live Results]

PREFIX virtrdf: 
<http://www.openlinksw.com/schemas/virtrdf#>
SELECT DISTINCT ?s1 as ?quadMap
?s3 as ?rdfPredicate
?s2 as ?sqlTable
WHERE
{
?s1 a virtrdf:QuadMap .
?s1 virtrdf:qmTableName ?s2 .
?s1 virtrdf:qmPredicateRange-rvrFixedValue ?s3 .
FILTER (! contains(str(?s2),"/csv"))
FILTER (! contains(str(?s2),'csv.'))
FILTER (contains(str(?s2),"supplier"))
}
LIMIT 200
Using a Table to align a Quad Map to RDF Predicates mapped to columns in an SQL Table

Detailed SPARQL Query [Live Results]

PREFIX virtrdf: 
<http://www.openlinksw.com/schemas/virtrdf#>
PREFIX rdf:
<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s2 AS ?sqlTable
?s1 AS ?rdfQuadMap
?s6 AS ?rdfSubjectURITemplate
?s3 AS ?rdfPredicate
?s10 AS ?rdfObjectValue
?s8 AS ?rdfObjectDataType
WHERE
{
?s1 a virtrdf:QuadMap .
?s1 virtrdf:qmTableName ?s2 .
?s1 virtrdf:qmPredicateRange-rvrFixedValue ?s3 .
?s1 virtrdf:qmSubjectMap ?s4 .
?s4 virtrdf:qmvFormat ?s5 .
?s5 virtrdf:qmfCustomString1 ?s6 .
?s1 virtrdf:qmObjectMap ?s7 .
?s7 virtrdf:qmvFormat ?s8 .
?s7 virtrdf:qmvColumns ?s9 .
?s9 rdf:_1 ?s10

FILTER (! contains(str(?s2),"/csv"))
FILTER (! contains(str(?s2),'csv.'))
FILTER (contains(str(?s2),'"northwind"."suppliers"'))
}
Detailed Mapping from Tabular Relations to RDF Sentence Collection (Graph) Relations

What’s Happening Here?

Virtuoso is using information provided by an R2RML document to transform data represented as Tables into data represented as Entity Relationship Graphs (RDF sentence collections grouped by predicate).

What is R2RML?

R2RML (pronounced ˈrəm-l or ar-too-ar-em-el) is a vocabulary of terms from the W3C that is used to construct RDF sentences that describe how Tabular relations are to be transformed into RDF relations (sentence collections grouped by predicate).

Any R2RML-compliant tool (a/k/a R2RML processor) has an ability to perform this transformation of relation representation.

Visualizing information conveyed by RDF sentences that leverage R2RML terms, focusing on the "Demo"."Demo"."Suppliers" table [Live Link]

Using SPARQL to Explore R2RML

You can query the mapping declarations loaded into Virtuoso just like any other data represented as RDF sentence collections. [Live Results Link]

PREFIX virtrdf: 
<http://www.openlinksw.com/schemas/virtrdf#>
PREFIX rdf:
<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s2 as ?sqlTable
?s1 as ?rdfQuadMap
?s6 as ?rdfSubjectURITemplate
?s3 as ?rdfPredicate
?s10 as ?rdfObjectValue
?s8 as ?rdfObjectDataType
WHERE
{
?s1 a virtrdf:QuadMap .
?s1 virtrdf:qmTableName ?s2 .
?s1 virtrdf:qmPredicateRange-rvrFixedValue ?s3 .
?s1 virtrdf:qmSubjectMap ?s4 .
?s4 virtrdf:qmvFormat ?s5 .
?s5 virtrdf:qmfCustomString1 ?s6 .
?s1 virtrdf:qmObjectMap ?s7 .
?s7 virtrdf:qmvFormat ?s8 .
?s7 virtrdf:qmvColumns ?s9 .
?s9 rdf:_1 ?s10 .
FILTER (! contains(str(?s2),"/csv"))
FILTER (! contains(str(?s2),'csv.'))
FILTER (contains(str(?s2),'"northwind"."suppliers"'))
}

How Does Virtuoso Exploit R2RML?

Whether working with the output of an R2RML generation Wizard or a handcrafted R2RML document, Virtuoso uses a built-in stored procedure to produce a collection of mappings (referred to as Quad Maps — basically Virtuoso’s native equivalent of R2RML) that informs how its query processor handles SPARQL Queries targeting data represented as RDF relations. This is how Virtuoso delivers Conceptual Data Virtualization by leveraging the fine-grained machine-readable entity relationship type (relations) semantics offered by RDF (a Data Definition Language).

Naturally, all of this functionality applies to tabular relations that are local to the Virtuoso RDBMS as well as (with Virtuoso Enterprise Edition) external tabular relations accessible via ODBC connections.

How are Tables transformed into RDF Graphs?

Given a Row in a Table —

  • Each Table Name functions as a Relation Identifier — a collection of Tuples representing entity relationships grouped by a predicate (in this case, the Table Name)
  • A Relation Header is derived from a set of Attribute Names — depicted as Header Rows (Column or Field identifiers)
  • A Row (or Record) is derived from a set of intersections between Tuples and Attribute Names
  • A specific Row is uniquely identified by a selection of Tuple and Attribute Name intersections (usually one, but may be multiple), subject to the worldview of the relation modeler

Thus, you have a Relation, a Tuple, an Attribute Name (often referred to simply as an Attribute), an Attribute Value (intersection of Attribute Name and Tuple), and a Relation Header.

Illustrated Relation (except Relation Header/Heading)
Illustrated Relation (including Relation Header/Heading)

The heuristic for mapping an SQL Table to an RDF Graph is based on the following assumptions:

  • Each SQL Tabular Relation is comprised of one or more Tuples and a Relation Header/Heading for delineating Columns (Fields)
  • Each RDF Relation is comprised of three Tuples — a Subject, a Predicate, and an Object
  • Each SQL Table identifies an RDF Entity Type (or RDF Class)
  • One or more designated SQL Column(s) uniquely identify each RDF Entity — an instance of the RDF Entity Type or RDF Class that identifies the Subject of an RDF sentence
  • Each SQL Column Name (or SQL Field) identifies an RDF Entity Attribute or RDF Entity Relationship Type — an instance of an RDF Property Class that’s identified by the Predicate of an RDF sentence
  • Each SQL Column Value identifies an RDF Entity Attribute Value associated with a specific RDF Data Type— an instance of an RDF Entity Type that identifies the Literal Object of an RDF sentence
  • Each SQL Column Value functioning as a Foreign Key identifies an RDF Entity Attribute Value associated with an RDF Reference Type — an instance of an RDF Entity Type or RDF Class that identifies the Object of an RDF sentence via a URI (e.g., an HTTP URI)

It should now be clear that translating entity relationship types (relations) represented as Tables to relations represented as RDF sentences (Graphs) amounts to breaking up a coarse-grained relation into a fine-grained relation. In other words, a single Table is decomposed into one or more RDF relations, where each RDF relation maps to a Column in the target Table.

Deploying RDF using Linked Data Principles

Contrary to common misconceptions, RDF and Linked Data are not terms that denote the same thing; i.e., they aren’t synonyms.

To this point in this article, we’ve dealt with RDF sentence collections and said nothing about Linked Data. Why? Because Linked Data is an RDF deployment method based on some very specific principles. These principles are as follows:

  1. Identify the Subject of each RDF sentence with a Hyperlink (specifically, a HTTP URI).
  2. Identify the Predicate of each RDF sentence with an HTTP URI.
  3. Identify the Object of each RDF sentence with an HTTP URI or a Literal (which may be typed or untyped).
Effect of using HTTP URIs (Hyperlinks) to Identify Entities Unambiguously

By adhering to these principles, all your data (RDF sentence collections) will manifest as a Web (but not necessarily as a Semantic Web) where a human or machine can lookup the description of the Subject, Predicate, and Object of each RDF sentence .

From Linked Data to a Semantic Web

A Semantic Web builds on the basic principles of Linked Data by additionally requiring the use of human- and machine-readable collections of terms in sentence construction. For instance, the definition of a Predicate would indicate the nature of — i.e., the entity types of — acceptable (or expected) Subjects and Objects.

Using HTTP URIs (Hyperlinks) to identify the Subject, Predicate, and Object [optionally] of a sentence

Live Examples

The links below lead to a collection of RDF Views generated from data sources associated with SQL-oriented Tables (n-tuple structured relations):

Conclusion

As demonstrated by Virtuoso in this post, a single multi-model RDBMS can function as a powerful tool for enhancing data access, integration, conceptual virtualization, and management oriented interactions across data represented as Relational Tables and/or RDF sentence collections (or Graphs).

In addition, the power of hyperlinks (specifically, HTTP URIs) as super keys increases data access across disparate data sources — when Linked Data principles are applied to RDF relations.

The addition of machine-readable entity-relationship-type semantics adds both serendipitous discovery of relevant information and surreptitious data creation and quality enhancement to this powerful mix.

One last thing — all of this is exploitable using existing tools that support any of several open standards: HTTP, ODBC, JDBC, ADO.NET, or OLE DB. In other words, you do not need to “rip and replace” existing technology investments to benefit from current or future digital transformations!

Related

--

--

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog

CEO, OpenLink Software —High-Performance Data Centric Technology Providers.