Semantic Web Layer Cake Tweak, Explained

Kingsley Uyi Idehen
OpenLink Software Blog
Jul 13, 2017

Situation Analysis

I recently published a tweak to the W3C’s Semantic Web Layer Cake illustration with the following goals in mind:

  • Reflect the current state of affairs — with a bias towards what both exists and is in demonstrable use
  • Attempt to bring clarity to aspects that have been confusing
  • Align pieces of the stack with key business benefits

I was motivated to update this illustration after reading a document by John F. Sowa titled Semantics for Interoperable Systems. He included diagrams illustrating how the notion of a Semantic Web had evolved between the years 2000 and 2005, as depicted below.

One issue of concern with these older illustrations is the apparently overarching role of XML; that is, they create the illusion that XML (a Markup Language) is a mandatory requirement for a Semantic Web, which simply isn’t accurate. It just so happened that circa 2000, XML appeared to provide a more flexible alternative to HTML as a vehicle for constructing a Semantic Web — but even that wasn’t quite accurate. Then, as now, Plain Old Semantic HTML (POSH) demonstrably provided a viable “best practice” by which embedded Metadata could be used as the basis for constructing a Semantic Web.

In recent times, I’ve also read a number of posts by Geoffrey Moore that provide business-oriented insights into the very challenges that motivated the creation of a Semantic Web project.

Here’s an excerpt from his most recent post, titled “AI is from Venus, Machine Learning is from Mars”.

Updated Semantic Web Layer Cake

This particular tweak addresses the overarching-XML misconception by presenting a variety of RDF Document Types, which include RDF-XML (also known as RDF/XML) alongside RDF-NTriples, RDF-Turtle, RDF-JSON, JSON-LD, and others.

Tweaked Semantic Web Technology Layer Cake

The rest of this post covers the other boxes in the diagram, explaining their role and placement in this revised illustration.

Smart (Cognitive) Applications and Services

Smart Applications and Services are built declaratively (rather than imperatively), with loose-coupling of data models, interaction, and visualization, consistent with the widely understood MVC (Model, View, Controller) pattern. In addition, interaction and visualization leverage the relationship-type semantics (comprehensible to both humans and machines) that constitute the data model, which is what adds cognition (reasoning and inference) to the mix.

Business benefit? Smart Applications and Services are agile solutions that don’t evolve into costly maintenance and support nightmares.

Trust

Trust is tied to verifiable claims about identity, content provenance, and related matters. Today’s applications remain challenged by the basic notion of identity, perpetuating the mercurial state of trust.

Business Benefit? Trust enables agile solutions that don’t compromise privacy while still being able to create activity audit trails; i.e., users and applications collectively control data privacy. For instance, an application may orchestrate data access between a protected data source and a user profile.

Proof

“Proof of Work” provides a flexible foundation for Trust. Today, we typically have Two-Factor Authentication, which is often less a reliable proof of anything than a tool used by social media vendors to acquire phone numbers (powerful identifiers) that ultimately lead to more spam and other privacy abuses via mobile phones.

Business Benefit? Proof enables (variable degrees of) trust to be calculated using flexible combinations of multiple factors, rather than being confined to a few specific factors as is the case with Two-Factor Authentication.

Transmission Security (using Crypto)

Transmission Security refers to over-the-wire protection of data during transmission, accomplished using existing standards covered by Public Key Infrastructure (PKI) and Transport Layer Security (TLS) that have built-in support for cryptography (which is itself addressed by a collection of standards for digital signatures and content encryption).

Business Benefit? Protecting data during its transmission can only enhance privacy.

Unifying Logic

First-order logic serves as the conceptual schema around which data is modeled and understood. Propositions (or claims) provide an underlying foundation for solution development without being confined to a specific application.

Excerpted from a John F. Sowa presentation (and lecture) about Applied Ontology

Business Benefit? Having logic as the underlying schema around which applications are developed ensures the ability to integrate disparate systems over the long and short terms. No more costly hub-and-spoke integration “solutions” that simply lead to more costly maintenance and support later.

Rules

Rules are premises upon which reasoning and inference can be performed. These take the form of languages for describing what constitutes a premise.

  • R2RML — for describing how to transform entity relationship types (relations) represented as “records in a table” into “RDF Language sentences/statements”
  • SWRL and SPIN — for describing how to dynamically materialize new entity relationships based on existing data (see the sketch after this list)
  • SHACL — for describing data entry integrity constraints to be used when creating structured data using RDF
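
By way of illustration, here’s a minimal sketch of the SPARQL-based rule idea that underpins SPIN, using the rdflib Python library; the ex: vocabulary and the sample data are hypothetical, and the rule simply materializes new relationships from existing sentences.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.com/schema#> .
        @prefix :   <http://example.com/people#> .

        :alice ex:worksFor :acme .
        :bob   ex:worksFor :acme .
    """, format="turtle")

    # Rule, expressed as a SPARQL CONSTRUCT template: people who work for the
    # same organization are colleagues of one another.
    rule = """
        PREFIX ex: <http://example.com/schema#>
        CONSTRUCT { ?a ex:colleagueOf ?b . }
        WHERE     { ?a ex:worksFor ?org . ?b ex:worksFor ?org . FILTER (?a != ?b) }
    """

    # Materialize the inferred sentences and add them back to the graph.
    for sentence in g.query(rule):
        g.add(sentence)

    print(g.serialize(format="turtle"))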

Business Benefit? Reusable rules can be used to drive machine-learning and AI in general, distinct from core application code. This also protects against the dangers of “Black Box Algorithms” that ultimately lead to the same costly maintenance and support cul-de-sacs.

Query

For Query, we have SPARQL for declarative data definition (DDL) and data manipulation (DML) operations on structured data represented as RDF sentences/statements. SPARQL is also a handy standard by which to extend SQL (another open standard), easing dependence on proprietary vendor-specific SQL extensions as the first (or only) port of call.
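
Here’s a minimal sketch of declarative query and manipulation via SPARQL, using the rdflib Python library (rdflib 6+ assumed); the ex: vocabulary and product data are hypothetical.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.com/schema#> .
        @prefix :   <http://example.com/products#> .

        :widget ex:name "Widget" ; ex:price 10 .
        :gadget ex:name "Gadget" ; ex:price 25 .
    """, format="turtle")

    # DML-style addition of a new sentence via SPARQL Update.
    g.update("""
        PREFIX ex: <http://example.com/schema#>
        PREFIX :   <http://example.com/products#>
        INSERT DATA { :gizmo ex:name "Gizmo" ; ex:price 15 . }
    """)

    # Declarative query: names of products priced under 20.
    for (name,) in g.query("""
        PREFIX ex: <http://example.com/schema#>
        SELECT ?name
        WHERE { ?product ex:name ?name ; ex:price ?price . FILTER (?price < 20) }
        ORDER BY ?name
    """):
        print(name)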

Business Benefits? Open query standards enable mixing and matching “best of class” applications, the core principle behind any Digital Transformation. For example, rather than ripping and replacing existing SQL-based applications, simply extend SQL using SPARQL. In one fell swoop, all existing ODBC, JDBC, ADO.NET, and OLE-DB solutions can be used on data represented as RDF Language sentences/statements, and are no longer strictly limited to working with data represented as “records in a table”. In addition, query results that serve reports and dashboards can automagically include hyperlinks that function as Super Keys, providing extra dimensions of drill-down and interaction — without writing any additional code.

Dictionaries

Dictionaries (also referred to as vocabularies or ontologies) are collections of formal definitions that describe entity and entity relationship types (i.e., classes and properties). RDF, RDFS, OWL, and Schema.org are key examples.
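
Here’s a minimal sketch of a small dictionary described using RDFS and OWL terms, loaded and queried via the rdflib Python library; the ex: terms are hypothetical.

    from rdflib import Graph

    vocabulary = """
        @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        @prefix owl:  <http://www.w3.org/2002/07/owl#> .
        @prefix ex:   <http://example.com/schema#> .

        ex:Person      a rdfs:Class ; rdfs:label "Person" .
        ex:Employee    a rdfs:Class ; rdfs:subClassOf ex:Person .

        ex:supervises  a rdf:Property ; rdfs:domain ex:Person ; rdfs:range ex:Person .
        ex:reportsTo   a owl:ObjectProperty ; owl:inverseOf ex:supervises .
        ex:colleagueOf a owl:SymmetricProperty .
    """

    g = Graph()
    g.parse(data=vocabulary, format="turtle")

    # List every term defined in this dictionary, together with its kind.
    for term, kind in g.query("SELECT ?term ?kind WHERE { ?term a ?kind }"):
        print(term, kind)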

Business Benefit? Defining the nature of your data distinctly from the application code that operates on it allows reuse of that data without requiring eternal devotion to that application. This includes the nature of entity relationship types covering vital issues such as equivalence, reflection, symmetry, inversion, etc.

Abstract Language

RDF, as a Language, provides systematic use of signs (identifiers), syntax (subject→predicate→object), and semantics (the meanings of the subject, predicate, and object roles) to encode and decode information (data in some context). Naturally, context (or perspective) is a function of the terms used in these statements.
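
Here’s a minimal sketch of such sentence construction, using the rdflib Python library together with the widely used FOAF vocabulary; the ex: identifiers are hypothetical.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import FOAF, RDF

    EX = Namespace("http://example.com/people#")

    g = Graph()
    g.bind("foaf", FOAF)

    # Each addition is one sentence: subject -> predicate -> object.
    g.add((EX.alice, RDF.type, FOAF.Person))        # Alice is a Person
    g.add((EX.alice, FOAF.name, Literal("Alice")))  # Alice's name is "Alice"
    g.add((EX.alice, FOAF.knows, EX.bob))           # Alice knows Bob

    print(g.serialize(format="turtle"))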

Business Benefit? You (the domain expert) don’t need to delegate incorporation of insight to application developers (domain strangers). You just take notes about items of interest, using a natural shorthand language that’s implicitly multilingual, letting you add your insights to the system directly and immediately.

Sentence Part Identifiers

Identifiers are signs used to unambiguously name (or identify) entities. In addition, they can be used to identify the subject, predicate, and object of an RDF sentence; i.e., they serve the same function as words in natural-language sentence construction.

Better still, you can use HTTP IRIs or URIs as the preferred identifier type for RDF sentence construction, introducing the dual effects of denotation and connotation to your note-taking endeavor; i.e., words become terms.
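
Here’s a minimal sketch contrasting a document-local name with a Web-scale HTTP URI, using the rdflib Python library; the @base IRI and <#this> name are hypothetical, while the DBpedia URI is a real Web-scale identifier.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @base   <http://example.com/page> .
        @prefix schema: <http://schema.org/> .
        @prefix foaf:   <http://xmlns.com/foaf/0.1/> .

        # A document-local name, resolved against the @base IRI above:
        <#this> a schema:WebPage ;
            # A full HTTP URI is a Web-scale name that can also be looked up
            # (dereferenced) to obtain more sentences about what it denotes:
            foaf:primaryTopic <http://dbpedia.org/resource/Semantic_Web> .
    """, format="turtle")

    for sentence in g:
        print(sentence)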

Business Benefits? Identification of entities, distinct from the applications in which they’re used, provides immense cost savings over the long haul — especially when the final product is a Semantic Web.

Document Types

As with natural-language note-taking, you ultimately need to save your sentences in a persistent form, which is where documents and their types come into play.

In the case of RDF sentences, notation (how subject→predicate→object sentences are represented) and serialization format are loosely-coupled. Thus, you may craft sentences using a notation that ultimately differs from the final serialization format used to create a document for long-term storage.
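
Here’s a minimal sketch of that loose coupling, using the rdflib Python library (rdflib 6+ assumed, since it bundles JSON-LD support); the data is hypothetical.

    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .

        <http://example.com/people#alice> a foaf:Person ; foaf:name "Alice" .
    """, format="turtle")

    # The same sentences, written out as three different document types.
    print(g.serialize(format="turtle"))
    print(g.serialize(format="json-ld"))
    print(g.serialize(format="xml"))  # RDF/XML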

Business Benefit? Rather than being confined to a single notation and document content-type (or format), RDF’s abstract nature enables domain experts to work with one notation while developers work with another. In fact, any individual may choose to work with a different notation than everyone else involved!

Simple example

A domain expert encounters a report or dashboard to which he/she seeks to add notes, as part of an improvement-oriented feedback loop that includes other domain experts and application developers.

In this scenario, the natural-language feel of RDF-Turtle provides an effective shorthand for domain experts, while JSON-LD or RDF-JSON may be the preference of today’s (2017) application developers, and RDF-XML might appeal to a developer who started before the turn of the century.

The beauty of RDF in this scenario is that it enables the same information to be shared among different parties without their personal RDF Document Type preferences introducing a technical or political hurdle.

As you can see, this example breaks the misconception that all parties must use an RDF-XML (or Turtle, or JSON-LD) Document to effectively share information. The power of RDF has nothing to do with a specific notation or document type; it is all about the systematic use of signs, syntax, and semantics to encode and decode information using sentences — just as we do every day in our own native natural languages.

Semantic Web of Linked Data

A Semantic Web of Linked Data is the final piece of the puzzle, resulting naturally from the use of HTTP URIs and IRIs in RDF Language sentences that are saved to a variety of network-accessible document types. This simple approach underlies what is commonly known as the Linked Data Principles, outlined in a document by Tim Berners-Lee.
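
Here’s a minimal sketch of the “follow your nose” pattern this enables, using the rdflib Python library; it requires network access and assumes DBpedia’s /data/ document URLs remain available.

    from rdflib import Graph

    g = Graph()
    # Dereference the Turtle document that DBpedia publishes about this entity.
    g.parse("http://dbpedia.org/data/Semantic_Web.ttl", format="turtle")

    print(len(g), "sentences loaded about dbpedia:Semantic_Web")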

Business Benefit? All of your data exists as a dexterous enterprise Web, distinct from the applications that created or work with it, unleashing the kind of agility implied by every Digital Transformation meme associated with AI and Machine Learning. For instance, as application needs evolve, focus moves away from creating new code modules (and their associated cycles of maintenance and support) to simple updates of RDF documents.

Application Examples (Dog-fooding)

Here are some examples of existing solutions that put what’s been described in this post into real-world use. That is, they demonstrate the practical utility of a Semantic Web of Linked Data as the underlying basis for the development of modern solutions.

  • Semantic Web Notes — using a basic document create-and-publish pattern to construct an RDF document about the Semantic Web (using our WebDAV-compliant ODS-Briefcase Application)
  • DBpedia — a live Virtuoso instance that provides SPARQL Query access to a Semantic Web of Linked Data generated from Wikipedia content
  • URIBurner — a live Virtuoso instance that provides on-the-fly Extract, Transform, and Load services en route to creating a Semantic Web of Linked Data that supports query access by SPARQL-, SQL-, ODBC-, JDBC-, ADO.NET-, and OLE-DB-compliant tools for Business Intelligence, Analytics, etc.
  • Our License Generator System — where details of Licenses are maintained in Turtle documents that are loosely-coupled with the system
  • Our License Offers & Shopping System — where License, Offer, and other details are maintained in Turtle documents that are loosely-coupled with the system
  • Web-Scale Verifiable Identity — comes into play via WebID+TLS as an authentication option in our License Generator and Shop Applications (among others)
  • Collection of RDF Language Documents — stored and provided as Turtle by default, but alternative document type options are offered in the page footer — including HTML documents, Plain Old Semantic HTML metadata, and full HTTP content negotiation
  • Smart Data Bot — integrates disparate API definitions into an Action Web where OpenAPI (formerly Swagger) based documentation is the common factor for those not expressly described using RDF sentences

Conclusion

Why is this tweak important?

We need to put an end to the paradoxical notion of a Semantic Web being distinct from the World Wide Web as it stands.

Fundamentally, the Web as it is commonly used and the notion of a Semantic Web are inextricably linked.

In fact, there isn’t any such thing as a non-Semantic Web, nor a Web devoid of Entity Relationship Type Semantics.

How are the Web and Semantic Web Linked?

The entire system we know as the Web is constructed around the use of hyperlinks (HTTP URIs or IRIs) as entity names, the very same approach retrospectively associated with what became an official W3C standard under the moniker Resource Description Framework (RDF).

Put differently, the Web is a massive collection of sentences/statements that represent claims (or propositions), where the fidelity of entity relationship type (relation) semantics has been evolving from coarse-grained (e.g., { <#this> <#LinksTo> <#that> . }) to much more fine-grained (e.g., { <#this> rdf:type <http://schema.org/WebPage> ; foaf:primaryTopic <#that> . }).

A simple depiction of the entity relationships represented in the snippets above, generated via the OpenLink Structured Data Sniffer browser extension.

You can’t have a language without semantics; i.e., syntax alone doesn’t get you anywhere. Hence the existence of ontologies (term dictionaries) such as RDF (the foundation that describes sentence/statement structure), RDF Schema (which adds Classes, Sub-Classes, and Sub-Properties), and OWL (which adds terms focused on relationship-type semantics such as equivalence, reflection, symmetry, inversion, and the like).

What does this mean for Digital Transformation at the enterprise level?

Organizations seeking to increase their agility by exploiting AI, Machine Learning, Data Virtualization, etc., are no longer limited to acquiring or building applications that ultimately become expensive data silos which slow or even stop the motion they were meant to enable and ease.

The Semantic Web Layer Cake highlights the existence of working open standards that address these key components of all applications, in loosely-coupled fashion:

  • Identity
  • Authentication
  • Secure Data Transmission
  • Data Transformation Rules
  • Reasoning & Inference Rules
  • Data Representation Notations
  • Data Storage Formats (Content Types)
  • Declarative Data Definition & Manipulation using Query Languages

Kingsley Uyi Idehen
CEO, OpenLink Software — High-Performance Data Centric Technology Providers.