Using a Semantic Web of Linked Data to Reconcile Disparate Identities

Kingsley Uyi Idehen
Aug 8, 2017 · 7 min read

Data Integration in its many forms has been perennially challenged by the issue of data variety. The same challenges also apply to privacy.

The key challenge boils down to recognizing, and acting on, the fact that a variety of identifiers are used across disparate systems to explicitly or implicitly identify the same individual.

In a SQL RDBMS, for instance, explicit identification is provided by the use of Primary Keys, while implicit identification is provided by Foreign Keys.

Across Businesses and Governments, internal identification is explicit, while external identification (shared with the public) is usually implicit, taking the forms of Email Addresses, Personal Profile Homepage URLs, Social Security Numbers, License Numbers, etc.

This post demonstrates how a Semantic Web of Linked Data (generated with this post as the data source) benefits from the wisdom expressed in the OWL Ontology; i.e., from Reasoning & Inference guided by the human- and machine-readable Relationship Type Semantics it defines.

OWL (also known as the Web Ontology Language) is a collection of terms that describe the nature (semantics) of a variety of relationship types (relations). Examples include terms that describe the nature of:

  • Equivalence
  • Symmetry
  • Inversion (e.g., Inverse Of, Inverse Functionality)
  • Reflexivity and Transitivity
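
To make the idea of relationship type semantics concrete, here is a minimal Python sketch of how a reasoner might use such descriptions to derive new facts. It is not how Virtuoso implements inference; the predicate names (knows, hasAuthor, authorOf, partOf) and the sample facts are hypothetical, chosen only for illustration.

```python
# Facts are (subject, predicate, object) tuples.
# All predicate names and facts here are hypothetical illustrations.
SYMMETRIC = {"knows"}                    # x knows y  =>  y knows x
INVERSE_OF = {"hasAuthor": "authorOf"}   # x hasAuthor y  =>  y authorOf x
TRANSITIVE = {"partOf"}                  # x partOf y, y partOf z  =>  x partOf z


def infer(facts):
    """Return the facts plus everything derivable from the three rule kinds."""
    closed = set(facts)
    while True:
        new = set()
        for s, p, o in closed:
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in INVERSE_OF:
                new.add((o, INVERSE_OF[p], s))
            if p in TRANSITIVE:
                for s2, p2, o2 in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= closed:   # nothing new was derived, so we are done
            return closed
        closed |= new


facts = {
    ("alice", "knows", "bob"),
    ("post", "hasAuthor", "alice"),
    ("wheel", "partOf", "car"),
    ("car", "partOf", "fleet"),
}
derived = infer(facts) - facts
```

The point of the sketch is that none of the derived facts were stated explicitly; they follow solely from what is known about the nature of each relationship type.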

Like most items associated with RDF or with the notion of a Semantic Web, OWL’s utility has largely been lost in waves of accumulating confusion for the 15+ years since its creation. In my experience, this confusion is the consequence of critical tooling not being in place during the formative years of RDF; i.e., we didn’t have tools like SPARQL (Query Language), Virtuoso (Multi-Model RDBMS and Middleware combo), OSDS (OpenLink Structured Data Sniffer browser extension), etc.

Identity Reconciliation Exercise using OWL Relationship Type Semantics

The exercise specifically covers the use of owl:sameAs (for equivalence) and the owl:InverseFunctionalProperty Property Type designation with regard to explicit identity reconciliation.

Tools used in this exercise include:

  • Virtuoso: multi-model RDBMS that supports Relations represented as SQL Tables and/or RDF Property/Predicate Graphs
  • URIBurner Service: live Virtuoso instance that includes free Extract, Load, and Transformation services
  • OSDS: Browser Extension that simplifies understanding and exploitation of a Semantic Web of Linked Data via supported Browsers
  • OWL Ontology
  • FOAF Vocabulary

The steps that follow guide you through the process of reconciling a variety of my identities originating from disparate systems (including this article):

  1. Loading an ontology of terms (here, the FOAF Vocabulary) and adding any missing relations (e.g., owl:InverseFunctionalProperty) to the data loaded
  2. Creating an Inference Rule (in this case, an owl:InverseFunctionalProperty assertion [for which Reasoning & Inference support is enabled in all Commercial and Open Source Editions of Virtuoso] about the foaf:mbox relationship type)
  3. Loading instance data
  4. Using SPARQL and Virtuoso’s built-in Faceted Browser Engine to demonstrate Reasoning & Inference (in this case, to reconcile disparate identifiers that have a common referent)

Steps

[1] Load terminology from the FOAF Vocabulary.

[2] Add a missing Inverse Functional Property relationship type assertion to the Virtuoso hosted Named Graph identified by the IRI, <http://xmlns.com/foaf/0.1/>, using the owl:InverseFunctionalProperty term from the OWL Ontology.
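
As a rough mental model of this step, the Python sketch below treats a quad store as a dictionary from Named Graph IRI to a set of triples and adds the assertion to the FOAF graph. The IRIs are the standard FOAF, RDF, and OWL ones; the store layout and the helper names assert_ifp and ifp_properties are assumptions made for illustration and are not Virtuoso APIs.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_IFP = "http://www.w3.org/2002/07/owl#InverseFunctionalProperty"

# Hypothetical in-memory stand-in for a quad store:
# Named Graph IRI -> set of (subject, predicate, object) triples.
store = {FOAF: set()}


def assert_ifp(store, graph_iri, property_iri):
    """Record in the named graph that property_iri is inverse functional."""
    store.setdefault(graph_iri, set()).add((property_iri, RDF_TYPE, OWL_IFP))


def ifp_properties(store, graph_iri):
    """List every property the named graph declares as inverse functional."""
    return sorted(
        s for s, p, o in store.get(graph_iri, set())
        if p == RDF_TYPE and o == OWL_IFP
    )


assert_ifp(store, FOAF, FOAF + "mbox")
declared = ifp_properties(store, FOAF)
```

Because the graph is a set of triples, repeating the assertion has no further effect, which mirrors how re-asserting an existing triple behaves in an RDF store.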

To exploit further via the Metadata Tab within Virtuoso’s built-in Faceted Browsing Engine, you can use the following variation of the relations above:

[3] Generate a Reasoning & Inference Rule (based on Virtuoso’s built-in support for owl:InverseFunctionalProperty and owl:sameAs relationship type semantics).

[4] Create actual data to which Reasoning and Inference will be applied, using the Rule.

Note: You can use OSDS to invoke the Virtuoso Sponger Instance offered via our URIBurner Service. The net effect is that the data above is loaded into a sandboxed Named Graph (identified by the URL of this Medium-hosted post) without using SPARQL 1.1. You achieve this by clicking on the "LOD Cloud" Action icon (magnifying glass in front of a globe).

SPARQL Query for Loading Data via URIBurner SPARQL Endpoint (if you choose):

Naturally, you can use OSDS to visualize the RDF-Turtle aspect of the dataset (or body) of the SPARQL 1.1 INSERT statement above, reading it directly from the web page of this Medium post.

Image for post

Test Query 1

Here we are testing the notion that all subjects of an owl:InverseFunctionalProperty relationship type (relation) identify the same entity if the relationship type object is the same. Basically, that <#kidehen>, <https://twitter.com/kidehen#this>, and <https://www.linkedin.com/in/kidehen#this> all identify the same instance of the foaf:Person class (i.e., me).
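
The logic being tested can be sketched in a few lines of Python: group subjects by the value they share for an inverse functional property. The three subject IRIs are the ones named above; the mailbox values and the fourth subject are made-up placeholders, and the function name coreferents is hypothetical. This is a simplification, not Virtuoso's implementation.

```python
from collections import defaultdict

MBOX = "http://xmlns.com/foaf/0.1/mbox"
IFPS = {MBOX}  # properties declared as owl:InverseFunctionalProperty

# Subject IRIs are from the article; mailbox values are placeholders,
# and the last subject is a hypothetical unrelated identity.
triples = [
    ("#kidehen", MBOX, "mailto:someone@example.org"),
    ("https://twitter.com/kidehen#this", MBOX, "mailto:someone@example.org"),
    ("https://www.linkedin.com/in/kidehen#this", MBOX, "mailto:someone@example.org"),
    ("https://example.org/other#this", MBOX, "mailto:other@example.org"),
]


def coreferents(triples, ifps):
    """Group subjects that share an object for the same inverse functional property."""
    by_value = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            by_value[(p, o)].add(s)
    # Only groups with more than one member reconcile distinct identifiers.
    return [sorted(group) for group in by_value.values() if len(group) > 1]


groups = coreferents(triples, IFPS)
```

A fuller treatment would also merge groups that overlap across several inverse functional properties; this sketch handles one shared value at a time.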

Image for post

Test Query 2

Here we are testing the notion that subjects and objects of an owl:sameAs relationship type (relation) identify the same entity. Basically, that <#kidehen>, <https://twitter.com/kidehen#this>, and <https://www.linkedin.com/in/kidehen#this> all identify the same instance of the foaf:Person class (i.e., me).
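
Since owl:sameAs is symmetric and transitive, explicit equivalence statements partition identifiers into classes. The Python sketch below computes those classes with a small union-find structure. Which owl:sameAs statements the dataset actually contains is an assumption here (two links chaining the three identifiers), and same_as_classes is a hypothetical helper name.

```python
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Assumed statements: two sameAs links that chain the three identifiers.
triples = [
    ("#kidehen", SAME_AS, "https://twitter.com/kidehen#this"),
    ("https://twitter.com/kidehen#this", SAME_AS,
     "https://www.linkedin.com/in/kidehen#this"),
]


def same_as_classes(triples):
    """Partition identifiers into equivalence classes implied by owl:sameAs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)  # merge the two classes

    classes = {}
    for node in list(parent):
        classes.setdefault(find(node), set()).add(node)
    return [sorted(members) for members in classes.values()]


classes = same_as_classes(triples)
```

Note that the first and third identifiers end up in the same class even though no statement links them directly; that is the transitivity of owl:sameAs at work.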

Image for post

Using Virtuoso’s Built-In Faceted Browsing Engine

[1] Go to the Faceted Browsing Service endpoint, paste in https://twitter.com/kidehen#this, and hit Enter.

Image for post

[2] Then click on Settings and select an Inference Rule for owl:InverseFunctionalProperty reasoning. In this case, it would be urn:ifp:inference:rule, as depicted below.

Image for post

[3] Then click on the Description tab, which returns the page depicted below.

Image for post

[4] Click on the Metadata tab, and then click on Coreferences (identifiers that share a common referent) to see the effect of built-in owl:InverseFunctionalProperty Reasoning using the selected Inference Rule. Note: at the current time (due to a quirk in the user interface), you have to click on Permalink to arrive at what’s depicted below.

Image for post

The screenshots that follow repeat the sequence above, with explicit (rather than implicit or inferred) owl:sameAs built-in Reasoning and Inference.

Image for post
Image for post
Image for post

Artificial Intelligence, Machine Learning, and Cognition

Artificial Intelligence (AI) is currently a topic of high interest. It is typically conflated with Machine Learning (ML), which is oriented toward pattern recognition and “black box” algorithms; i.e., it doesn’t leverage Cognition (Reasoning and Inference capability).

Using the example covered in this post, Machine Learning wouldn’t understand anything about the implications of a sentence that had one or more subjects associated with a common object, where the sentence verb identified an owl:InverseFunctionalProperty instance.

Looking at the same example through the context lenses provided by a Semantic Web of Linked Data, we have an open standard in the form of an OWL Ontology that describes the nature of a specific class (or category) of Relationship Type. That description enables a machine or human to apply cognition to the challenge posed by Identity Reconciliation, an issue that has challenged alternative Data Integration approaches for many years (and counting).

Why is this important?

Data Integration and Data Privacy are two areas that continue to challenge technology in general.

Data Integration is a problem best solved by leveraging cognition and relationship type semantics. That said, this isn’t a one-way street, because the same techniques apply to the Dark Web, where data fragments culled from Social Media are pieced together into complete profiles.

In addition to the above, compliance with new regulations (e.g., GDPR from the EU) is another area where what’s covered in this post comes into play.

For those who continue to ignore the power and implications of a Semantic Web of Linked Data, I would like to remind you that the EU has been on the frontier of this technology for years (as a user and as a source of research and development funding). Thus, GDPR compliance and enforcement isn’t only about what’s on paper; it will be enforced cognitively too!

Kingsley Uyi Idehen
CEO, OpenLink Software —High-Performance Data Centric Technology Providers.

OpenLink Virtuoso Weblog

News & Articles related to OpenLink Virtuoso & Related Technologies