Using a Semantic Web of Linked Data to Reconcile Disparate Identities

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog
Aug 8, 2017

Data Integration in its many forms has been perennially challenged by the issue of data variety. The same challenges also apply to privacy.

The key challenge boils down to recognizing, and acting on, the fact that a variety of identifiers are used across disparate systems to explicitly or implicitly identify the same individual.

In a SQL RDBMS, for instance, explicit identification is provided by the use of Primary Keys, while implicit identification is provided by Foreign Keys.

Across Businesses and Governments, internal identification is explicit, while external identification (shared with the public) is usually implicit, taking the forms of Email Addresses, Personal Profile Homepage URLs, Social Security Numbers, License Numbers, etc.

This post demonstrates how a Semantic Web of Linked Data (generated from a post as the data source) benefits from the wisdom expressed in the OWL Ontology, i.e., how Reasoning & Inference can be applied, guided by the human- and machine-readable Relationship Type Semantics it defines.

OWL (also known as the Web Ontology Language) is a collection of terms that describe the nature (semantics) of a variety of relationship types (relations). Examples include terms that describe the nature of:

  • Equivalence
  • Symmetry
  • Inversion (e.g., Inverse Of, Inverse Functionality)
  • Reflexivity and Transitivity
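
To make relationship type semantics such as symmetry, inversion, and transitivity concrete, here is a minimal sketch in plain Python of what a reasoner does with them. It is only an illustration, not how Virtuoso implements inference; every `ex:` name below is made up for the example.

```python
# Toy forward-chaining over (subject, predicate, object) tuples.
# All "ex:" identifiers are hypothetical and exist only for this sketch.

SYMMETRIC = {"ex:knows"}                    # plays the role of owl:SymmetricProperty
TRANSITIVE = {"ex:ancestorOf"}              # plays the role of owl:TransitiveProperty
INVERSE_OF = {"ex:parentOf": "ex:childOf"}  # plays the role of owl:inverseOf


def infer(triples):
    """Return the input triples plus everything the declared semantics entail."""
    inverse = dict(INVERSE_OF)
    inverse.update({v: k for k, v in INVERSE_OF.items()})
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in TRANSITIVE:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:  # fixed point: nothing left to derive
            return facts
        facts |= new


data = {
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:a", "ex:parentOf", "ex:b"),
    ("ex:a", "ex:ancestorOf", "ex:b"),
    ("ex:b", "ex:ancestorOf", "ex:c"),
}
inferred = infer(data)
```

The point of the sketch is that the extra statements are never stored by hand; they follow from declaring the nature of each relation once.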

Like most items associated with RDF or with the notion of a Semantic Web, OWL’s utility has largely been lost in waves of accumulating confusion for the 15+ years since its creation. In my experience, this confusion is the consequence of critical tooling not being in place during the formative years of RDF; i.e., we didn’t have tools like SPARQL (Query Language), Virtuoso (Multi-Model RDBMS and Middleware combo), OSDS (OpenLink Structured Data Sniffer browser extension), etc.

Identity Reconciliation Exercise using OWL Relationship Type Semantics

The exercise specifically covers the use of owl:sameAs (for equivalence) and the owl:InverseFunctionalProperty Property Type designation with regard to explicit identity reconciliation.

Tools used in this exercise include:

  • Virtuoso — multi-model RDBMS that supports Relations represented as SQL Tables and/or RDF Property/Predicate Graphs
  • URIBurner Service — live Virtuoso instance that includes free Extract, Load, and Transformation services
  • OSDS — Browser Extension that simplifies Semantic Web of Linked Data understanding and exploitation via supported Browsers
  • OWL Ontology
  • FOAF Vocabulary

The steps that follow guide you through the process of reconciling a variety of my identities originating from disparate systems (including this article):

  1. Loading an ontology of terms (here, the FOAF Vocabulary) and adding any missing relations (e.g., owl:InverseFunctionalProperty) to the data loaded
  2. Creating an Inference Rule (in this case, an owl:InverseFunctionalProperty assertion [for which Reasoning & Inference support is enabled in all Commercial and Open Source Editions of Virtuoso] about the foaf:mbox relationship type)
  3. Loading instance data
  4. Using SPARQL and Virtuoso’s built-in Faceted Browser Engine to demonstrate Reasoning & Inference (in this case, to reconcile disparate identifiers that have a common referent)

Steps

[1] Load terminology from the FOAF Vocabulary.

SPARQL
DEFINE get:soft "no-sponge"
LOAD <http://xmlns.com/foaf/0.1/> ;

[2] Add a missing Inverse Functional Property relationship type assertion to the Virtuoso hosted Named Graph identified by the IRI, <http://xmlns.com/foaf/0.1/>, using the owl:InverseFunctionalProperty term from the OWL Ontology.

SPARQL
INSERT DATA
INTO <http://xmlns.com/foaf/0.1/>
{ foaf:mbox a owl:InverseFunctionalProperty .
} ;

To exploit this further via the Metadata tab within Virtuoso’s built-in Faceted Browsing Engine, you can use the following variation of the relations above:

SPARQL
INSERT DATA
INTO <http://xmlns.com/foaf/0.1/>
{ foaf:mbox a owl:InverseFunctionalProperty ;
rdfs:subClassOf lod:ifp_like .
} ;

[3] Generate a Reasoning & Inference Rule (based on Virtuoso’s built-in support for owl:InverseFunctionalProperty and owl:sameAs relationship type semantics).

RDFS_RULE_SET
( 'urn:ifp:inference:rule' , 'http://xmlns.com/foaf/0.1/'
) ;

-- Verify Rule Creation
SELECT RS_NAME
FROM sys_rdf_schema
WHERE RS_NAME = 'urn:ifp:inference:rule'
;

[4] Create actual data to which Reasoning and Inference will be applied, using the Rule.

SPARQL
INSERT DATA INTO <urn:kidehen:ifp:demo>
{
  <#kidehen>
    a foaf:Person ;
    foaf:mbox <mailto:kidehen@openlinksw.com> ;
    schema:name "Kingsley Idehen" ;
    foaf:nick "kidehen"
  .
  <https://twitter.com/kidehen#this>
    a foaf:Person ;
    foaf:mbox <mailto:kidehen@openlinksw.com> ;
    schema:name "Kingsley Idehen" ;
    schema:mainEntityOfPage <https://twitter.com/kidehen/> ;
    foaf:nick "@kidehen"
  .
  <https://www.linkedin.com/in/kidehen#this>
    a foaf:Person ;
    foaf:mbox <mailto:kidehen@openlinksw.com> ;
    schema:name "Kingsley Uyi Idehen" ;
    schema:mainEntityOfPage <https://www.linkedin.com/in/kidehen/> ;
    foaf:nick "@kidehen"
  .
  <#kidehen>
    owl:sameAs
      <https://twitter.com/kidehen#this> ,
      <https://www.linkedin.com/in/kidehen#this>
  .
} ;

Note: You can use OSDS to invoke the Virtuoso Sponger Instance offered via our URIBurner Service. The net effect is that the data above is loaded into a sandboxed Named Graph (identified by the URL of this Medium-hosted post) without using SPARQL 1.1. You achieve this by clicking on the "LOD Cloud" Action icon (magnifying glass in front of globe).

SPARQL Query for Loading Data via URIBurner SPARQL Endpoint (if you choose):

SPARQL
SELECT DISTINCT *
FROM <https://medium.com/virtuoso-blog/using-a-semantic-web-of-linked-data-to-reconcile-disparate-identities-83ab7a315568>
WHERE { ?s ?p ?o } ;

Naturally, you can use OSDS to visualize the RDF-Turtle aspect of the dataset (or body) of the SPARQL 1.1 INSERT statement above, reading it directly from the web page of this Medium post.

Test Query 1

Here we are testing the notion that all subjects of an owl:InverseFunctionalProperty relationship type (relation) identify the same entity if the object of the relation is the same. Basically, that <#kidehen>, <https://twitter.com/kidehen#this>, and <https://www.linkedin.com/in/kidehen#this> all identify the same instance of the foaf:Person class (i.e., me).

SPARQL
DEFINE input:inference 'urn:ifp:inference:rule'
SELECT DISTINCT
<https://twitter.com/kidehen#this> AS ?s
?p
?o
FROM <urn:kidehen:ifp:demo>
WHERE { <https://twitter.com/kidehen#this> ?p ?o } ;
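
The logic this query relies on can be sketched outside the database. The following plain-Python approximation (not Virtuoso's actual implementation) groups subjects that share an object for a property declared inverse functional. The triples mirror the foaf:mbox statements from the demo graph; the `<#someone-else>` row and its address are hypothetical, added only as a control.

```python
from collections import defaultdict

# foaf:mbox statements from the demo graph, as (subject, predicate, object).
TRIPLES = [
    ("<#kidehen>", "foaf:mbox", "<mailto:kidehen@openlinksw.com>"),
    ("<https://twitter.com/kidehen#this>", "foaf:mbox", "<mailto:kidehen@openlinksw.com>"),
    ("<https://www.linkedin.com/in/kidehen#this>", "foaf:mbox", "<mailto:kidehen@openlinksw.com>"),
    # Hypothetical control row: a different mailbox, so no coreference.
    ("<#someone-else>", "foaf:mbox", "<mailto:other@example.org>"),
]

# Properties asserted to be of type owl:InverseFunctionalProperty.
IFPS = {"foaf:mbox"}


def coreferences(triples, ifps):
    """Group subjects that share the same object for an inverse functional property."""
    by_value = defaultdict(set)
    for s, p, o in triples:
        if p in ifps:
            by_value[(p, o)].add(s)
    # Only groups with more than one subject tell us anything new.
    return [subjects for subjects in by_value.values() if len(subjects) > 1]


groups = coreferences(TRIPLES, IFPS)
```

Here `groups` holds a single set containing the three identifiers from the demo graph. A fuller reasoner would also merge groups that overlap across several inverse functional properties; this sketch handles each (property, value) pair independently.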

Test Query 2

Here we are testing the notion that subjects and objects of an owl:sameAs relationship type (relation) identify the same entity. Basically, that <#kidehen>, <https://twitter.com/kidehen#this>, and <https://www.linkedin.com/in/kidehen#this> all identify the same instance of the foaf:Person class (i.e., me).

SPARQL
DEFINE input:same-as "yes"
SELECT DISTINCT
<https://twitter.com/kidehen#this> AS ?s
?p
?o
FROM <urn:kidehen:ifp:demo>
WHERE { <https://twitter.com/kidehen#this> ?p ?o } ;
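
As a rough illustration of what explicit owl:sameAs reasoning amounts to, the plain-Python sketch below follows owl:sameAs links in both directions and then collects the statements made about any of the resulting aliases. It uses a subset of the demo data and is an approximation of the idea, not a description of Virtuoso internals; the function and constant names are made up for the example.

```python
LOCAL = "<#kidehen>"
TWITTER = "<https://twitter.com/kidehen#this>"
LINKEDIN = "<https://www.linkedin.com/in/kidehen#this>"

# A subset of the demo graph, as (subject, predicate, object).
TRIPLES = [
    (LOCAL, "owl:sameAs", TWITTER),
    (LOCAL, "owl:sameAs", LINKEDIN),
    (LOCAL, "foaf:nick", '"kidehen"'),
    (TWITTER, "foaf:nick", '"@kidehen"'),
    (LINKEDIN, "schema:name", '"Kingsley Uyi Idehen"'),
]


def same_as_closure(triples, start):
    """Return every identifier reachable from `start` via owl:sameAs, in either direction."""
    seen, todo = {start}, [start]
    while todo:
        node = todo.pop()
        for s, p, o in triples:
            if p != "owl:sameAs":
                continue
            # owl:sameAs is symmetric, so walk the link both ways.
            for a, b in ((s, o), (o, s)):
                if a == node and b not in seen:
                    seen.add(b)
                    todo.append(b)
    return seen


def describe(triples, start):
    """Collect (predicate, object) pairs stated about `start` or any of its aliases."""
    aliases = same_as_closure(triples, start)
    return {(p, o) for s, p, o in triples if s in aliases and p != "owl:sameAs"}


description = describe(TRIPLES, TWITTER)
```

Starting from the Twitter identifier alone, `description` also picks up the name recorded against the LinkedIn identifier and the nickname recorded against `<#kidehen>`, which is the union effect the query above is meant to demonstrate.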

Using Virtuoso’s Built-In Faceted Browsing Engine

[1] Go to the Faceted Browsing Service endpoint. Paste in https://twitter.com/kidehen#this and press Enter.

[2] Then click on Settings and select an Inference Rule for owl:InverseFunctionalProperty reasoning. In this case, it would be urn:ifp:inference:rule as depicted below

[3] Then click on the Description tab, which returns the page depicted below.

[4] Click on the Metadata tab, and then click on Coreferences (identifiers that share a common referent) to see the effect of built-in owl:InverseFunctionalProperty Reasoning using the selected Inference Rule. Note: at the current time (due to a quirk in the user interface), you have to click on Permalink to arrive at what’s depicted below.

The screenshots that follow repeat the sequence above, with explicit (rather than implicit or inferred) owl:sameAs built-in Reasoning and Inference.

Artificial Intelligence, Machine Learning, and Cognition

Artificial Intelligence (AI) is currently a topic of high interest. Typically, it is conflated with Machine Learning (ML), which is oriented toward pattern recognition and “black box” algorithms, i.e., it doesn’t leverage Cognition (Reasoning and Inference capability).

Using the example covered in this post, Machine Learning wouldn’t understand anything about the implications of a sentence that had one or more subjects associated with a common object, where the sentence verb identified an owl:InverseFunctionalProperty instance.

Looking at the same example through the context lenses provided by a Semantic Web of Linked Data, we have an open standard in the form of an OWL Ontology that describes the nature of a specific class (or category) of Relationship Type. That description enables a machine or human to apply cognition to the challenge posed by Identity Reconciliation, an issue that has challenged alternative Data Integration approaches for many years (and counting).

Why is this important?

Data Integration and Data Privacy are two areas that continue to challenge technology in general.

Data Integration is a problem best solved by leveraging cognition and relationship type semantics. That said, this isn’t a one-way street, because the same techniques apply to the Dark Web, where data fragments culled from Social Media are pieced together into complete profiles.

In addition to the above, compliance with new regulations (e.g., GDPR from the EU) is another area where what’s covered in this post comes into play.

For those who continue to ignore the power and implications of a Semantic Web of Linked Data, I would like to remind you that the EU has been on the frontier of this technology for years (as a user and as a source of research and development funding). Thus, GDPR compliance and enforcement isn’t just about what’s on paper; it will be enforced cognitively too!

Kingsley Uyi Idehen

CEO, OpenLink Software: High-Performance Data Centric Technology Providers.