Data De-Silo-fication, using the combined power of RDF Language & Nanotation

Kingsley Uyi Idehen
OpenLink Software Blog
Sep 23, 2016

Situation Analysis

In today’s world, we are constantly challenged by several vectors of data silo-fication when trying to create, publish, access, or integrate data:

  • Different Document Content Formats
  • Applications hardwired to specific Data Representation Notations and/or Document Content Formats
  • Infrastructure Technology vendor politics and business models
  • Social Media provider politics and business models

The RDF Language (Resource Description Framework), when used in conjunction with HTTP URIs (Hyperlinks), provides a unique and powerful solution for data de-silo-fication. Unfortunately, for a variety of reasons (some self-inflicted), RDF is generally misunderstood, and as a result, its powerful value proposition remains elusive to many end-users, developers, and data integration practitioners.

What Problem Does RDF Solve?

RDF enables the creation of structured data using sentences, mimicking what we do every day (irrespective of our individual native tongue) when speaking, writing, or even signing; i.e., RDF enables the use of signs, syntax, and semantics to encode and decode information (data in some context). The underlying model (or schema) of RDF (and other languages) is first-order logic: an observation of fact (data) is expressed as a sentence in which a subject is related to an object by a predicate, reflecting the notion that everything is related to something else, in a variety of ways.
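As a rough illustration of the sentence idea (a sketch of my own, not part of the live demonstration), each RDF sentence can be modelled as a subject, predicate, object triple. The `Triple` type and the `objects_of` helper are hypothetical names introduced only for this example; the IRI and the `schema:` terms are borrowed from the Nanotation example later in this post, and `rdf:type` is what Turtle's `a` keyword abbreviates.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One RDF-style sentence: a subject related to an object by a predicate."""
    subject: str
    predicate: str
    obj: str


# IRI taken from the example in this post; prefixes are left unexpanded.
POST = "https://www.linkedin.com/pulse/ai-from-venus-machine-learning-mars-geoffrey-moore#this"

sentences = [
    Triple(POST, "rdf:type", "schema:WebPage"),
    Triple(POST, "rdf:type", "schema:BlogPosting"),
    Triple(POST, "schema:name", "AI is from Venus, Machine Learning is from Mars"),
]


def objects_of(subject: str, predicate: str) -> List[str]:
    """Return every object related to `subject` through `predicate`."""
    return [t.obj for t in sentences
            if t.subject == subject and t.predicate == predicate]


# The same subject is related to other things in more than one way.
types = objects_of(POST, "rdf:type")
names = objects_of(POST, "schema:name")
```

A plain list of tuples is enough to show the shape of the data; a real RDF store would add IRI resolution, datatypes, and indexing on top of this.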

How Can I Experience the Power of RDF, with Ease?

This post answers that question with a simple live demonstration.

I’ve created four documents using four different document editing tools (Microsoft Word, Microsoft OneNote, Google Docs, and Etherpad), each containing identical RDF Language sentences.

In addition to the above, I’ve employed the structured data sniffing prowess of our multi-browser extension, the OpenLink Structured Data Sniffer (OSDS). This end-user level tool provides data discovery, extraction, and transformation functionality that highlights what RDF is fundamentally about, i.e., structured data endowed with machine- and human-comprehensible relationship type semantics.

Finally, I use Nanotation, which is a “best practice” for embedding RDF Language sentences wherever text input is accepted, to create a variant of the data included in the documents referenced above. What follows simply adds Medium to the collection of document editors:

{<https://www.linkedin.com/pulse/ai-from-venus-machine-learning-mars-geoffrey-moore#this>
a schema:WebPage, schema:BlogPosting ;
schema:name "AI is from Venus, Machine Learning is from Mars" ;
schema:hasPart <#GeoffreyMooreComment-2016-09-16> ;
schema:url <https://www.linkedin.com/pulse/ai-from-venus-machine-learning-mars-geoffrey-moore> .
<#GeoffreyMooreComment-2016-09-16>
a schema:Comment ;
schema:author <https://www.linkedin.com/in/geoffreyamoore#this> ;
schema:name "GeoffreyMooreComment-2016-09-16" ;
schema:text """AI develops conceptual models of the world that are underpinned by set theory and natural language. In this context, every noun or noun phrase represents a set. Every predicate implicates that set in other sets. If all human beings are mortal, and you are a human being, then you are mortal. It’s an exercise in Venn diagrams. By extending these diagrams through syntax, semantics, and analogy, human beings build up conceptual models of the world that enable us to develop strategies for living. AI seeks to emulate this capability in expert systems.""" ;
schema:about dbpedia:AI, dbpedia:Machine_learning ;
schema:mentions dbpedia:Analytics, dbpedia:Spreadsheet, dbpedia:List_of_reporting_software, dbpedia:Category:Data_analysis_software, dbpedia:Category:Spreadsheet_software ;
schema:relatedLink <http://bit.ly/nanotations> ;
schema:url <> .
}

Here is a visual translation of the RDF Language statements above, which have basically turned this post into an RDF-database, just like that!

Structured Data embedded in this post using Nanotation
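To make the embedding idea concrete, here is a minimal sketch of how a tool might locate brace-delimited RDF sentences inside otherwise ordinary text. This is not how OSDS itself is implemented; `extract_nanotations` is a hypothetical helper, and treating `{` and `}` as the only delimiters is an assumption based on the example in this post.

```python
from typing import List


def extract_nanotations(text: str) -> List[str]:
    """Return the contents of each top-level { ... } block found in `text`.

    Simplification: braces that appear inside string literals are not
    treated specially, so this only suits simple notes.
    """
    blocks: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i + 1  # content begins just after the opening brace
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                blocks.append(text[start:i].strip())
    return blocks


# A hypothetical note mixing prose with one embedded block.
note = """Meeting notes, plain prose first.
{<#this> a schema:Comment ;
schema:name "Example" .
}
More prose after the block."""

blocks = extract_nanotations(note)
```

Each extracted string could then be handed to a Turtle parser; the surrounding prose is simply ignored, which is why the same sentences survive in any editor that accepts text.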

Here’s a screenshot showing how I can export this embedded data using JSON-LD (as opposed to RDF-Turtle), which is useful should I want to use this data in some other application that’s hardwired to JSON-LD:

Structured Data embedded in this post presented using JSON-LD notation
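For readers who cannot see the screenshot, the following sketch builds a small JSON-LD rendering of two of the sentences by hand. It is an approximation for illustration, not the actual OSDS export; the prefix expansions in `context` are assumptions, since the post does not spell them out.

```python
import json

# Assumed prefix expansions; the post itself does not declare them.
context = {
    "schema": "http://schema.org/",
    "dbpedia": "http://dbpedia.org/resource/",
}

post_iri = "https://www.linkedin.com/pulse/ai-from-venus-machine-learning-mars-geoffrey-moore"

# The subject becomes "@id", `a` becomes "@type", and each remaining
# predicate becomes a key whose value is a literal or an {"@id": ...} link.
document = {
    "@context": context,
    "@id": post_iri + "#this",
    "@type": ["schema:WebPage", "schema:BlogPosting"],
    "schema:name": "AI is from Venus, Machine Learning is from Mars",
    "schema:url": {"@id": post_iri},
}

json_ld = json.dumps(document, indent=2)
print(json_ld)
```

The sentences are unchanged; only the notation differs, which is the point of being able to move between RDF-Turtle and JSON-LD.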

Here’s the same data presented using RDF-Turtle Notation:

Here’s what happens with the same structured data from a Microsoft Word document.

Word Document with structured data embedded using Nanotation:

RDF-Language sentence translations and visualization:

Here’s the same thing using Microsoft OneNote.

OneNote Document with structured data embedded using Nanotation:

RDF-Language sentence translations and visualization:

And, using Google Docs.

Google Document with structured data embedded using Nanotation:

RDF-Language sentence translations and visualization:

Finally, using Etherpad.

Etherpad Document with structured data embedded using Nanotation:

RDF-Language sentence translations and visualization:

Conclusion

Data Silos pose the biggest challenges to effective utilization of data. RDF Language is a killer solution for data de-silo-fication.

As you can see from my examples, I can use any text editor to create relational databases (documents containing RDF Language sentences) without the data in question being locked in; i.e., I can move my data to wherever I want, on my terms.

Everything stated above applies to any kind of document, and I don’t need to wait for application vendors and developers to figure out the virtues of RDF Language. I can just get on with note-taking whenever the need arises.

A final note: rather than being confused about RDF, take a moment to revisit what this powerful language is about en route to discovering new and interesting frontiers such as:

  • real artificial intelligence — where machines (smart software agents) acquire knowledge that enables reasoning and inference, at massive scales
  • smart machine learning — as opposed to “black box algorithms” that are simply new data silo frontiers

Kingsley Uyi Idehen
OpenLink Software Blog

CEO, OpenLink Software (High-Performance Data Centric Technology Providers).