What is the Linked Open Data Cloud, and why is it important?

Situation Analysis

It is the year 2019, and the notion that “Data rules the world” is now broadly understood by nearly everyone equipped with a computing device.

Today we have an 80 Trillion Dollar Global Economy that is fundamentally driven by Data.

Global Economy Visualization from howmuch.net

Data is generally understood to be crucial to the creation of Information en route to the discovery of Knowledge.

Unfortunately, there is also a lingering misconception that challenges such as Data’s Volume, Velocity, Variety, Veracity, and Vulnerability can only be solved by some mythic Database Management System that slurps up all of mankind’s data into a single system from which Knowledge is doled out, subject to one’s ability to navigate a morass of obtrusive ads.

This post is an attempt to bring clarity to what the Linked Open Data Cloud (a/k/a the LOD Cloud) actually exemplifies, and how it demonstrates an unrivaled solution to the Data Access problems that we all face today.

What is the LOD Cloud?

The LOD Cloud is a Knowledge Graph that manifests as a Semantic Web of Linked Data. It is the natural product of several ingredients:

The core tapestry of the LOD Cloud arises from adherence to the “deceptively simple” notion that hyperlinks should be used to identify any thing, while entity-attribute-value or subject-predicate-object structured sentences should be used to describe every thing.

The practices above constitute what are now commonly known as the principles of Linked Data — a deployment method for representations of structured data that adds the use of hyperlinks (specifically, HTTP URIs) to the EAV (Entity Attribute Value) and RDF (Resource Description Framework) models.
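To make those two rules concrete, here is a minimal Python sketch (standard library only) that writes a couple of subject-predicate-object statements in N-Triples syntax, with HTTP URIs identifying every thing. The DBpedia and RDF Schema URIs are real identifiers, but the statements are illustrative rather than an extract of DBpedia, and the tiny serializer is a hypothetical helper, not a production RDF library.

```python
# Minimal sketch: Linked Data "sentences" as subject-predicate-object triples,
# with HTTP URIs naming things and literals carrying plain values.
from dataclasses import dataclass
from typing import Iterable, Tuple, Union


@dataclass(frozen=True)
class URI:
    value: str  # an HTTP URI that identifies a thing


@dataclass(frozen=True)
class Literal:
    value: str
    lang: str = ""  # optional language tag, e.g. "en"


Term = Union[URI, Literal]
Triple = Tuple[URI, URI, Term]

# Illustrative identifiers; the statements below are examples, not a DBpedia extract.
TBL = URI("http://dbpedia.org/resource/Tim_Berners-Lee")
LONDON = URI("http://dbpedia.org/resource/London")
RDFS_LABEL = URI("http://www.w3.org/2000/01/rdf-schema#label")
DBO_BIRTH_PLACE = URI("http://dbpedia.org/ontology/birthPlace")


def term_to_nt(term: Term) -> str:
    """Render a single term in N-Triples syntax."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape characters that may not appear raw inside an N-Triples string literal.
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{term.lang}" if term.lang else "")


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples, one 'subject predicate object .' statement per line."""
    return "\n".join(
        f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ." for s, p, o in triples
    )


statements = [
    (TBL, RDFS_LABEL, Literal("Tim Berners-Lee", "en")),
    (TBL, DBO_BIRTH_PLACE, LONDON),
]
print(to_ntriples(statements))
```

In practice a library such as rdflib would handle parsing and serialization; the point here is only that each line is one structured sentence whose subject and predicate are dereferenceable HTTP URIs.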

Bio2RDF and DBpedia — two inter-connected projects that seeded the LOD Cloud

DBpedia and Bio2RDF

Circa 2006, the DBpedia project created a General Knowledge seed for the germination of the LOD Cloud by repurposing Wikipedia content in Linked Data form.

The DBpedia RDF Data Set is hosted and published using OpenLink Virtuoso. The Virtuoso infrastructure provides access to DBpedia’s RDF data via a SPARQL endpoint, alongside HTTP support for any Web client’s standard GET for HTML or RDF representations of DBpedia resources.
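The two access paths described above can be sketched with Python's urllib: a plain HTTP GET whose Accept header requests either HTML or an RDF serialization (here Turtle), and a SPARQL Protocol GET that carries the query text in a `query` parameter. This sketch only builds the requests; the endpoint URL `https://dbpedia.org/sparql` and the chosen media types are assumptions about a typical setup, and the server's actual redirect and negotiation behaviour is not shown.

```python
# Minimal sketch: building (not sending) the two kinds of GET request described.
# The endpoint URL and Accept values are assumptions for illustration.
from urllib.parse import urlencode
from urllib.request import Request

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"  # assumed public SPARQL endpoint


def resource_request(resource_uri: str, want_rdf: bool) -> Request:
    """HTTP GET for a resource; the Accept header asks for RDF (Turtle) or HTML."""
    accept = "text/turtle" if want_rdf else "text/html"
    return Request(resource_uri, headers={"Accept": accept})


def sparql_request(query: str, endpoint: str = DBPEDIA_SPARQL) -> Request:
    """SPARQL Protocol query via GET: the query text goes in the 'query' parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = (
    "SELECT ?label WHERE { "
    "<http://dbpedia.org/resource/Linked_data> "
    "<http://www.w3.org/2000/01/rdf-schema#label> ?label . "
    'FILTER (lang(?label) = "en") }'
)

rdf_req = resource_request("http://dbpedia.org/resource/Linked_data", want_rdf=True)
query_req = sparql_request(QUERY)
print(rdf_req.get_method(), rdf_req.full_url, rdf_req.get_header("Accept"))
print(query_req.full_url)
```

With network access, passing either object to `urllib.request.urlopen` would send it; the format of the response depends on what the server actually supports.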

Illustration of Current DBpedia Data Provision Architecture

The Bio2RDF project created a similar, but more focused, seed by generating Linked Data from a variety of Life Science, Healthcare, and Pharmaceutical industry data sources.

Thus, from the get-go, there was an extremely rich mesh of master records that made it easy for others to embrace and extend.

Wikidata

Where DBpedia focuses on generating Linked Open Data from Wikipedia documents, Wikidata focuses on creating Linked Open (meta)Data to supplement Wikipedia documents. So while the two may appear at first glance to be competing projects, they are better treated as complementary.

Schema.org

The Schema.org vocabulary is a relatively new addition to the LOD Cloud. This collection of terms is increasingly understood by search engines because it is primarily curated by the operators of those same engines. That, in turn, gives content publishers a compelling incentive to use it when seeking to improve their placement on Search Engine Results Pages (SERPs).
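Publishers usually embed Schema.org terms in a page as JSON-LD. A minimal Python sketch, assuming a BlogPosting description with just a headline, an author, and a placeholder URL (example.com, not this post's real address):

```python
# Minimal sketch: schema.org terms embedded as JSON-LD in an HTML page.
# The URL used below is a placeholder.
import json


def blog_posting_jsonld(headline: str, author_name: str, url: str) -> str:
    """Describe a blog post with schema.org's BlogPosting type and a Person author."""
    doc = {
        "@context": "https://schema.org",
        "@type": "BlogPosting",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name},
        "url": url,
    }
    return json.dumps(doc, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap the JSON-LD so it can be placed in an HTML page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


markup = as_script_tag(
    blog_posting_jsonld(
        "What is the Linked Open Data Cloud, and why is it important?",
        "Kingsley Uyi Idehen",
        "https://example.com/lod-cloud",  # placeholder, not the post's real URL
    )
)
print(markup)
```

Because the `@context` maps every key to a schema.org term, the same block is both search-engine markup and a small piece of Linked Data.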

Why is the LOD Cloud Important?

The LOD Cloud provides a loosely-coupled collection of Data, Information, and Knowledge that’s accessible by any human or machine with access to the Internet, courtesy of the abstraction layer provided by the Web. It permits both basic and sophisticated lookup-oriented access using either the SPARQL Query Language or SQL.
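What a "lookup" means here can be illustrated with a toy evaluator for SPARQL basic graph patterns: each triple pattern may contain `?variables`, and patterns are joined on shared variables. This is a naive nested-loop sketch in Python over made-up data written with DBpedia-style prefixed names; real engines such as Virtuoso rely on indexes and query planning, and the SQL route is not shown.

```python
# Minimal sketch of a SPARQL basic graph pattern: triple patterns with ?variables
# are matched against data and joined on shared variables.
# Prefixed names and data are illustrative, not taken from a live dataset.
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that pattern equals triple, or return None on conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def evaluate(patterns: List[Triple], data: List[Triple]) -> List[Binding]:
    """Naive nested-loop join over the patterns, left to right."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for binding in bindings:
            for triple in data:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


data = [
    ("dbr:Tim_Berners-Lee", "dbo:birthPlace", "dbr:London"),
    ("dbr:London", "dbo:country", "dbr:United_Kingdom"),
    ("dbr:Paris", "dbo:country", "dbr:France"),
]

# Roughly: SELECT * WHERE { ?person dbo:birthPlace ?place . ?place dbo:country ?country }
results = evaluate(
    [("?person", "dbo:birthPlace", "?place"), ("?place", "dbo:country", "?country")],
    data,
)
print(results)
```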

Current LOD Cloud

Economic Challenge of the LOD Cloud

The current cloud came about largely as a side benefit of other projects, through community collaboration strongly influenced by academia and indirectly funded by a variety of research projects. Thus, throughout its existence, a functional business model has remained elusive.

Data storage, processing power, network bandwidth, server administration: all of these have costs that must be borne somehow. Content Quality and Query Service Availability are the key user-visible challenges facing the current cloud. None of these is sufficiently addressed by conventional “Open Source” and “Community Collaboration” patterns; as time has demonstrated, Services-as-Gifts or “Honorable Contributions” are not a sustainable option.

Current LOD Cloud Economy — where the Scheme, Source, and Currency of Compensation are unknowns

Solving the Economic Challenge

Fundamentally, every solution to the LOD Cloud business model challenge boils down to evolving the currently “unknown” elements — Compensation Scheme, Compensation Source, and Compensation Currency — into specifics.

Compensation Scheme

The scheme rests on fine-grained Attribute-Based Access Controls (e.g., WebACLs) that describe who (i.e., which person or software agent) has access to what data, and under what conditions. This allows data and query service providers to provide broad but shallow access at no cost, while granting paying users deep and/or focused access at prices appropriate to the net benefit of that access.
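The free-but-shallow versus paid-and-deep split can be sketched as a policy check. The structure loosely mirrors Web Access Control authorizations (an agent WebID, a resource, an access mode such as Read), but the `max_rows` condition, the WebIDs, and the dataset URI are hypothetical additions for illustration, not part of any WAC vocabulary.

```python
# Minimal sketch of tiered, attribute-based access in the spirit of Web Access
# Control: each authorization names an agent (WebID), a resource, and a mode.
# The max_rows condition, WebIDs, and dataset URI are hypothetical.
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Authorization:
    agent: Optional[str]     # a WebID, or None for "any agent" (public access)
    access_to: str           # the dataset or query service being protected
    mode: str                # e.g. "Read", echoing acl:Read
    max_rows: Optional[int]  # hypothetical condition; None means unlimited


def row_limit(auths: List[Authorization], agent: Optional[str],
              resource: str, mode: str = "Read") -> Optional[int]:
    """Most generous row limit granted: None if unlimited, 0 if no access at all."""
    limits = [
        a.max_rows
        for a in auths
        if a.access_to == resource
        and a.mode == mode
        and (a.agent is None or a.agent == agent)
    ]
    if not limits:
        return 0
    if any(limit is None for limit in limits):
        return None
    return max(limits)


DATASET = "https://example.org/datasets/sample"      # hypothetical
PAYING_USER = "https://example.org/people/alice#me"  # hypothetical WebID

auths = [
    Authorization(None, DATASET, "Read", max_rows=100),          # broad but shallow, free
    Authorization(PAYING_USER, DATASET, "Read", max_rows=None),  # deep, paid
]

print(row_limit(auths, None, DATASET), row_limit(auths, PAYING_USER, DATASET))
```

A real ABAC system would evaluate richer attributes (time windows, query cost, payment status); the sketch keeps one numeric condition to show how the same dataset can serve both tiers.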

Rights Tokenization

An X.509 Digital Certificate can be used to tokenize Identity (in the form of a WebID) and Identification (WebID-Profile) to produce credentials that are reconciled to Web ACLs associated with datasets published to the LOD Cloud by various publishers, via Attribute-Based Access Control (ABAC) systems.
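The reconciliation step follows the WebID-TLS pattern: the WebID is read from the certificate's Subject Alternative Name, the WebID-Profile is dereferenced, and the certificate's public key must match a key listed there. The Python sketch below covers only that final comparison; the TLS handshake (which proves possession of the private key), X.509 parsing, and profile retrieval are assumed to have happened, and `fetch_profile_keys` is a hypothetical stand-in for dereferencing the profile.

```python
# Minimal sketch of the WebID-TLS style key check. Certificate parsing, the TLS
# handshake, and profile retrieval are assumed; fetch_profile_keys is a
# hypothetical stand-in for reading the keys listed in a WebID-Profile.
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple


@dataclass(frozen=True)
class RSAPublicKey:
    modulus: int
    exponent: int


@dataclass(frozen=True)
class ClientCertificate:
    san_uris: Tuple[str, ...]  # URIs from the Subject Alternative Name extension
    public_key: RSAPublicKey


def verify_webid(cert: ClientCertificate,
                 fetch_profile_keys: Callable[[str], Set[RSAPublicKey]]) -> Optional[str]:
    """Return the first WebID whose profile lists the certificate's key, else None."""
    for webid in cert.san_uris:
        if cert.public_key in fetch_profile_keys(webid):
            return webid
    return None


# Toy values for illustration only; a real RSA modulus is thousands of bits long.
ALICE = "https://example.org/people/alice#me"
PROFILES = {ALICE: {RSAPublicKey(modulus=0xC0FFEE, exponent=65537)}}


def fetch_from_dict(webid: str) -> Set[RSAPublicKey]:
    """Hypothetical profile lookup backed by an in-memory dictionary."""
    return PROFILES.get(webid, set())


good = ClientCertificate((ALICE,), RSAPublicKey(0xC0FFEE, 65537))
bad = ClientCertificate((ALICE,), RSAPublicKey(0xBADBAD, 65537))
print(verify_webid(good, fetch_from_dict), verify_webid(bad, fetch_from_dict))
```

Once a WebID is verified this way, it becomes the agent identifier that access-control rules are evaluated against.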

Compensation Currency

Payment options include conventional (fiat) currencies, cryptocurrencies like Bitcoin, cryptocurrencies associated with some Rewards Systems, and others.

Purchase and Usage Process

A user purchases a ticket and stores it in the native key store provided by their operating system (Windows, macOS, Linux, etc.).

Users can also use PKCS#12 files to make their own key stores which reside on their interaction device (laptop, desktop, tablet, phone, etc.) and/or on a detached credentials device (or dongle).

Prototype Solution — URIBurner Service

One example of this system exists today in the form of the LOD Connectivity Licenses that we offer for our URIBurner Service, which can be thought of as a “deceptively simple” conduit to the LOD Cloud.

URIBurner Service and its LOD Connectivity Drivers

In their most basic form, these LOD Connectivity Licenses add SQL access via ODBC, JDBC, ADO.NET, and/or OLE DB to the mix of LOD Cloud data-access protocols (primarily SPARQL and HTTP). URIBurner also brings users an ability to crawl the LOD Cloud as part of the query solution pipeline — using a progressive and intelligent Small Data pattern.

Prototype Solution — VIOS Network

In the VIOS Network, data visualization is added to the mix to increase understanding by way of faceted data browsing or exploration, i.e., entity relationship types (relations) and their aggregate memberships are used to deliver exploration and navigational intelligence.
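The facet counts behind that kind of exploration can be sketched in a few lines: for each relation, count how many distinct entities carry it; for each type, count its members; and when a facet value is chosen, narrow the entity set and recount. This Python sketch uses made-up `ex:` data with FOAF-style names and is not presented as the VIOS Network's implementation.

```python
# Minimal sketch of facet computation for faceted browsing. Data is illustrative.
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def relation_facets(triples: Iterable[Triple],
                    entities: Optional[Set[str]] = None) -> Counter:
    """Per predicate, the number of distinct subjects (optionally within a subset)."""
    pairs = {(s, p) for s, p, _ in triples if entities is None or s in entities}
    return Counter(p for _, p in pairs)


def type_memberships(triples: Iterable[Triple]) -> Counter:
    """Per class, the number of distinct entities typed with it."""
    members = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    return Counter(o for _, o in members)


def narrow(triples: Iterable[Triple], predicate: str, value: str) -> Set[str]:
    """Entities selected by choosing the facet value predicate = value."""
    return {s for s, p, o in triples if p == predicate and o == value}


triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:acme", RDF_TYPE, "foaf:Organization"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:acme", "foaf:name", "ACME"),
]

people = narrow(triples, RDF_TYPE, "foaf:Person")
print(type_memberships(triples))
print(relation_facets(triples, people))
```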

VIOS System

Naturally, availability and data accuracy remain important factors in this system, hence the use of Activity Stream Monitors, Cryptocurrencies, Smart Contracts, and Blockchain-based Distributed Ledgers to create a LOD Cloud dimension with incentives for all contributors — publishers, authors, fact-checkers, etc.

Conclusion

In the LOD Cloud, we have a live demonstration of a new frontier for data access, integration, and management, in which each aspect presents a Trillion Dollar market opportunity, applicable to a variety of market segments.

Conservative Estimates of the LOD Cloud Market Opportunity


OpenLink Virtuoso Weblog

News & Articles related to OpenLink Virtuoso & Related Technologies

Written by Kingsley Uyi Idehen

CEO, OpenLink Software: High-Performance Data Centric Technology Providers. #SHA1 Fingerprint:7ED0CF5F F77BF6214D5FC50EFF9BC354386EB100