Using Artificial Intelligence to get from Open Data to Linked Open Data — Part 1

Kingsley Uyi Idehen
OpenLink Virtuoso Weblog
Mar 10, 2017

AI (Artificial Intelligence) is fundamentally about Software Agents (and the machines they drive) being able to perform Reasoning & Inference against data from a variety of sources in varying circumstances.

The city of Boston has recently published a vast collection of Open Data. One of these datasets is about the locations of electric car charging stations.

Challenge

One significant challenge is that the fields (a/k/a attributes, relationship types, or relations) that hold the longitude and latitude values have identifiers that are scoped to the CSV document in which they were originally published, so software that looks for widely used geospatial terms cannot recognize them.

Solution

Produce a Linked Open Data rendition of the data without making any changes to the source data, while remaining sensitive to changes in the source over time.

Linked Open Data implications

  • Every entity (including entity relationship types) is identified using an HTTP URI (hyperlink)
  • Entity descriptions take the form of a collection of RDF Language sentences/statements

How?

By producing a set of built-in and/or custom inference rules that enable an RDF-aware Software Agent (e.g., Virtuoso) to generate the change-sensitive Linked Open Data rendition.

Note: in this post, built-in inference rules are the key to the solution; custom inference rules will be addressed in part 2, demonstrating an alternative approach.

Steps using Built-In Inference Rules

  1. Create a rule using rdfs:subPropertyOf relations that map the local field names (relations, attributes, properties, predicates) for latitude and longitude to the geo:lat and geo:long terms from the Geo Ontology (the W3C WGS84 vocabulary).
  2. Ingest the electric car charging station data using a live instance of the Virtuoso Sponger (Linked Open Data Middleware service).
  3. Produce Linked Open Data using a SPARQL Query (leveraging the Inference Rule pragma) against the electric car charging station dataset.
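
To make step 1 concrete, here is a minimal Python sketch of what an rdfs:subPropertyOf rule does when a query asks for geo:lat. It is an illustration, not Virtuoso's implementation: as far as I know, Virtuoso applies such rules at query time, whereas this sketch simply computes the entailed triples in memory. The two rule triples follow the mapping described above; the station identifier and coordinate values are hypothetical.

```python
# Minimal sketch of rdfs:subPropertyOf entailment over in-memory triples.
RDFS_SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
CSV = ("http://bostonopendata-boston.opendata.arcgis.com/datasets/"
       "465e00f9632145a1ad645a27d27069b4_2.csv#")

# Rule triples: local CSV field -> shared geo term.
RULES = [
    (CSV + "Latitude", RDFS_SUBPROPERTY_OF, GEO + "lat"),
    (CSV + "Longitude", RDFS_SUBPROPERTY_OF, GEO + "long"),
]

# Hypothetical station and values, for illustration only.
STATION = CSV + "station-1"
DATA = [
    (STATION, CSV + "Latitude", "42.36"),
    (STATION, CSV + "Longitude", "-71.06"),
]


def super_properties(prop, rules):
    """Return prop plus every property reachable via rdfs:subPropertyOf."""
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for s, p, o in rules:
            if s == current and p == RDFS_SUBPROPERTY_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def infer(data, rules):
    """Return the asserted triples plus those entailed by the rules."""
    entailed = set()
    for s, p, o in data:
        for q in super_properties(p, rules):
            entailed.add((s, q, o))
    return entailed


def match(triples, predicate):
    """Return sorted (subject, object) pairs for one predicate."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


without_inference = match(DATA, GEO + "lat")
with_inference = match(infer(DATA, RULES), GEO + "lat")
```

Here `without_inference` is empty because the source data only uses the CSV-scoped field name, while `with_inference` contains the station's latitude. The source data itself is never modified, which is the point of the approach.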

SPARQL Query with Built-In Inference Rule Enabled

## Context Rule for Built-In Reasoning & Inference
## See: http://kingsley.idehen.net/DAV/home/kidehen/Public/SPASQL/built-in-inference-rules/geo-data.sql
DEFINE input:inference "urn:geospatial:cleanup:inference:rules"
## Fetch the source document via the Sponger if it is not already loaded
DEFINE get:soft "soft"
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT DISTINCT
  (?s1 AS ?webid)
  (?s3 AS ?latitude)
  (?s2 AS ?longitude)
  (?s4 AS ?name)
  (?s8 AS ?address)
  (?s6 AS ?fuelType)
  (?s5 AS ?city)
FROM <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv>
WHERE {
  ?s1 a <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv#class> .
  ?s1 <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv#Longitude> ?s2 .
  ?s1 <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv#Latitude> ?s3 .
  ?s1 <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv#Station_Name> ?s4 .
  ?s1 <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv#City> ?s5 .
  ?s1 <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv#Fuel_Type_Code> ?s6 .
  ?s1 <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv#Status_Code> ?s7 .
  ?s1 <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv#Street_Address> ?s8 .
  ## These two patterns match only when the inference rule is in effect
  ?s1 geo:lat ?lat1 ;
      geo:long ?lng1 .
}

Live SPARQL Query Links

  1. Query Results with Reasoning & Inference Context enabled
  2. Query Results without Reasoning & Inference enabled — basically an empty page.
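
For readers who want to build links like these themselves, the following Python sketch constructs two SPARQL protocol request URLs, one with the inference pragma and one without. It makes several assumptions: the endpoint address is a placeholder (not the actual service behind the links above), the query is a shortened variant of the one in this post that selects only the geo:lat and geo:long values, and the get:soft pragma is left out for brevity. Only the standard `query` request parameter is used, and no network call is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

INFERENCE_PRAGMA = (
    'DEFINE input:inference "urn:geospatial:cleanup:inference:rules"'
)

# Shortened variant of the query in this post (an illustration only).
QUERY_BODY = """PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
SELECT DISTINCT ?s ?lat ?long
FROM <http://bostonopendata-boston.opendata.arcgis.com/datasets/465e00f9632145a1ad645a27d27069b4_2.csv>
WHERE { ?s geo:lat ?lat ; geo:long ?long . }"""


def build_query(with_inference):
    """Prepend the inference pragma only when inference is wanted."""
    if with_inference:
        return INFERENCE_PRAGMA + "\n" + QUERY_BODY
    return QUERY_BODY


def query_url(endpoint, query):
    """Encode the query text as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


def query_from_url(url):
    """Decode the query text back out of a request URL."""
    return parse_qs(urlsplit(url).query)["query"][0]


url_with = query_url(ENDPOINT, build_query(True))
url_without = query_url(ENDPOINT, build_query(False))
```

The only difference between the two URLs is the DEFINE line, which is why the second link returns nothing: without the rule, no triple in the source data uses geo:lat or geo:long.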

Screenshots from Faceted Browsing Service #1

Satellite View

Screenshots from Faceted Browsing Service #2

Map View

Screenshot from our HTML5-based PivotViewer

Live Demo Links

Kingsley Uyi Idehen is CEO of OpenLink Software, High-Performance Data Centric Technology Providers.