Integrating SPARQL Query Results from Virtuoso into a Neo4j Property Graph

Exploiting the combined power of SPARQL, SQL, and Cypher using Neo4j and OpenLink Virtuoso.

Daniel Heward-Mills
OpenLink Virtuoso Weblog
Feb 23, 2021

Structural differences associated with data represented as Relational Tables, RDF-based Entity Relationship Graphs, and Labeled Property Graphs permeate the data access, integration, and management landscape to date. Such heterogeneity is a fact-of-life for every practitioner in any of the many data-related fields.

In this post, I will demonstrate how the combined interoperability features of two database management systems (DBMS) can deliver powerful data access and data virtualization benefits by leveraging the bridging power of existing open standards.

To achieve this goal, I will be accessing data from the Linked Open Data (“LOD”) Cloud using SPARQL Query Results from OpenLink Virtuoso (“Virtuoso”) piped directly to Neo4j via the following steps:

  • Create a Federated SPARQL query using Virtuoso
  • Use Virtuoso’s SPARQL-within-SQL (SPASQL) functionality to create a SQL View — which also exposes Hyperlinks functioning as Entity Identifiers that enable cross-platform, follow-your-nose exploration with any HTTP-aware tool (such as a Web Browser)
  • Access the SQL View created above from Neo4j via a JDBC connection
  • Use a combination of Cypher MERGE and CREATE operations to derive a Labeled Property Graph (LPG) from the SQL View

Prerequisites

JDBC Connectivity and Neo4j

We use the OpenLink JDBC-ODBC bridge driver as the key middleware layer between Neo4j’s JDBC Connectivity and Virtuoso’s ODBC-accessible DBMS — which also includes built-in SPARQL Connectivity.

As depicted below, the JDBC-ODBC Bridge Driver provides powerful data harmonization between the distinct DBMS realms.

To set this up, perform the following:

  • Copy the JDBC-ODBC Bridge opljdbc4_2.jar file from your working JDBC-ODBC Bridge installation to your Neo4j instance’s lib directory
  • Start or restart your Neo4j instance, and the driver class will be usable

Note: the “lib” directory can be located via the following steps, using the Terminal view option from Neo4j:

Go to the “lib” folder, then confirm its location using pwd on macOS, Linux, or PowerShell, or DIR at the Windows command line.

SPARQL-FED Query

Federated SPARQL (SPARQL-FED) enables users to query remote data sources and return their data as part of a local query. This example uses SPARQL-FED to combine data from the DBpedia-hosted and Wikidata-hosted SPARQL endpoints into a single result set on OpenLink’s demo server.

The data to be operated on using Cypher originates from a SPARQL-FED query that combines information about Robert Downey Jr. across DBpedia and Wikidata.
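The general shape of such a query can be sketched as follows. This is not the query from the original post. It is a minimal Python sketch that assembles a SPARQL-FED query string, assuming the public DBpedia and Wikidata endpoints, DBpedia's owl:sameAs links to Wikidata entities, and Wikidata's spouse property (P26) with its start time (P580) and end time (P582) qualifiers. The function name and the SPARQL variable names are hypothetical.

```python
# Public SPARQL endpoints for the two sources (assumed, not taken from the post).
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_federated_query(person_iri: str) -> str:
    """Return a SPARQL-FED query that joins DBpedia and Wikidata via owl:sameAs."""
    return f"""
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX p:   <http://www.wikidata.org/prop/>
PREFIX ps:  <http://www.wikidata.org/prop/statement/>
PREFIX pq:  <http://www.wikidata.org/prop/qualifier/>

SELECT ?person ?wikidataPerson ?spouse ?startDate ?endDate
WHERE {{
  SERVICE <{DBPEDIA_ENDPOINT}> {{
    VALUES ?person {{ <{person_iri}> }}
    ?person owl:sameAs ?wikidataPerson .
    FILTER(STRSTARTS(STR(?wikidataPerson), "http://www.wikidata.org/entity/"))
  }}
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    ?wikidataPerson p:P26 ?marriage .
    ?marriage ps:P26 ?spouse .
    OPTIONAL {{ ?marriage pq:P580 ?startDate }}
    OPTIONAL {{ ?marriage pq:P582 ?endDate }}
  }}
}}
""".strip()


# Full IRI is used because the local name ends with a dot.
query = build_federated_query("http://dbpedia.org/resource/Robert_Downey_Jr.")
print(query)
```

Each SERVICE block is evaluated against its own remote endpoint, and the shared ?wikidataPerson variable joins the two partial results into one tabular solution.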

To minimize the amount of code surfaced in Neo4j, while also minimizing raw SPARQL exposure for an audience more familiar with SQL, the SPARQL query can be converted into a SQL View, courtesy of Virtuoso’s ability to fuse SPARQL and SQL using its SPASQL functionality. Once the view is created, simply grant SELECT privileges on it to the Virtuoso SQL User or Role Account that will be associated with subsequent JDBC requests, e.g., “vdb”.

SQL View Generation Script
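The script from the original post is not reproduced here. As a rough sketch of the idea, the Python helper below generates the two statements involved: a view definition that embeds a SPARQL query, and the GRANT for the JDBC account. It assumes Virtuoso accepts a SPARQL query as a derived table inside a view definition; the view name, the derived-table alias, and the helper itself are hypothetical, and only the “vdb” account name comes from the text above.

```python
def build_view_script(view_name: str, sparql_query: str, grantee: str) -> list[str]:
    """Return SQL statements that wrap a SPARQL query in a view and grant read access."""
    # Assumption: the SPARQL keyword introduces an inline SPARQL query used as a
    # derived table; its SELECT variables become the view's columns.
    create_view = (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT * FROM (\n"
        f"  SPARQL\n"
        f"{sparql_query}\n"
        f") AS fed"
    )
    # Let the account used for JDBC requests read the view.
    grant = f'GRANT SELECT ON {view_name} TO "{grantee}"'
    return [create_view, grant]


# Hypothetical view name and a placeholder SPARQL query, for illustration only.
statements = build_view_script(
    "rdj_spouses",
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
    "vdb",
)
for statement in statements:
    print(statement + ";\n")
```

The generated statements would then be run against Virtuoso with a SQL client. Note that the helper interpolates the view name directly, so it should only be given trusted identifiers.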

Binding SPARQL Query Results to Neo4j

Tabular Query Solutions returned by the SQL View can now be bound and then transformed into a Labeled Property Graph using Cypher, while retaining the dual benefits of Hyperlinks as conduits to RDF-based Entity Relationship Graphs and the cross-platform follow-your-nose graph-exploration pattern.

Using Cypher

Cypher’s MERGE operations map and create nodes from the SQL Query Solutions delivered via JDBC, resulting in a Labeled Property Graph variant of an Entity Relationship Graph.

Cypher’s CREATE statements bind the respective DBpedia and Wikidata URIs, delivered via the SQL View, and their associated time intervals (i.e., Relationship Start and End Dates) by adding their returned SQL Query Solution values as respective node and relationship attributes.
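A minimal sketch of this MERGE and CREATE pattern follows, written as a Python helper that pairs the Cypher text with its parameters. It assumes the JDBC connection is made through the APOC procedure apoc.load.jdbc, which the text above does not name, and that the view exposes columns called person, wikidataPerson, spouse, startDate, and endDate. Those column names, the Person label, the property keys, and the placeholder JDBC URL are all hypothetical; the actual column-name casing depends on the view and the driver.

```python
# Cypher text; $jdbcUrl and $sql are query parameters supplied at run time.
LOAD_CYPHER = """
CALL apoc.load.jdbc($jdbcUrl, $sql) YIELD row
MERGE (p:Person {uri: row.person})
  SET p.sameAs = row.wikidataPerson
MERGE (s:Person {sameAs: row.spouse})
CREATE (p)-[:SPOUSE {
  uri: 'http://dbpedia.org/ontology/spouse',
  sameAs: 'http://www.wikidata.org/prop/direct/P26',
  startDate: row.startDate,
  endDate: row.endDate
}]->(s)
""".strip()


def build_load_statement(jdbc_url: str, view_name: str) -> tuple[str, dict[str, str]]:
    """Return the Cypher statement and the parameters that bind it to a SQL view."""
    params = {"jdbcUrl": jdbc_url, "sql": f"SELECT * FROM {view_name}"}
    return LOAD_CYPHER, params


# Placeholder URL and hypothetical view name; substitute the bridge driver's
# real connection URL for your installation.
cypher, params = build_load_statement("<your-jdbc-url>", "rdj_spouses")
print(cypher)
print(params)
```

MERGE keeps each person node unique across rows, while CREATE adds one SPOUSE relationship per row so that every marriage keeps its own start and end dates. The statement and parameters could be passed together to any Neo4j client that supports parameterized Cypher.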

The result of this binding method is a “deceptively simple” fusion of two Entity Relationship Graph variants, i.e., a Neo4j Labeled Property Graph and an RDF-based Graph, where RDF’s Subject-Predicate-Object (Entity-Attribute-Value) structure is mapped to the relevant Attributes of each Node and associated Relationship, while also preserving the important role of Hyperlinks as Subject, Predicate, and Object identifiers.

Snippet Illustrating LPG and RDF Data Fusion

Here are some steps for better understanding what’s going on:

  1. Clicking on the RDJ node’s URI attribute value will resolve to the DBpedia document about Robert Downey Jr.

  2. The sameAs value will resolve to the Wikidata document about RDJ.

  3. Relationships in the property graph also have their DBpedia and Wikidata URIs included as properties.

Clicking on the URI for the SPOUSE relationship opens up the resource for the spouse property described by the DBpedia ontology.

Conclusion

We have successfully completed a data access and transformation exercise that takes a SPARQL SELECT Query Solution, via a SQL View, to a Neo4j Labeled Property Graph using existing open standards (SQL, SPARQL, ODBC, and JDBC).

This was made possible by leveraging Virtuoso as a combined middleware and DBMS layer for serving Tabular Data to Neo4j using a fusion of SPARQL and SQL.

The solution covered in this article applies to any application or service that can consume data using the ODBC or JDBC open standards.
