What is the Virtuoso Sponger Middleware about, and why is it important?

Kingsley Uyi Idehen
Apr 3, 2019

Situation Analysis

The Web continues to grow exponentially across multiple axes. Each axis presents a new set of challenges to unsuspecting users: the propagation of data silos, compromises of privacy, and a drift away from the “literary machine” that makes this global space unique.

Further, the challenges of the public Web are increasingly seeping into the private domains of organizations on the backs of various computing devices (such as phones, watches, and other IoT components) that make up the emerging Hybrid Cloud Infrastructure.

One solution, applicable to all of these emerging challenges, may be found in the generation of machine- and human-readable document metadata that provides insights into the nature of the content of those documents. Naturally, creating such metadata requires a combination of manual and automated operations performed by both humans and machines.

What is the Virtuoso Sponger?

The Virtuoso Sponger (or simply, the Sponger) is an Extract, Transform, and Load (ETL) Middleware Layer, built into all Virtuoso binaries, that brings metadata generation and content transformation to every Virtuoso deployment by implementing and building on existing open standards.

Sponger Middleware Architecture

The metadata generated by the Sponger always takes the form of a Document whose content is itself a highly explorable Semantic Web of Linked Data, woven together by hyperlinks (specifically, HTTP URIs).

Cartridge Installation

The Sponger is built into every Virtuoso instance. To take full advantage of its broad collection of transformation drivers, however, you need to install the Linked Data Cartridges VAD package. You can do this through the browser-based Virtuoso Conductor or through the iSQL command-line interface.

What document types are supported?

The Sponger (or, more precisely, the Sponger Cartridges, i.e., its transformation drivers) operates on HTML, Plain Old Semantic HTML (POSH), (X)HTML+RDFa, HTML5+Microdata, HTML5+JSON-LD, RDF-Turtle, RDF-N-Triples, RDF-N-Quads, RDF/XML, CSV, Atom, RSS, iCal/iCalendar, vCard/vCalendar, Plain Text (with or without Nanotations), and the XML and JSON content returned by a broad spectrum of APIs.
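When a document is fetched, its `Content-Type` header is the first clue about which cartridge should handle it. The sketch below maps formats from this list to their usual media types. It is hypothetical and is not the Sponger's actual dispatch logic. HTML carrying RDFa, Microdata, or JSON-LD arrives as plain `text/html`, so a real cartridge must also inspect the content itself.

```python
from typing import Optional

# Usual media types for formats named above (hypothetical lookup table).
# RSS has no IANA-registered type; application/rss+xml is the common convention.
MEDIA_TYPES = {
    "text/html": "HTML (may embed POSH, RDFa, Microdata, or JSON-LD)",
    "application/xhtml+xml": "XHTML (may embed RDFa)",
    "text/turtle": "RDF-Turtle",
    "application/n-triples": "RDF-N-Triples",
    "application/n-quads": "RDF-N-Quads",
    "application/rdf+xml": "RDF/XML",
    "text/csv": "CSV",
    "application/atom+xml": "Atom",
    "application/rss+xml": "RSS",
    "text/calendar": "iCalendar",
    "text/vcard": "vCard",
    "text/plain": "Plain Text",
    "application/json": "JSON (API response)",
    "application/xml": "XML (API response)",
}


def classify(content_type_header: str) -> Optional[str]:
    """Return a format label for a Content-Type header, or None if unknown."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return MEDIA_TYPES.get(media_type)
```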

Benefits?

  • Simplified Semantic Web exploitation — by taking the tedium and confusion out of Linked Data deployment, i.e., generating proxy-hyperlinks that function as Super Keys in conformance with Linked Data (“webby structured data”) principles

How do I use it?

Our URIBurner Service is a live instance of the Virtuoso Sponger that’s been available for free use since 2007, around the time of the DBpedia project launch and commencement of the Linked Open Data (LOD) Cloud.

Given a document of interest that’s available on the Web at a location identified by a URI, here’s how to obtain metadata describing that document, in the form of an exploration-friendly Semantic Web of Linked Data:

  1. Go to http://linkeddata.uriburner.com

You can shorten this process by installing our OpenLink Structured Data Sniffer and OpenLink Data Explorer browser extensions. Both reduce the four-step process above to a single click whenever you want more information about the document currently shown in your browser.

You can also manually invoke this functionality with the following URL pattern:

http://linkeddata.uriburner.com/about/html/{document-url} 
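The live example further down suggests how `{document-url}` is filled in: the target URL's scheme is followed by a single slash instead of `://`. The helper below is a minimal sketch inferred from that one example. The function name and its `base` parameter are hypothetical, and the service may accept other encodings as well.

```python
from urllib.parse import urlsplit


def sponger_about_url(document_url: str,
                      base: str = "http://linkeddata.uriburner.com") -> str:
    """Build an /about/html/ URL for a document (hypothetical helper).

    Assumption: "scheme://rest" becomes "scheme/rest", mirroring the
    live example in this post.
    """
    parts = urlsplit(document_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {document_url!r}")
    # Everything after "://" is kept verbatim.
    rest = document_url.split("://", 1)[1]
    return f"{base.rstrip('/')}/about/html/{parts.scheme}/{rest}"
```

For example, `sponger_about_url("https://example.org/page")` yields `http://linkeddata.uriburner.com/about/html/https/example.org/page`.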

Finally, you can invoke the Sponger’s services via the SPARQL Query Service endpoint associated with any Virtuoso instance (including URIBurner) that has this module enabled.
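A SPARQL endpoint can be called with an ordinary HTTP GET carrying a `query` parameter, as defined by the SPARQL 1.1 Protocol. The sketch below only builds the request; sending it is a matter of `urllib.request.urlopen(req)`. It makes three assumptions: the endpoint lives at `/sparql`, the `FROM` document is a placeholder, and the helper name is hypothetical. `DEFINE get:soft "soft"` is the Virtuoso pragma used in the SPARQL Integration example of this post.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder document URL; get:soft asks Virtuoso to sponge it if needed.
EXAMPLE_QUERY = """DEFINE get:soft "soft"
SELECT ?p ?o
FROM <https://example.org/some-document>
WHERE { ?s ?p ?o }
LIMIT 10"""


def sparql_get_request(endpoint: str, query: str,
                       accept: str = "application/sparql-results+json") -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # The Accept header selects the result serialization.
    return Request(url, headers={"Accept": accept})


# Assumed endpoint path; any Virtuoso instance with the Sponger enabled would do.
req = sparql_get_request("https://linkeddata.uriburner.com/sparql", EXAMPLE_QUERY)
```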

How does it work?

The invisible workflow behind this “deceptively simple” content transformation middleware is as follows:

  1. Negotiate a preferred content-type with the target document’s publisher using HTTP content negotiation
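Content negotiation happens through the `Accept` request header, where quality values (`q`) rank acceptable media types. The sketch below builds such a header and attaches it to a request. The preference order shown (RDF serializations ahead of HTML) and the helper names are assumptions for illustration, not the Sponger's actual settings.

```python
from urllib.request import Request

# Hypothetical preferences: RDF first, HTML as a fallback, anything last.
PREFERENCES = [
    ("text/turtle", 1.0),
    ("application/rdf+xml", 0.9),
    ("application/n-triples", 0.8),
    ("text/html", 0.5),
    ("*/*", 0.1),
]


def accept_header(prefs) -> str:
    """Render (media_type, q) pairs as an HTTP Accept header value."""
    parts = []
    for media_type, q in prefs:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"q must be in [0, 1], got {q}")
        # q=1 is the default, so it can be omitted; HTTP allows at most 3 decimals.
        parts.append(media_type if q == 1.0 else f"{media_type};q={round(q, 3):g}")
    return ", ".join(parts)


def negotiation_request(url: str, prefs=PREFERENCES) -> Request:
    """Build a GET request that states our format preferences."""
    return Request(url, headers={"Accept": accept_header(prefs)})
```

After sending, `response.headers.get_content_type()` reports what the publisher actually chose.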

Live Examples?

URL of Original Web Page: https://www.fastcompany.com/90324660/how-disney-grew-its-3-billion-mickey-mouse-business-by-selling-to-adults

URL of Metadata Document: https://linkeddata.uriburner.com/about/html/https/www.fastcompany.com/90324660/how-disney-grew-its-3-billion-mickey-mouse-business-by-selling-to-adults

The screenshots that follow demonstrate how the Sponger generates a description of the target document in the form of metadata that manifests as a Semantic Web of Linked Data that includes:

  • Document Type — via the property labeled “Type”
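The generated description is itself RDF. As a rough illustration of its shape, the sketch below serializes a few statements about a document as N-Triples: its type, a title, and links to entities it mentions. This is not the Sponger's actual output or vocabulary. The schema.org terms and the function names are assumptions chosen for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA_NAME = "http://schema.org/name"          # assumed vocabulary
SCHEMA_MENTIONS = "http://schema.org/mentions"  # assumed vocabulary


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, rejecting a few characters N-Triples forbids."""
    if any(ch in value for ch in '<>" {}|\\^`'):
        raise ValueError(f"invalid IRI: {value!r}")
    return f"<{value}>"


def literal(text: str) -> str:
    """Quote a plain string literal with N-Triples escapes."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def describe_document(doc_url, doc_type, title, mentions) -> str:
    """Return N-Triples lines describing a document (hypothetical helper)."""
    subject = iri(doc_url)
    lines = [f"{subject} {iri(RDF_TYPE)} {iri(doc_type)} .",
             f"{subject} {iri(SCHEMA_NAME)} {literal(title)} ."]
    # Each mentioned entity is itself an HTTP URI that can be explored further.
    lines += [f"{subject} {iri(SCHEMA_MENTIONS)} {iri(m)} ." for m in mentions]
    return "\n".join(lines)
```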

Metadata Segment 1

Top-right “Meta-cartridge” drop-down listing the Meta Cartridges whose NLP-based Entity Extraction contributes to the pipeline that produces the document description

Metadata Segment 2

Property values generated through NLP-based Entity Extraction

Metadata Segment 3

Other property values generated through NLP-based Entity Extraction

In addition to the document presented above, each Sponger-generated page links to an alternative document that presents the same metadata in a Faceted Browsing form. This is a presentation style in which filtering and navigation are driven by deeper exploitation of the semantics of entity relationship types. For instance, you can click the value of the property labeled “type” to discover related entities in the Virtuoso database behind the invoked Sponger instance.
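Clicking a type value amounts to asking the underlying database for other entities of that type. The sketch below shows such a query as a plain SPARQL `SELECT`. The Faceted Browser's own queries are likely more elaborate, and `same_type_query` is a hypothetical helper.

```python
def same_type_query(type_iri: str, limit: int = 20) -> str:
    """Build a SPARQL query listing entities that share the given rdf:type."""
    # Refuse characters that cannot appear inside a SPARQL IRI reference.
    if any(ch in type_iri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"invalid IRI: {type_iri!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return (
        "SELECT DISTINCT ?entity\n"
        "WHERE {\n"
        f"  ?entity a <{type_iri}> .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )
```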

Faceted Browsing Segment 1

Description of the source document

Faceted Browsing Segment 2

Description of one of the companies mentioned in the source document

Faceted Browsing Segment 3

Description of another company mentioned in the source document

SPARQL Integration

The Sponger’s services are also available for use as part of Virtuoso’s SPARQL Query Services functionality. For instance, a Document URL functions as an external Data Source Name against which Query Language operations may be performed, declaratively.

Here’s an example of a SPARQL Query that automatically treats a Google Spreadsheet about Intel CPUs as just another structured data source:

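# Virtuoso pragma: sponge (fetch and transform) the FROM document if it is not already loaded
DEFINE get:soft "soft"
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX cpu: <https://docs.google.com/spreadsheets/d/1NmrGjc8pcgh1S_0mFNABiQpSNjY6Jxm1lAOmcxHaldg/export?format=csv#>
PREFIX dsn: <https://docs.google.com/spreadsheets/d/1NmrGjc8pcgh1S_0mFNABiQpSNjY6Jxm1lAOmcxHaldg/export?format=csv>
SELECT DISTINCT (?s AS ?processorID) (xsd:string(?model) AS ?modelName) ?cores (IRI(?amazonUrl) AS ?amazonLink)
# The spreadsheet's CSV export URL serves as the data source name
FROM dsn:
WHERE {
  # Spreadsheet columns surface as properties in the cpu: namespace
  ?s cpu:Model ?model ;
     cpu:Cores ?cores ;
     cpu:Amazon_Link ?amazonUrl .
  # Keep only rows whose Amazon link uses HTTPS
  FILTER (CONTAINS(STR(?amazonUrl), "https:"))
}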
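The query's prefixes suggest how the CSV export is mapped: each column becomes a predicate IRI formed by appending the column name, after `#`, to the document URL. The sketch below mimics that mapping in Python, then applies the same pattern and filter as the query. It approximates the behavior under stated assumptions and is not the Sponger's CSV cartridge. The row-subject scheme (`#row1`, `#row2`, ...), the replacement of spaces with underscores in column names, and both function names are hypothetical.

```python
import csv
import io


def csv_to_triples(csv_text: str, dsn: str):
    """Yield (subject, predicate, object) tuples, one per non-empty cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{dsn}#row{row_number}"  # hypothetical row IRI scheme
        for column, value in row.items():
            # Skip overflow cells (key None) and missing or empty values.
            if column is None or not value:
                continue
            predicate = f"{dsn}#{column.strip().replace(' ', '_')}"
            yield (subject, predicate, value)


def cpu_rows(triples, dsn: str):
    """Mirror the query: rows with Model, Cores, and an https: Amazon_Link."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    for s, props in by_subject.items():
        model = props.get(f"{dsn}#Model")
        cores = props.get(f"{dsn}#Cores")
        link = props.get(f"{dsn}#Amazon_Link")
        if model and cores and link and "https:" in link:
            yield (s, model, cores, link)
```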

The end product is an HTML document (by default; other formats may be requested by various means) equipped with hyperlinks functioning as Super Keys for deeper data exploration and navigation, enabling serendipitous discovery of other related data (locally or across an HTTP-based network like the Web).
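When a client asks for `application/sparql-results+json` instead of HTML, results follow the SPARQL 1.1 Query Results JSON Format: variable names under `head.vars` and one object per solution under `results.bindings`. The sketch below flattens that structure into plain rows, using `None` for unbound variables. It assumes the JSON format was requested, and `bindings_to_rows` is a hypothetical helper.

```python
import json


def bindings_to_rows(results_json: str):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each bound variable maps to {"type": ..., "value": ...}; keep the value.
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in variables})
    return rows
```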

Here are some sample live links:

SPARQL Query Results Document

Other Live Examples

Conclusion

The ubiquity of the Web and its profound impact on Internet usability and accessibility is not accidental. It is the product of a well-designed solution to data access and integration, based on the principle of deceptive simplicity and built on hyperlinks that serve as powerful enablers of data access and data representation.

Our Sponger middleware brings all of the magic of webby structured data representation and access to your existing HTTP, ODBC, JDBC, ADO.NET, OLE-DB, and XMLA-compliant applications and services. In effect, every HTTP-accessible document becomes a usable structured data source, waiting to be fully exploited by current and future digital transformation initiatives aimed at optimizing personal and organization-wide agility.

Existing tools for producing Analytics Dashboards, Performance Indicator Reports, and other personal productivity enhancers immediately morph into launch-points for exploring intelligent Knowledge Graphs automatically derived from your own document repositories and databases.


OpenLink Virtuoso Weblog

News & Articles related to OpenLink Virtuoso & Related Technologies

Written by Kingsley Uyi Idehen

Founder & CEO, OpenLink Software, provider of Secure, High-Performance, and Cross-Platform Data Access, Integration, Virtualization, and Management Technology.
