What is the Virtuoso Sponger Middleware about, and why is it important?
The Web continues to grow exponentially across multiple axes. Each axis presents a new set of challenges to unsuspecting users, from the propagation of data silos, to compromises of privacy, to orientation away from the “literary machine” that is this unique global space.
Further, the challenges of the public Web are increasingly seeping into the private domains of organizations on the backs of various computing devices (such as phones, watches, and other IoT components) that make up the emerging Hybrid Cloud Infrastructure.
One solution, applicable to all of these emerging challenges, may be found in the generation of machine- and human-readable document metadata that provides insights into the nature of the content of those documents. Naturally, creating such metadata requires a combination of manual and automated operations performed by both humans and machines.
What is the Virtuoso Sponger?
The Virtuoso Sponger (or simply, the Sponger) is an Extract, Transform, and Load (ETL) Middleware Layer, built into all Virtuoso binaries, that brings metadata generation and content transformation functionality to every Virtuoso deployment — by way of innovative implementation and exploitation of existing open standards.
The metadata generated by the Sponger always manifests as a Document, composed of content that itself manifests as a highly explorable Semantic Web of Linked Data, woven together by hyperlinks (specifically, HTTP URIs).
The Sponger is built into all Virtuoso instances, but to take full advantage of its broad collection of transformation drivers, you need to install the Linked Data Cartridges VAD — via the HTTP browser-based Virtuoso Conductor interface or the Operating System-native iSQL command-line interface.
What document types are supported?
The Sponger (or more properly, the Sponger Cartridges, which are its transformation drivers) operates on HTML, Plain Old Semantic HTML (POSH), (X)HTML+RDFa, HTML5+Microdata, HTML5+JSON-LD, RDF-Turtle, RDF-N-Triples, RDF-N-Quads, RDF/XML, CSV, Atom, RSS, iCal/iCalendar, vCard/vCalendar, Plain Text (with or without Nanotations), and XML and JSON content-types returned by a broad spectrum of APIs.
Why is it important?
- Simplified Semantic Web exploitation — by taking the tedium and confusion out of Linked Data deployment, i.e., generating proxy-hyperlinks that function as Super Keys in conformance with Linked Data (“webby structured data”) principles
- Ease of Use — courtesy of hyperlinks as the sole control mechanism for delivering its powerful document description and content transformation
- Ease of Extensibility — cartridges (drivers or connectors) for specific document types are leveraged as the delivery mechanism for content transformation functionality
- Broad Integration with Third Party Services (client and server ends) — more than 70 API and Document Type combinations are currently supported, with customization APIs available for additional enhancements
- Powerful Meshing (rather than mashing) of content from disparate data sources — i.e., a powerful solution for Data Virtualization
How do I use it?
Our URIBurner Service is a live instance of the Virtuoso Sponger that’s been available for free use since 2007, around the time of the DBpedia project launch and commencement of the Linked Open Data (LOD) Cloud.
Given a document of interest that’s available on the Web, at a location identified by a URI, here’s how you would obtain metadata describing said document that manifests as an exploration-friendly Semantic Web of Linked Data:
- Go to http://linkeddata.uriburner.com
- Place the URL that identifies the document of interest into the input field labeled “Enter the URL to sponge:”
- Click on the “Sponge” button
- View the document returned to your browser
You can shorten this experience by installing our OpenLink Structured Data Sniffer and OpenLink Data Explorer Browser Extensions, both of which reduce the four-step process above to a single-click action whenever you seek additional information about the document currently shown in your browser.
You can also manually invoke this functionality with the following URL pattern:
Finally, you can invoke the Sponger’s services via the SPARQL Query Service endpoint associated with any Virtuoso instance (including URIBurner) that has this module enabled.
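As a rough sketch of the SPARQL route, the Python snippet below assembles a SPARQL Protocol request URL that asks a Virtuoso endpoint to sponge a document on demand. The `/sparql` path on URIBurner, the helper names, and the example document URL are assumptions made for illustration; the `DEFINE get:soft "soft"` pragma is the one used in the spreadsheet query later in this post. Nothing is sent over the network here.

```python
from urllib.parse import urlencode

# Assumed endpoint location; substitute the endpoint of your own Virtuoso instance.
ENDPOINT = "http://linkeddata.uriburner.com/sparql"


def build_sponge_query(doc_url: str, limit: int = 10) -> str:
    """Return a SPARQL query that names doc_url as the data source to sponge."""
    # Reject characters that cannot appear inside <...> in a SPARQL query.
    if any(ch in doc_url for ch in '<>" \n'):
        raise ValueError("document URL contains characters not allowed in an IRI reference")
    return (
        'DEFINE get:soft "soft"\n'
        "SELECT DISTINCT ?s ?p ?o\n"
        f"FROM <{doc_url}>\n"
        "WHERE { ?s ?p ?o }\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as the standard SPARQL Protocol 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    # Hypothetical document URL, used only to show the shape of the request.
    print(build_request_url(build_sponge_query("https://example.com/article.html")))
```

The resulting URL can be pasted into a browser or handed to any HTTP client. Whether a given endpoint honors the sponging pragma depends on how that instance is configured.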
How does it work?
The invisible workflow behind this “deceptively simple” content transformation middleware is as follows:
- Negotiate a preferred content-type with the target document's publisher using HTTP content-negotiation
- Scan through the list of configured Extractor Cartridges (based on paths or content-type settings) and apply transformations offered by each of the relevant cartridges — at this point phase 1 completes with transformed content ready for the next processing cycle
- Scan through a list of configured Meta Cartridges that perform tasks like Natural Language Processing (NLP)-based Entity Extraction and Linked Open Data (LOD) Cloud Lookups against the content produced by phase 1
- Present the transformed document to the User Agent (e.g. Browser) that invoked the Sponger
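The two-phase flow above can be sketched in Python as follows. This is an illustrative model only, not the Sponger's actual implementation: the class names, the `sponge` function, and the two toy cartridges are all hypothetical, and the HTTP fetch and content negotiation are assumed to have happened already, so the caller passes in the negotiated content-type and the document body.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)


@dataclass
class ExtractorCartridge:  # hypothetical model of a phase 1 cartridge
    name: str
    content_types: Tuple[str, ...]
    extract: Callable[[str, str], List[Triple]]  # (url, body) -> triples


@dataclass
class MetaCartridge:  # hypothetical model of a phase 2 cartridge
    name: str
    enrich: Callable[[List[Triple]], List[Triple]]  # triples -> extra triples


def sponge(url: str, content_type: str, body: str,
           extractors: Sequence[ExtractorCartridge],
           metas: Sequence[MetaCartridge]) -> List[Triple]:
    # Phase 1: apply every extractor whose content-type setting matches.
    triples: List[Triple] = []
    for cartridge in extractors:
        if content_type in cartridge.content_types:
            triples.extend(cartridge.extract(url, body))
    # Phase 2: each meta cartridge sees the content produced so far and adds to it.
    for cartridge in metas:
        triples.extend(cartridge.enrich(list(triples)))
    return triples


def text_title(url: str, body: str) -> List[Triple]:
    """Toy extractor: use the first non-empty line of plain text as the title."""
    lines = body.strip().splitlines()
    return [(url, "title", lines[0])] if lines else []


def tag_as_document(triples: List[Triple]) -> List[Triple]:
    """Toy meta cartridge: label every subject seen so far as a Document."""
    return [(s, "type", "Document") for s in sorted({s for s, _, _ in triples})]


EXTRACTORS = [ExtractorCartridge("plain-text", ("text/plain",), text_title)]
METAS = [MetaCartridge("typing", tag_as_document)]

if __name__ == "__main__":
    for triple in sponge("http://example.com/a.txt", "text/plain",
                         "Hello\nworld", EXTRACTORS, METAS):
        print(triple)
```

Keeping extraction and enrichment as separate lists mirrors the split between Extractor Cartridges and Meta Cartridges: new document types only need a new extractor, while lookups and entity extraction apply to whatever phase 1 produced.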
The screenshots that follow demonstrate how the Sponger generates a description of the target document in the form of metadata that manifests as a Semantic Web of Linked Data that includes:
- Document Type — via the property labeled “Type”
- Document Title and Description — via the properties labeled “title” and “description”
- Document Focus (a/k/a Primary Topic) — via the property labeled “about”
- Related Topics — via the properties labeled “seeAlso” and “has related”
- Entities Mentioned — via the property labeled “mentions”
Metadata Segment 1
Metadata Segment 2
Metadata Segment 3
In addition to the document presented above, each Sponger-generated page includes a link to an alternative document that presents the same metadata in a Faceted-Browsing-oriented form, i.e., a presentation style where filtering and navigation are driven by deeper exploitation of entity relationship type semantics. For instance, you can click on the value of the property labeled “type” to discover related entities from the underlying Virtuoso database associated with the invoked Sponger instance.
Faceted Browsing Segment 1
Faceted Browsing Segment 2
Faceted Browsing Segment 3
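To make the idea of facet-driven filtering concrete, here is a minimal Python sketch. It is not how Virtuoso's Faceted Browser is implemented. The sample triples, identifiers, and function names are invented for illustration; the point is only that a property such as “type” yields facet values, and picking a value narrows the set of entities.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (entity, property, value)

# Made-up sample data, not output from a real Sponger instance.
SAMPLE: List[Triple] = [
    ("doc:1", "type", "Article"),
    ("doc:2", "type", "Article"),
    ("doc:3", "type", "Spreadsheet"),
    ("doc:1", "mentions", "DBpedia"),
]


def facet_counts(triples: Iterable[Triple], prop: str) -> Counter:
    """Count how many statements carry each value of the given property."""
    return Counter(value for _, p, value in triples if p == prop)


def entities_with(triples: Iterable[Triple], prop: str, value: str) -> List[str]:
    """Return the entities that have the given property/value pair."""
    return sorted({s for s, p, o in triples if p == prop and o == value})


if __name__ == "__main__":
    print(facet_counts(SAMPLE, "type"))
    print(entities_with(SAMPLE, "type", "Article"))
```

Clicking a “type” value in the faceted view corresponds roughly to calling `entities_with` with that value: the result is every entity in the store sharing the type, not only those from the document you started with.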
The Sponger’s services are also available for use as part of Virtuoso’s SPARQL Query Services functionality. For instance, a Document URL functions as an external Data Source Name against which Query Language operations may be performed, declaratively.
Here’s an example of a SPARQL Query that automatically treats a Google Spreadsheet about Intel CPUs as just another structured data source:
DEFINE get:soft "soft"
PREFIX cpu: <https://docs.google.com/spreadsheets/d/1NmrGjc8pcgh1S_0mFNABiQpSNjY6Jxm1lAOmcxHaldg/export?format=csv#>
PREFIX dsn: <https://docs.google.com/spreadsheets/d/1NmrGjc8pcgh1S_0mFNABiQpSNjY6Jxm1lAOmcxHaldg/export?format=csv>
SELECT DISTINCT ?s AS ?processorID xsd:string(?model) AS ?modelName ?cores IRI(?amazonUrl)
FROM dsn:
WHERE {
  ?s cpu:Model ?model ;
     cpu:Cores ?cores ;
     cpu:Amazon_Link ?amazonUrl .
}
The end product is an HTML document (by default; other formats may be requested by various means) equipped with hyperlinks functioning as Super Keys for deeper data exploration and navigation, enabling serendipitous discovery of other related data (locally or across an HTTP-based network like the Web).
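One common way to request a different format is the standard HTTP `Accept` header. The Python sketch below prepares such a request for SPARQL JSON results and flattens a response in the W3C SPARQL Query Results JSON layout into plain rows. The function names, the endpoint URL in the usage lines, and the sample payload are assumptions for illustration. The payload is hand-written, not real output, and no request is actually sent.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import Request


def build_json_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a GET request asking for SPARQL JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(payload: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON results document into one dict per result row."""
    doc = json.loads(payload)
    names = doc["head"]["vars"]
    return [
        # Unbound variables are simply absent from a binding, so skip them.
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in doc["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Hypothetical endpoint and hand-written payload, shown only for shape.
    request = build_json_request("http://example.org/sparql",
                                 "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
    print(request.full_url, request.get_header("Accept"))
    sample = json.dumps({
        "head": {"vars": ["modelName", "cores"]},
        "results": {"bindings": [
            {"modelName": {"type": "literal", "value": "Example CPU"},
             "cores": {"type": "literal", "value": "4"}},
        ]},
    })
    print(rows_from_results(sample))
```

To execute the request, pass the `Request` object to `urllib.request.urlopen` and feed the decoded response body to `rows_from_results`.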
Here are some sample live links:
Other Live Examples
- Looking at a StackOverflow post about the DBpedia SPARQL Endpoint: Basic Metadata Document or Faceted-Browsing-oriented Metadata Document (which provides pathways to information about other related questions and answers in the underlying Virtuoso RDBMS instance)
- Looking at a Forbes article about venture capital firm Andreessen Horowitz: Basic Metadata Document or Faceted-Browsing-oriented Metadata Document
- Looking at an HTML document about Unix Philosophy and its impact on computing today: Basic Metadata Document or Faceted-Browsing-oriented Metadata Document
- Looking at a Google Spreadsheet about the Intel CPU portfolio: Basic Metadata Document or Faceted-Browsing-oriented Metadata Document
- Looking at an Ontology (Data Dictionary) generated from Google Spreadsheet content: Faceted-Browsing-oriented Metadata Document
- Looking at a Description of an Individual or Instance associated with one of the Entity Types (Classes) from the generated Ontology
The ubiquity of the Web and its profound impact on Internet usability and accessibility are not accidental. They are the product of a well-designed solution to data access and integration based on the principle of deceptive simplicity, courtesy of hyperlinks functioning as powerful enablers of data access and data representation.
Our Sponger middleware brings all of the magic of webby structured data representation and access to your existing HTTP, ODBC, JDBC, ADO.NET, OLE-DB, and XMLA compliant applications and services. Basically, every HTTP-accessible document has become a usable structured data source, lying in wait for full exploitation by current and future digital transformation initiatives aimed at optimizing personal and organization-wide agility.
Existing tools for producing Analytics Dashboards, Performance Indicator Reports, and other personal productivity enhancers immediately morph into launch-points for exploring intelligent Knowledge Graphs automatically derived from your own document repositories and databases.