Semantic Web of Linked Data Market Opportunity Research — using a Dynamic Google Spreadsheet

Published in OpenLink Virtuoso Weblog, Dec 6, 2017

During my normal data, information, and knowledge management and discovery activities, I regularly stumble across interesting documents from Gartner, Forrester, and the like. Naturally, I want to enrich my knowledge by capturing information from these documents, which I typically then share via private and public data spaces.

For instance, I take notes about Semantic Web Opportunity Sizes from various Market Research documents I read. I save my notes to a Virtuoso Quad Store, from which they are dynamically integrated into a Google Spreadsheet, which always reflects my latest notes. I can easily return to the document that inspired a given note when I review the spreadsheet later.

All of this is possible due to the powerful loose-coupling facilitated by support for open standards — across both Virtuoso (HTTP, SPARQL, and RDF) and Google Spreadsheets (HTTP and CSV).

How is this possible?

Several key productivity tools are necessary prerequisites.

OpenLink Structured Data Sniffer (OSDS) browser extension
dokie.li browser extension, configured with your preferred WebID profile which specifies the preferred storage location for your notes
URIBurner Service (a public Virtuoso instance)
ODS-Briefcase (a public or private Virtuoso instance with WebDAV/LDP base Virtualization module enabled)

Starting with a recent Gartner press release about projected growth for the Application and Integration Middleware (AIM) market segment, I follow a simple sequence of steps:

1. Read the Press Release Document.
2. Activate the dokieli extension by clicking on its toolbar icon.
3. Click the dokieli “Embed Data” control.
4. Take notes using sentences constructed in an RDF notation. (RDF-Turtle is my personal preference; you can also choose JSON-LD or TriG.)
5. Use the dokieli "Save As" control to save a copy of the Press Release Document, now including those notes, to an ODS-Briefcase.
6. Use the URIBurner Service to sponge (extract, transform, and load) the Press Release Document from the ODS-Briefcase. The notes saved in step #5 will now be ingested by the Virtuoso RDBMS instance behind the URIBurner Service.
7. Create a SPARQL Query scoped to Market Research related data. Select CSV as the Document Type for the SPARQL Query Solution.
8. Create a cell formula using the importData() function to import the CSV document content into your Google Spreadsheet. This is a one-time task: from then on, simply refreshing the Google Spreadsheet, which retains the importData() formula binding to the SPARQL Query Results Document, fetches the latest results of that SPARQL Query.
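
The last two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the endpoint URL is a placeholder for your own Virtuoso SPARQL endpoint, the query and its schema: terms are hypothetical rather than the vocabulary used in my notes, and the helper names are made up for this example. The only details taken from this post are the CSV format parameter (format = text/csv) and the importData() function.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint; substitute the SPARQL endpoint of your own instance.
SPARQL_ENDPOINT = "https://example.org/sparql"

# Hypothetical query: the schema: terms are placeholders, not the post's actual vocabulary.
QUERY = """
PREFIX schema: <http://schema.org/>
SELECT ?segment ?label ?size
WHERE {
  ?segment schema:name ?label ;
           schema:value ?size .
}
"""


def csv_results_url(endpoint: str, query: str) -> str:
    """Return a URL that identifies the query solution as a CSV document."""
    params = {"query": query.strip(), "format": "text/csv"}
    return f"{endpoint}?{urlencode(params)}"


def import_data_formula(url: str) -> str:
    """Wrap the URL in a Google Sheets IMPORTDATA cell formula."""
    escaped = url.replace('"', '""')  # double any quotes, as spreadsheet string literals require
    return f'=IMPORTDATA("{escaped}")'


if __name__ == "__main__":
    print(import_data_formula(csv_results_url(SPARQL_ENDPOINT, QUERY)))
```

Pasting the printed formula into a cell is the one-time task; the URL itself never changes, which is why a refresh is enough to pick up new notes.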

What’s Happening Here?

Steps 5 & 6 result in actual data upload activities; i.e., structured data is added to my various Virtuoso RDBMS instances.

Thanks to the power of hyperlinks (specifically, HTTP URIs), I am able to use a once-created SPARQL Query Results Document URI multiple times — whenever I refresh a Google Spreadsheet that’s used the ImportData() function in a spreadsheet formula to bind to CSV content identified by that URI.

How is this achieved?

The screenshots that follow illustrate my workflow.

[1] Reading Press Release

Gartner Press Release as rendered in my browser

[2] Initialization of dokie.li browser extension — note, clicking on the “Save” button results in the injection of HTML-based structured data islands that enclose your annotations using the <script/> tag.

Annotations inscribed using RDF-Turtle Notation, courtesy of dokie.li “Embed Data” Control
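
To make the idea of a structured data island concrete, here is a minimal Python sketch that wraps an RDF-Turtle note in a script element and reads it back out of the page. It is an illustration, not dokieli's implementation: the exact markup dokieli emits is not shown in this post, so the type="text/turtle" attribute, the note's subject and schema: terms, and the function names are all assumptions.

```python
from html.parser import HTMLParser

# Hypothetical note: the subject IRI and schema: terms are placeholders.
TURTLE_NOTE = """\
@prefix schema: <http://schema.org/> .

<#marketSegmentSize1>
    a schema:Thing ;
    schema:name "Example market segment size" .
"""


def embed_data_island(html: str, turtle: str) -> str:
    """Insert the Turtle note as a <script> data island just before </body>."""
    island = f'<script type="text/turtle">\n{turtle}</script>\n'
    head, sep, tail = html.rpartition("</body>")
    if not sep:  # no closing body tag found: append the island at the end
        return html + island
    return head + island + sep + tail


class TurtleIslandParser(HTMLParser):
    """Collect the text content of every <script type="text/turtle"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.islands: list[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/turtle":
            self._inside = True
            self.islands.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.islands[-1] += data


def extract_islands(html: str) -> list[str]:
    """Return the raw Turtle text of each data island found in the page."""
    parser = TurtleIslandParser()
    parser.feed(html)
    parser.close()
    return parser.islands
```

Because the note travels inside the document itself, any tool that can read the HTML (a browser extension, a crawler, or a sponger) can recover the same sentences without a side channel.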

[3] Reviewing my notes using OSDS User Interface for RDF Language sentence translation and visualization. At this point, we are still looking at data in the browser (basically, the Document Object Model [DOM]), which hasn’t been persisted anywhere.

Translation of RDF Language Sentences presented by OSDS

[4] Now, using the dokieli “Save As” function combined with the backend capabilities of my WebDAV/LDP-compliant ODS-Briefcase instance, the following will occur —

Generation of a derivative of the original document that includes provenance metadata and my annotations via structured data islands using <script/>
Persistence of a derivative of the original Gartner Press Release, which is protected by Attribute-based Access Control (ABAC); i.e., it is conditionally available to entities described in an access rule (note: these rules are described using RDF sentences too!)

Conventional File “Save As” Interaction via dokie.li Browser Extension

View of document rendition saved to my ODS-Briefcase.

ODS-Briefcase UI (based on a conventional Files and Folder metaphor) displaying my documents

Effect of opening my ODS-Briefcase hosted document rendition

Prologue part of Gartner Press Release rendition [Live Link]

Rest of the Gartner Press Release rendition

Visualization of RDF-Turtle Notation based Annotations added to post via dokie.li “Embed Data” control

RDFa notation based Provenance Metadata injected into Gartner Press Release rendition

My annotations are now persisted in two places, by choice:

ODS-Briefcase (accessible via WebDAV/LDP protocol)
Virtuoso RDBMS (accessible via SPARQL & SQL query languages, over HTTP, ODBC, JDBC, ADO.NET, OLE DB, or XMLA).

[5] I can now define a SPARQL Query focused on Semantic Web related Market Research where I request that the SPARQL Query Solution be delivered as an HTML document.

SPARQL Query Definition Document [Live Link]

HTML-based SPARQL Query Solution Document [Live Link]

Each hyperlink (an HTTP URI) in the SPARQL Query Solution page above identifies a specific Market Segment Size. Thus, courtesy of Virtuoso’s built-in Faceted Browsing capability, clicking a hyperlink of interest results in the delivery of an Entity Description page, i.e., a document that has the sole purpose of describing the selected entity.

Document that describes a specific Market Segment Size, informed by a collection of attribute name and attribute value pairings

Returning to the original SPARQL Query Results page, invoking the OSDS Browser Extension’s SPARQL Query Editor enables editing of my query definition and/or query solution document types, within my browser.

Based on my desire to present my data in a Google Spreadsheet, I can use the query solution URL parameters form provided by OSDS to designate CSV as my desired Document Type (format = text/csv).
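
What the OSDS form does to the URL can be approximated as follows. This is a sketch of the general idea, not of OSDS itself: the function name and the example URL are hypothetical, and the only detail taken from this post is that the format parameter ends up as text/csv.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_result_format(results_url: str, media_type: str = "text/csv") -> str:
    """Return the same query-results URL with its `format` parameter replaced."""
    parts = urlsplit(results_url)
    # Keep every parameter except an existing `format`, then append the new one.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    params.append(("format", media_type))
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    # Hypothetical HTML results URL; the query text is untouched by the rewrite.
    html_url = (
        "https://example.org/sparql"
        "?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D&format=text%2Fhtml"
    )
    print(with_result_format(html_url))
```

The query definition stays identical; only the requested document type changes, so the HTML and CSV renditions always describe the same solution.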

[6] I now have a CSV-based Query Solution document’s URL which can be used as an argument to the importData() function, for integration into my Google Spreadsheet.

CSV Document based SPARQL Query Solution [Live Link]
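
For readers who want to see what the spreadsheet receives, the sketch below parses a CSV query solution into rows of cells, roughly the shape a spreadsheet fills in. The column names are among those shown in the spreadsheet; the two data rows are made-up placeholders for illustration only, not figures from my notes, and the function name is hypothetical.

```python
import csv
import io

# Made-up sample rows for illustration only; not figures from the actual spreadsheet.
SAMPLE_CSV = '''"marketSegmentID","marketSegmentLabel","size","currency"
"https://example.org/segment#a","Example Segment A","100","USD"
"https://example.org/segment#b","Example Segment B","250","USD"
'''


def csv_to_grid(csv_text: str) -> list[list[str]]:
    """Parse CSV text into rows of cells, header row first."""
    return [row for row in csv.reader(io.StringIO(csv_text))]


if __name__ == "__main__":
    for row in csv_to_grid(SAMPLE_CSV):
        print(row)
```

Each query variable becomes a column header and each solution becomes a row, which is why the CSV document drops into a spreadsheet without any further mapping.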

Formula that leverages CSV-based SPARQL Query Solution Document URL and `ImportData()` function

Semantic Web Market Opportunity Size related Market Research Data Fully Integrated into Google Spreadsheet [Live Link]

Here’s an embedded version of the final Google Spreadsheet, i.e., what’s depicted above.

Semantic Web Market Opportunity Size Estimates (docs.google.com)

Via SPARQL Query (Latest): marketSegmentID, marketSegmentLabel, size, currency, rounded, marketSegmentDescriptionDocID…

Conclusion

Thanks to Virtuoso and Google Spreadsheet support for open standards — HTTP, SPARQL, RDF, CSV, and others — I am able to easily create and maintain a Google Spreadsheet that dynamically tracks Market Research on Semantic Web Opportunity Sizes, using notes I created during or after the act of reading various documents.

I enrich my own knowledge while performing analysis in my spreadsheet, without any data getting out of sync.

Tools Used

Gartner Press Release — Applications and Integration Middleware Market Size
Virtuoso Multi-Model RDBMS & Data Virtualization Platform
OpenLink Structured Data Sniffer — Browser Extension
dokie.li — Web Document Annotation Tool