Five Simple Steps to Experience the Power of a Knowledge Graph, using Virtuoso
For individuals and organizations alike, computing remains challenged by the pervasiveness of data silos, something I’ve written about in various posts over the years. Common to all those posts is my use of Virtuoso (our multi-model RDBMS, Data Access Middleware, and Data Virtualization platform) to demonstrate how data de-silo-fication can be achieved in an unobtrusive manner; i.e., you don’t have to “rip & replace” existing infrastructure in order to take advantage of what a Knowledge Graph has to offer.
Creating and Exploiting a Knowledge Graph in Five Simple Steps
Note: You can skip steps 1 through 4, if you choose to use the live Virtuoso instance behind our public URIBurner Service.
- Obtain and Install Virtuoso — for on-premise use on Windows, macOS, or Linux; Docker Container, preconfigured Amazon EC2 AMI in the AWS Cloud, or preconfigured Virtual Machine in the Microsoft Azure Cloud.
- Start the Virtuoso Server — by following the documentation for the Windows or macOS native UI; or by using the Linux command-line, virtuoso -c virtuoso.ini.
- Perform basic Virtuoso configuration — primarily, by changing the default passwords for the dba and dav “super-user” accounts using the HTML-based Administrator (the Conductor) or the SQL command-line.
- Install Virtuoso productivity modules — e.g., the Extract, Transform, and Load Middleware for RDF-based Linked Open Data (a/k/a the Sponger) and the Faceted Browser — via their respective Virtuoso Application Distribution archives (VADs).
- Install one or more of our browser extensions — such as the OpenLink Structured Data Sniffer (OSDS), the OpenLink Structured Data Editor (OSDE), and/or the OpenLink Data Explorer (ODE) — which turn Safari, IE, Firefox, Chrome, Opera and related browsers into Linked-Data-aware user agents.
You are now ready to commence exploitation of Virtuoso-powered data access, integration, and management!
Knowledge Graph Exploitation
Whether deployed to laptop, desktop workstation, or server, Virtuoso’s built-in “Sponger” middleware provides ETL (Extract, Transform, and Load) services that can analyze document content (from a variety of data source types) and then generate descriptions of said documents in 5-Star Linked Open Data form (i.e., web-like structured data constructed using RDF-language-based digital sentences/statements). RDF documents generated by the Sponger are available in a wide variety of document types including RDF-Turtle, JSON-LD, HTML5+Microdata, HTML+RDFa, RDF/JSON, RDF-XML, CSV, OData/Atom, and OData/JSON.
Entity Description Page Returned by the URIBurner ETL Service
Installation of the Sponger ETL Middleware module equips your Virtuoso instance with a live service endpoint identified by the URL pattern, http://{your-instance-cname}/sponger/. This endpoint resolves to an HTML page that includes an input field for capturing URLs that identify the documents against which you seek to perform ETL operations.
If you prefer, rather than working through the form, you can construct sponger service URLs by hand, using the pattern: http://{your-instance-cname}/about/html/{document-url}. For instance, if using our URIBurner instance to sponge the DBpedia page about DBpedia itself, this translates to http://linkeddata.uriburner.com/about/html/http://dbpedia.org/resource/DBpedia.
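The hand-built URL pattern can be sketched in a few lines of Python. This is only an illustration: it assumes the document URL is appended to /about/html/ unchanged, exactly as in the URIBurner example, and the function name sponger_url is hypothetical rather than part of any Virtuoso API.

```python
def sponger_url(instance_host: str, document_url: str) -> str:
    """Build a Sponger description-page URL from the pattern
    http://{your-instance-cname}/about/html/{document-url}.

    The document URL is appended as-is, mirroring the URIBurner example.
    """
    return f"http://{instance_host.strip('/')}/about/html/{document_url}"


if __name__ == "__main__":
    # Reproduces the DBpedia example from the text.
    print(sponger_url("linkeddata.uriburner.com",
                      "http://dbpedia.org/resource/DBpedia"))
```

Swapping the host name for that of your own instance should yield the equivalent URL for a private deployment.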
Simplest of all, you can leverage the functionality provided by our OpenLink Data Explorer (ODE) or OpenLink Structured Data Sniffer (OSDS) browser extensions to immediately invoke analysis of the page currently in focus (i.e., what’s currently displayed in your browser’s address bar), simply by clicking the ODE or OSDS toolbar icon.
Live Examples
The following live examples are based on our URIBurner Service, a publicly accessible instance of the Virtuoso ETL middleware. Every example based on this service can also be experienced through your own public or private Virtuoso instance.
Data Source URLs
http://www.wired.co.uk/article/the-webs-greatest-minds-on-how-to-fix-it
https://www.wired.com/story/the-decentralized-internet-is-here-with-some-glitches/
Sponger ETL Service URL Examples
Basic entity description pages —
http://linkeddata.uriburner.com/about/html/http://www.wired.co.uk/article/the-webs-greatest-minds-on-how-to-fix-it
http://linkeddata.uriburner.com/about/html/https://www.wired.com/story/the-decentralized-internet-is-here-with-some-glitches/
Deeper follow-your-nose pages, for deeper exploration and serendipitous discovery of additional relevant information —
http://linkeddata.uriburner.com/describe/?uri=http%3A%2F%2Flinkeddata.uriburner.com%2Fabout%2Fid%2Fentity%2Fhttp%3A%2F%2Fwww.wired.co.uk%2Farticle%2Fthe-webs-greatest-minds-on-how-to-fix-it
http://linkeddata.uriburner.com/describe/?uri=http%3A%2F%2Flinkeddata.uriburner.com%2Fabout%2Fid%2Fentity%2Fhttps%3A%2F%2Fwww.wired.com%2Fstory%2Fthe-decentralized-internet-is-here-with-some-glitches%2F
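The follow-your-nose URLs listed here appear to wrap a percent-encoded entity URI of the form http://{host}/about/id/entity/{document-url} in a uri query parameter. A minimal Python sketch of that construction follows; the pattern is inferred from the two examples rather than from Virtuoso documentation, and the helper names entity_uri and describe_url are hypothetical.

```python
from urllib.parse import quote


def entity_uri(instance_host: str, document_url: str) -> str:
    """Entity URI inferred from the examples: /about/id/entity/{document-url}."""
    return f"http://{instance_host}/about/id/entity/{document_url}"


def describe_url(instance_host: str, document_url: str) -> str:
    """Wrap the entity URI in the /describe/ page URL.

    safe="" makes quote() percent-encode ':' and '/' as well,
    matching the encoding seen in the example URLs.
    """
    encoded = quote(entity_uri(instance_host, document_url), safe="")
    return f"http://{instance_host}/describe/?uri={encoded}"


if __name__ == "__main__":
    print(describe_url(
        "linkeddata.uriburner.com",
        "http://www.wired.co.uk/article/the-webs-greatest-minds-on-how-to-fix-it",
    ))
```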
What’s Happening Here?
A Knowledge Graph comprises RDF sentence collections that describe any number of things. In this case, we are describing documents, which naturally includes describing the topics covered by said documents.
To effectively describe anything using sentences, we must have an identification mechanism in place that facilitates how we identify (denote) the subject, predicate, and object of each sentence. This is where hyperlinks (specifically, HTTP URIs) come into powerful use, enabling us to easily look-up what entities are identified by the sentence subject, predicate, and object.
The “deceptively simple” act of constructing sentences using hyperlinks is what produces a Knowledge Graph deployed using Linked Data principles. This is also referred to as an Entity Relationship Graph when visualized using a Graphical Notation (as opposed to a Linear Notation). Naturally, the more sentences you collate, the deeper your Web becomes.
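To make the idea of a sentence built from hyperlinks concrete, here is a small Python sketch that holds sentences as (subject, predicate, object) tuples of HTTP URIs and prints them in N-Triples notation. The single example statement is illustrative only and is not actual Sponger output; the helper to_ntriples is hypothetical and handles URI-valued objects only, not literals.

```python
# The standard RDF predicate URI for "is an instance of".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) URI triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Illustrative sentence: "this document is an Article".
doc = "http://www.wired.co.uk/article/the-webs-greatest-minds-on-how-to-fix-it"
sentences = [
    (doc, RDF_TYPE, "http://schema.org/Article"),
]

if __name__ == "__main__":
    print(to_ntriples(sentences))
```

Because every position in the sentence is an HTTP URI, each one can be looked up in a browser, which is what makes the follow-your-nose style of exploration possible.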
Virtuoso can handle all of this for you. No canned data is required, because Virtuoso starts generating and storing useful data (entity relationships expressed in RDF sentences) the moment you direct it to describe a document, whether published to the external Web or to an internal Enterprise Intranet.
What follows are some additional details about the kinds of Linked Data documents that Virtuoso can generate.
“Basic” Entity Description Pages
As with the live DBpedia and DBpedia-Live instance deployments of Virtuoso, we refer to this kind of page as being “basic” only because its link traversal (i.e., HTTP URI lookup or de-reference) doesn’t include deep expansion of class instances (i.e., rdf:type attribute/property [relation] values).
The annotated diagram below depicts an entity description document (as generated by the Sponger) using EAV (Entity, Attribute, Value) terminology.
The annotated diagram below presents the same entity description document (again, as generated by the Sponger), this time using SPO (Subject, Predicate, Object) terminology.
“Deeper” Linked Data Follow-Your-Nose Entity Description Pages
This type of page is “deeper” (relative to the “basic”) simply because its link traversal (i.e., its HTTP URI lookups or de-references) does include deep expansion of rdf:type attribute/property (relation) values.
The annotated diagram below depicts an entity description document (as generated by the Sponger) using EAV (Entity, Attribute, Value) terminology.
The annotated diagram below depicts the same entity description document (again as generated by the Sponger) using SPO (Subject, Predicate, Object) terminology.
In addition to Faceted Browsing for Knowledge Graph exploration, you also have the ability to use SPARQL to generate alternative exploration starting points.
Knowledge Graph Interaction using SPARQL
Now that you’ve populated your Virtuoso instance via a dynamic ETL processing pipeline, you can perform intelligent operations using the SPARQL and/or SQL query languages.
This SPARQL Query produces a dynamically-generated HTML document that provides an index of hyperlinks that denote Sample Entities grouped by Entity Type.
SELECT DISTINCT (SAMPLE(?s) AS ?sample) (COUNT(*) AS ?count) ?EntityType
FROM <http://www.wired.co.uk/article/the-webs-greatest-minds-on-how-to-fix-it>
WHERE { ?s a ?EntityType }
GROUP BY ?EntityType
ORDER BY DESC(?count)
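A query like this can also be sent programmatically. The Python sketch below prepares a SPARQL Protocol GET request using only the standard library. It assumes the instance exposes its SPARQL endpoint at /sparql, and it asks for JSON results via the Accept header rather than the HTML rendering described above; the helper name sparql_request is hypothetical, and the query is written in standard SPARQL 1.1 syntax.

```python
from urllib.parse import urlencode
from urllib.request import Request

QUERY = """SELECT (SAMPLE(?s) AS ?sample) (COUNT(*) AS ?count) ?EntityType
FROM <http://www.wired.co.uk/article/the-webs-greatest-minds-on-how-to-fix-it>
WHERE { ?s a ?EntityType }
GROUP BY ?EntityType
ORDER BY DESC(?count)"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL Protocol GET request.

    The query travels in the standard 'query' parameter; the Accept
    header asks for the SPARQL JSON results format.
    """
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    # Assumed endpoint location; adjust the host for your own instance.
    req = sparql_request("http://linkeddata.uriburner.com/sparql", QUERY)
    print(req.full_url)
    # To execute the request: urllib.request.urlopen(req).read()
```

Keeping request construction separate from execution makes it easy to inspect the URL before anything is sent over the network.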
Virtuoso’s Unique Architecture
Data de-silo-fication is the fundamental value proposition of Virtuoso. Upon installation of Virtuoso (whether on your local desktop or a remote server), you are immediately equipped with a data-junction-box that enables conceptual virtualization of heterogeneous data.
Because of this immediate empowerment, there are no “canned demos” based on unrealistic “canned data”. The moment you install Virtuoso with its Sponger Middleware and Faceted Browsing modules enabled, you are ready for a different kind of experience with data, one that has been expected for a long time but never delivered in such an unobtrusive manner.
Data Virtualization
Virtuoso’s conceptual harmonization of heterogeneous data is also known as “data virtualization.” Virtuoso delivers this virtualization so well because it is a full-blown relational database management system (RDBMS) that’s been equipped with a built-in data virtualization engine that supports a wide variety of open-standards-based protocols.
Broad support of open standards ensures data virtualization doesn’t come at the cost of product lock-in. We want our customers and prospects to consider Virtuoso only because it’s the “best of class” solution in its problem space, and not because of proprietary language or other lock-in.
Conclusion
Historically, the shortage of high-level productivity tools and the general unawareness of the existence of any such tools have compounded confusion that inadvertently swirls around the notion and utility of a Knowledge Graph.
As demonstrated here, Virtuoso is both a productivity tool and a highly sophisticated platform that scales from tiny client setups to high-end server deployments. It puts to rest these distractions of yore by enabling any end-user, power-user, systems-integration professional, enterprise architect, willing executive, or developer to fully exploit the power and promise of a Semantic Web through a simple software installation that doesn’t require anyone to write a single line of code!
Related
- DBpedia Usage Report — another Live Virtuoso RDBMS instance that sits at the core of the Linked Open Data (LOD) Cloud
- LOD Cloud — massive collection of openly accessible Databases (Virtuoso RDBMS instances dominate this cloud, too!)
- Preloaded & Preconfigured Virtuoso Instances from Amazon EC2 Cloud
- Interactive Virtuoso Product Architecture Diagram
- Semantic Web Layer Cake Revisited
- Virtuoso’s Web & File Server Functionality — for mounting and integrating disparate data storage services
- Generating a Semantic Web of Linked Data from Open Data
- Configuring an ODBC DSN for the World Wide Web RDBMS
- URIBurner Service
- OpenLink Data Explorer Browser Extensions
- Virtuoso Home Page
- Virtuoso Server Binary Downloads via basic HTML page — for Windows, macOS, and Linux
- Virtuoso Download & License Generation Service — free evaluation licenses and installer archives for Windows, macOS, and Linux
- Virtuoso Docker Containers
- Current Virtuoso License Offers