Tableau (Public Edition) as a Launch Point for Semantic Web of Linked Data Exploration

Situation Analysis

In a typical SQL RDBMS, all Keys — whether Candidate, Primary, or Foreign — are scoped to a specific RDBMS instance. Once you enter the realm of distributed databases, these unique identifiers cease to be unique.

What is a Super Key?

A Super Key is an identifier that unambiguously names its referent — the entity it identifies — regardless of the context in which it is encountered. When looked up (dereferenced), a Super Key always resolves to a description of its referent, without any operating system, RDBMS instance, platform, or other dependency.

On an HTTP-based network, whether public or private, hyperlinks (HTTP URIs) can serve as such Super Keys, and provide a powerful mechanism for enhancing Reports and Dashboards produced by Business Intelligence and Analytics oriented productivity tools.

Super Keys and Business Intelligence & Analytics

Reports and Dashboards are fundamentally insight-oriented projections of data in context. Using hyperlinks — Super Keys — as entity names enables every reader to perform additional ad-hoc exploration without the need to create yet another report.

Virtuoso can add these identification and exploration capabilities to existing Business Intelligence or Analytics tools, even if they don’t use a specific data protocol.

The rest of this post demonstrates the use of Virtuoso to inject Super Keys into Tableau Public reports.

Enhancing Tableau Public, with Super Keys from Virtuoso

Prerequisites:

  • A functional installation of Tableau Public
  • Tableau Public Account
  • Access to a public or private SPARQL endpoint

Creating a CSV Document with Virtuoso for Tableau Public

We will use a Virtuoso-generated CSV document for this exercise as ODBC +and JDBC connections are currently not supported by Tableau Public. Navigate to DBPedia’s public SPARQL endpoint and enter:

PREFIX  dct:  <http://purl.org/dc/terms/>
SELECT ?actor as ?ActorID
?school as ?SchoolID
?name as ?ActorLabel
?schoolLabel as ?School
WHERE
{
?actor 
dct:subject <http://dbpedia.org/resource/Category:Tony_Award_winners> ;
<http://dbpedia.org/ontology/almaMater> ?school ;
foaf:name ?name .
?school
rdfs:label ?schoolLabel .
FILTER ( LANG(?name) = "en" )
FILTER ( LANG(?schoolLabel) = "en" )
}

ORDER BY ASC(?school)
LIMIT 100

Select HTML (Faceted Browser Links) in the Results Format dropdown menu, and click Run Query.

Upon resolution, an HTML table will be populated with the SPARQL query’s results. Click on an Actor ID URI (ex: Al Pacino) to confirm that the URIs resolve.

Query Editor: http://tinyurl.com/yatubfkc

Query Results: http://tinyurl.com/ybv7wnh6

Clicking on Al Pacino’s URI will open a page from DBpedia that describes the famous actor using a collection of attribute and value pairings, as depicted below.

Return to the query editor by going back one page in your browser and the change the Results Format to CSV; and then click Run Query.

A CSV document populated with results from the SPARQL query will be saved to your designated downloads folder, or even displayed (subject to your browsers document-type handling capabilities.

CSV Data Displayed in Safari Browser from SPARQL Query Results Page where output format is set to CSV

Tableau Data Usage

Open Tableau Public and click on Text File, and then navigate to your CSV document and click Open.

Tableau will register your CSV document as a data source from which you can create a worksheet. Click on Sheet 1 to begin report creation.

Drag each dimension (relation) of interest to the Rows section of the worksheet setup pane.

Navigate to Worksheet -> Actions

Click on Add Action -> URL…

Use the name text box to select Actor as the target for hyperlink text, and Actor ID as the anchor URI target. Set “Run action on to Select.

Repeat for School and School ID dimensions, then click OK.

Each row item in the Actor and School columns are now anchored by a DBpedia ID (hyperlink) that resolve to an external DBPedia page that describes the entity uniquely identified by whichever hyperlink is clicked.

For example, click on the hyperlink that identifies Al Pacino and it will resolve to his DBPedia page.

If desired, the URI columns can be removed by right-clicking on the Actor ID and School ID buttons in the Rows field, and unchecking Show Header.

The ‘abc’ column can be resized but may create issues with URI resolution.

When you are satisfied with your table results, navigate to File -> Save to Tableau Public.

Enter your login credentials and click Sign In

Provide a literal identifier (label) for your workbook and click Save.

Tableau Public will automatically upload your document and present the final result in your web browser.

Click on the name of any of the individuals listed, and as was the case earlier, your browser will return the DBpedia page for the selected entity.

Public Worksheet: http://tinyurl.com/y9uktscv

Final Report with DBpedia ID (Hyperlinks or HTTP URIs) anchoring Actor and School Labels

Conclusion

In this post I’ve demonstrated how Virtuoso enhances an existing productivity tool (Tableau BI, Public Edition) by injecting Super Keys (in the form of DBpedia identifiers) into SPARQL Query Results — returned in a CSV document. In a nutshell, Tableau Public becomes a powerful productivity tool for non-programmers that provides Semantic Web of Linked Data navigation and visualization.

Most important of all, technology investments that aren’t ODBC or JDBC compliant, but accept CSV data via file upload or from local OS clipboard, can still be used to exploit the power of a Semantic Web of Linked Data without triggering expensive and failure-prone “rip and replace” misadventures.

Related