Reducing Multiple CSVs into a Virtual SQL Table

Map Multiple CSV Documents to a Tabular Relation (Table) in Virtuoso

Daniel Heward-Mills
OpenLink Virtuoso Weblog
4 min readOct 24, 2018

--

Comma Separated Value (CSV) documents are broadly used for storing and exchanging data represented as Tabular Relations (more commonly known simply as TABLEs and VIEWs). Today, a majority of Relational Database Management Systems (RDBMS) support this document type as a vehicle for data export and import.

CSV documents have also become dominant for sharing datasets associated with themes such as Open Data and Big Data. It has emerged as the de facto standard for tabular data interchange.

This post walks you through the process of mapping multiple CSV documents (with identical structure) to a single table in a Virtuoso RDBMS instance, with the ATTACH_FROM_CSV stored procedure.

Prerequisites

  • A running instance of Virtuoso Enterprise Edition, version 8.2+
    (Note: this feature is not available in Virtuoso Open Source Edition)
  • Knowledge of your Virtuoso instance’s SQL Listening Port

Creating Test Data Sets

Create or download the demo data sets below into a directory that is accessible by your Virtuoso instance. To be accessible, directories must be included in the DirsAllowed key-value in the [Parameters] section of the instance configuration document (e.g., virtuoso.ini). The database directory (which typically holds the .db and .ini files) is accessible by default, and additional directories (or folders) may be added if desired.

Contacts1.csv document content

Fname,Sname,Age
John,Smith,48
Anna,Clarks,62
Rojer,Danrette,27
Kate,Sigton,56

Contacts2.csv document content

Fname,Sname,Age
Bob,Wallace,52
Jake,Clarks,62
Bill,Danrette,27
Sue,Sigton,56
Tim,Craft,41

Attaching CSV Documents and Querying Data, via iSQL

Start the iSQL interactive command tool located in the bin directory of your Virtuoso instance, and authenticate if challenged, by executing a command of the form —

./isql {hostname} {SQL port} {username} {pwd} VERBOSE=OFF

— such as —

./isql virt.example.com 1234 myname mypwd VERBOSE=OFF

If you leave {hostname} and/or {port} out of the iSQL launch command, it will target localhost and 1111.

Virtuoso’s SQL port is set viaServerPort in the [Parameters] section of the Virtuoso instance INI file, and defaults to1111.

If you leave {username} out, iSQL will default to use dba. If you include {username} and leave off {password}, iSQL will default to use dba. If the login with these defaults fails (typically because the server password has been changed, as recommended, or because you’ve left it blank, as above), iSQL will prompt for input.

VERBOSE=OFF is an optional launch parameter, which disables the default output of Done xxx msec messages after each command.

Attaching CSV Documents to a Virtual TABLE

After starting iSQL, the two CSV documents can be attached to Virtuoso’s Virtual Database (VDB) Layer using a command like this:

DROP TABLE "demo.csv.contacts" ;ATTACH_FROM_CSV('demo.csv.contacts', vector('csv-demo/contacts/contacts1.csv','csv-demo/contacts/contacts2.csv'), ',', '\n', null, 1, vector(1,2));PRIMARYKEYS "csv_demo.csv.contacts" ;SELECT * FROM "csv_demo.csv.contacts" 
ORDER by age DESC;
PRIMARYKEYS "demo.csv.contacts" ;DROP TABLE "csv_demo.csv.contacts" ;ATTACH_FROM_CSV('csv_demo.csv.contacts',vector('csv-demo/contacts/contacts1.csv','csv-demo/contacts/contacts2.csv'),',','\n',null, 1, vector(1,2));PRIMARYKEYS "csv_demo.csv.contacts" ;-- Ordering Test:SELECT * FROM "csv_demo.csv.contacts"
ORDER by age DESC;

The csv-demo/contacts/... paths can be replaced with paths to files within any directory available to Virtuoso. Absolute paths may be used; relative paths will be resolved based on the Database location.

If the attachment is successful, a fresh SQL> prompt will appear, with no error messages:

Interacting with new Tabular Relations

Using iSQL Command-line Tool

Basic SELECT Query targeting newly created table:

SELECT * FROM "csv_demo"."csv"."contacts"

You can set your query output to use CSV syntax by entering:

SET CSV_RFC4180

Here are the effects on the output of a basic SELECT Query targeting your newly created table:

SELECT * FROM "csv_demo"."csv"."contacts"

Using Virtuoso’s HTML5-based SPASQL (SPARQL and SQL) Query Builder Tool

Tables Listing

Primary Keys Listing

Basic SELECT Query

Conclusion

Virtuoso provides a powerful ability to map multiple CSV documents to a single Tabular Relation. Once mapped, this data is also available for transformation into a powerful Semantic Web of Linked Data via RDF Views that are exploitable via applications and services that access and operate on data courtesy of protocols such as HTTP, ODBC, JDBC, ADO.NET, OLE DB, and XMLA.

Please leave a comment if you found this article useful or have any questions about this feature!

Related Content

--

--