Reducing Multiple CSVs into a Virtual SQL Table
Map Multiple CSV Documents to a Tabular Relation (Table) in Virtuoso
Comma Separated Value (CSV) documents are broadly used for storing and exchanging data represented as Tabular Relations (more commonly known simply as TABLEs and VIEWs). Today, a majority of Relational Database Management Systems (RDBMS) support this document type as a vehicle for data export and import.
CSV documents have also become dominant for sharing datasets associated with themes such as Open Data and Big Data. It has emerged as the de facto standard for tabular data interchange.
This post walks you through the process of mapping multiple CSV documents (with identical structure) to a single table in a Virtuoso RDBMS instance, with the ATTACH_FROM_CSV
stored procedure.
Prerequisites
- A running instance of Virtuoso Enterprise Edition, version 8.2+
(Note: this feature is not available in Virtuoso Open Source Edition) - Knowledge of your Virtuoso instance’s SQL Listening Port
Creating Test Data Sets
Create or download the demo data sets below into a directory that is accessible by your Virtuoso instance. To be accessible, directories must be included in the DirsAllowed
key-value in the [Parameters]
section of the instance configuration document (e.g., virtuoso.ini
). The database
directory (which typically holds the .db
and .ini
files) is accessible by default, and additional directories (or folders) may be added if desired.
Contacts1.csv
document content
Fname,Sname,Age
John,Smith,48
Anna,Clarks,62
Rojer,Danrette,27
Kate,Sigton,56
Contacts2.csv
document content
Fname,Sname,Age
Bob,Wallace,52
Jake,Clarks,62
Bill,Danrette,27
Sue,Sigton,56
Tim,Craft,41
Attaching CSV Documents and Querying Data, via iSQL
Start the iSQL
interactive command tool located in the bin
directory of your Virtuoso instance, and authenticate if challenged, by executing a command of the form —
./isql {hostname} {SQL port} {username} {pwd} VERBOSE=OFF
— such as —
./isql virt.example.com 1234 myname mypwd VERBOSE=OFF
If you leave {hostname}
and/or {port}
out of the iSQL launch command, it will target localhost
and 1111
.
Virtuoso’s SQL port is set viaServerPort
in the [Parameters]
section of the Virtuoso instance INI file, and defaults to1111
.
If you leave {username}
out, iSQL will default to use dba
. If you include {username}
and leave off {password}
, iSQL will default to use dba
. If the login with these defaults fails (typically because the server password has been changed, as recommended, or because you’ve left it blank, as above), iSQL will prompt for input.
VERBOSE=OFF
is an optional launch parameter, which disables the default output of Done xxx msec
messages after each command.
Attaching CSV Documents to a Virtual TABLE
After starting iSQL, the two CSV documents can be attached to Virtuoso’s Virtual Database (VDB) Layer using a command like this:
DROP TABLE "demo.csv.contacts" ;ATTACH_FROM_CSV('demo.csv.contacts', vector('csv-demo/contacts/contacts1.csv','csv-demo/contacts/contacts2.csv'), ',', '\n', null, 1, vector(1,2));PRIMARYKEYS "csv_demo.csv.contacts" ;SELECT * FROM "csv_demo.csv.contacts"
ORDER by age DESC;PRIMARYKEYS "demo.csv.contacts" ;DROP TABLE "csv_demo.csv.contacts" ;ATTACH_FROM_CSV('csv_demo.csv.contacts',vector('csv-demo/contacts/contacts1.csv','csv-demo/contacts/contacts2.csv'),',','\n',null, 1, vector(1,2));PRIMARYKEYS "csv_demo.csv.contacts" ;-- Ordering Test:SELECT * FROM "csv_demo.csv.contacts"
ORDER by age DESC;
The csv-demo/contacts/...
paths can be replaced with paths to files within any directory available to Virtuoso. Absolute paths may be used; relative paths will be resolved based on the Database location.
If the attachment is successful, a fresh SQL>
prompt will appear, with no error messages:
Interacting with new Tabular Relations
Using iSQL Command-line Tool
Basic SELECT Query targeting newly created table:
SELECT * FROM "csv_demo"."csv"."contacts"
You can set your query output to use CSV syntax by entering:
SET CSV_RFC4180
Here are the effects on the output of a basic SELECT Query targeting your newly created table:
SELECT * FROM "csv_demo"."csv"."contacts"
Using Virtuoso’s HTML5-based SPASQL (SPARQL and SQL) Query Builder Tool
Tables Listing
Primary Keys Listing
Basic SELECT Query
Conclusion
Virtuoso provides a powerful ability to map multiple CSV documents to a single Tabular Relation. Once mapped, this data is also available for transformation into a powerful Semantic Web of Linked Data via RDF Views that are exploitable via applications and services that access and operate on data courtesy of protocols such as HTTP, ODBC, JDBC, ADO.NET, OLE DB, and XMLA.
Please leave a comment if you found this article useful or have any questions about this feature!
Related Content
- Generating Linked Data Views From SQL Relational Data
- Getting from Open Data to a Semantic Web of Linked Data, with Virtuoso
- Conceptual Relational Data Virtualization, using Existing Open Standards