Adding SQL Interface to a non-relational database

Published in

ViyaDB

4 min readNov 6, 2017

When it’s the right time for adding SQL interface to a database, which wasn’t designed as a relational database? Let’s take a step back, and ask whether SQL interface is needed at all. Well, even though the original JSON-based query language looks pretty intuitive, the following reasons for adding basic SQL support to ViyaDB are obvious:

It’s crucial for on-boarding since SQL is like English in databases world — everybody speaks it.
Integration with data exploration tools that support SQL out of the box becomes simpler.

BTW, if it’s the first time that you hear about this database, please refer to my previous article to learn more about its purposes.

Goals (and Non-Goals)

Implementing a subset of SQL with only DML support (no DDL at first) is a good start. There’s no goal of adhering to some SQL standard, especially because the database is not a regular relational database. For example, since all the dimensions and metrics are part of the schema, and since ViyaDB pre-aggregates metrics it’s worthless writing GROUP BY in a SELECT query, so we can just write:

SELECT app_id, count
FROM activity 
WHERE event_time BETWEEN '2015-01-01' AND '2015-01-14'

JOINS are not supported by the database, therefore there’s no need for implementing this syntax as well.

Implementation

Since we have JSON-based API already, the idea was to create a standalone module that translates SQL into JSON representation, like it’s shown in the following diagram:

The advantage of having a standalone SQL parser module is that it can be reused in both single node and clustered environments. In a clustered environment, the standalone SQL module can be part of a query facade, which will use the internal JSON query representation for building execution plan on querying remote workers and pushing down predicates.

There are not many C++ parser generators to choose from. Flex/Bison pack is one of the most popular ones, so the choice was made in favor of these frameworks. The learning curve for writing parsers is quite long, fortunately, there are projects like this, which help bootstrap a parser project quite instantly. Basically, what you need is writing two files: a scanner file and a grammar file. The first one defines how SQL text breaks down into tokens, while the second one uses these tokens to define supported SQL grammar. The final step is running Flex and Bison on these files respectively, what generates the parser source code.

Example

Let’s run a ViyaDB Docker instance, and load some data into it ( previous article contains all the details on how to do that). The only distinction, is that we are going to name the database container in order to be able to link with it:

docker run -p 5000 : 5000 --name viyadb --rm -ti \
  -v $(pwd):/tmp/viyadb viyadb/viyadb:latest

Now, let’s enter the SQL shell:

docker run --name vsql --link viyadb:viyadb --rm -ti \
  -v $(pwd):/tmp/viyadb viyadb/viyadb:latest \
  /opt/viyadb/bin/vsql viyadb

and run some SQL queries:

ViyaDB> SHOW TABLES
activityViyaDB> SELECT * FROM activity LIMIT 3
app_id                  user_id     event_time  country  city            device_type  device_vendor  ad_network  campaign  site_id  event_type        event_name  organic  revenue  count
com.ego360.flatstomach  1184765537  1420156800  HK       Tsuen Wan       iOS          Apple          session     True      0        1
com.ego360.flatstomach  485969059   1420156800  IL       Ramat HaSharon  Android      Samsung        session     True      0        1
com.outfit7.talkingtom  2971215966  1420156800  IN       Ghiror          Android      Samsung        inappevent  Outline   True     1.16875835197557  2

ViyaDB> SELECT device_type, count FROM activity ORDER BY count DESC
device_type    count
Android        8305
iOS            1691
Windows Phone  4

ViyaDB> SELECT SEARCH(app_id, 'com.v') FROM activity LIMIT 10
com.vivitylabs.android.braintrainer
com.vinylab.radiohammer1
br.com.verde.alarme
com.vertumus.rewun

Future Plans

This is just the beginning, and there’s more functionality to come:

Now, that we have an infrastructure, it can be used for extending current SQL dialect. First candidates are: LOAD DATA — for direct TSV files loading and DESC — for displaying table schema. DDL is, again, not a priority (if you think otherwise, please drop me a line).
The other important task is, creating a JDBC driver for ViyaDB, which will facilitate integration with different BI tools.
The work on clustered solution is in progress as well, so the module must be integrated to provide unified SQL interface to the whole cluster.

As always, your help is wanted here, even if it’s an issue report or simple advice. If you’re located in TLV area, and you have some thoughts about the project — let’s grab a coffee :)

Originally published at https://www.linkedin.com on November 6, 2017.

Adding SQL Interface to a non-relational database

Goals (and Non-Goals)

Implementation

Example

Future Plans

Written by Michael Spector