Adding SQL Interface to a non-relational database

Michael Spector
ViyaDB
Published in
4 min readNov 6, 2017

When it’s the right time for adding SQL interface to a database, which wasn’t designed as a relational database? Let’s take a step back, and ask whether SQL interface is needed at all. Well, even though the original JSON-based query language looks pretty intuitive, the following reasons for adding basic SQL support to ViyaDB are obvious:

  • It’s crucial for on-boarding since SQL is like English in databases world — everybody speaks it.
  • Integration with data exploration tools that support SQL out of the box becomes simpler.

BTW, if it’s the first time that you hear about this database, please refer to my previous article to learn more about its purposes.

Goals (and Non-Goals)

Implementing a subset of SQL with only DML support (no DDL at first) is a good start. There’s no goal of adhering to some SQL standard, especially because the database is not a regular relational database. For example, since all the dimensions and metrics are part of the schema, and since ViyaDB pre-aggregates metrics it’s worthless writing GROUP BY in a SELECT query, so we can just write:

SELECT app_id, count
FROM activity
WHERE event_time BETWEEN '2015-01-01' AND '2015-01-14'

JOINS are not supported by the database, therefore there’s no need for implementing this syntax as well.

Implementation

Since we have JSON-based API already, the idea was to create a standalone module that translates SQL into JSON representation, like it’s shown in the following diagram:

The advantage of having a standalone SQL parser module is that it can be reused in both single node and clustered environments. In a clustered environment, the standalone SQL module can be part of a query facade, which will use the internal JSON query representation for building execution plan on querying remote workers and pushing down predicates.

There are not many C++ parser generators to choose from. Flex/Bison pack is one of the most popular ones, so the choice was made in favor of these frameworks. The learning curve for writing parsers is quite long, fortunately, there are projects like this, which help bootstrap a parser project quite instantly. Basically, what you need is writing two files: a scanner file and a grammar file. The first one defines how SQL text breaks down into tokens, while the second one uses these tokens to define supported SQL grammar. The final step is running Flex and Bison on these files respectively, what generates the parser source code.

Example

Let’s run a ViyaDB Docker instance, and load some data into it ( previous article contains all the details on how to do that). The only distinction, is that we are going to name the database container in order to be able to link with it:

docker run -p 5000 : 5000 --name viyadb --rm -ti \
-v $(pwd):/tmp/viyadb viyadb/viyadb:latest

Now, let’s enter the SQL shell:

docker run --name vsql --link viyadb:viyadb --rm -ti \
-v $(pwd):/tmp/viyadb viyadb/viyadb:latest \
/opt/viyadb/bin/vsql viyadb

and run some SQL queries:

ViyaDB> SHOW TABLES
activity
ViyaDB> SELECT * FROM activity LIMIT 3
app_id user_id event_time country city device_type device_vendor ad_network campaign site_id event_type event_name organic revenue count
com.ego360.flatstomach 1184765537 1420156800 HK Tsuen Wan iOS Apple session True 0 1
com.ego360.flatstomach 485969059 1420156800 IL Ramat HaSharon Android Samsung session True 0 1
com.outfit7.talkingtom 2971215966 1420156800 IN Ghiror Android Samsung inappevent Outline True 1.16875835197557 2

ViyaDB> SELECT device_type, count FROM activity ORDER BY count DESC
device_type count
Android 8305
iOS 1691
Windows Phone 4

ViyaDB> SELECT SEARCH(app_id, 'com.v') FROM activity LIMIT 10
com.vivitylabs.android.braintrainer
com.vinylab.radiohammer1
br.com.verde.alarme
com.vertumus.rewun

Future Plans

This is just the beginning, and there’s more functionality to come:

  • Now, that we have an infrastructure, it can be used for extending current SQL dialect. First candidates are: LOAD DATA — for direct TSV files loading and DESC — for displaying table schema. DDL is, again, not a priority (if you think otherwise, please drop me a line).
  • The other important task is, creating a JDBC driver for ViyaDB, which will facilitate integration with different BI tools.
  • The work on clustered solution is in progress as well, so the module must be integrated to provide unified SQL interface to the whole cluster.

As always, your help is wanted here, even if it’s an issue report or simple advice. If you’re located in TLV area, and you have some thoughts about the project — let’s grab a coffee :)

Originally published at https://www.linkedin.com on November 6, 2017.

--

--