Running SQL on No-SQL databases

Kung Wang
Guru @TOTVS Labs
Published in
5 min readJun 30, 2017
Photo: Carl Heyerdahl

How can one apply SQL skills in a NoSQL world? For the past 20+ years, SQL database systems like Oracle, Sybase, MySQL, PostgresSQL have been successfully dominating the database backend. Many engineers have been trained to develop their applications on top of these databases. There are also many software frameworks developed on top of SQL databases in order to speed up the development cycle or make it a fun experience to use. These database systems also ship with connectivity protocols like ODBC or JDBC to help building bridges between a database and many applications.

In the post-internet era, the amount of data has increased at an exponential speed, and demands on data BI and AI have led businesses to operate on bigger and bigger data sets, and SQL servers are no longer suitable because of cost and speed. The demand has led to many technological innovations, and one of them is the creation of NoSQL databases.

NoSQL databases bring data mining solutions for tremendous data, but also a steep learning curve for many traditional SQL developers. Of course, the demand has also been filled quickly by introducing SQL-like or even SQL-92 compatible tools. Nonetheless, these solutions require you to learn and install another set of commands that is not SQL. The situation can be even worse, if you also need to run another server or make a duplication of your NoSQL database just to use these tools. Can we have the beauty of both SQL and NoSQL worlds, with a low cost of ownership and that is easy to maintain? Yes, we can!

There are many production-ready NoSQL databases available in the field and many of them are either license-friendly or free to use. It is also possible to reduce DBA staff or totally eliminate this job class by using these databases. It’s a win-win situation for small to large companies. Developers can also access these databases easily without any license worries. When we talk about databases, we are implicitly talking about real-time or near-real-time responses. For eventual-response databases, their usage is more for analytics or data mining. Since SQL databases are heavily used in OLTP operations, we will need to pick a real-time or near-real-time NoSQL database as our foundation to support SQL operations.

Since the major database functions are search and store, the obvious candidate in the NoSQL world to match both criteria is a search engine backed database like elasticsearch or solar. These search engines can easily support any simple one table Create-Retrieve-Update-Delete operations. Here are a couple of head-to-head matches using elasticsearch syntax and SQL:

Create operation:SQL:
INSERT INTO MyTable (field1, field2, ...) VALUES (value1, value2, ...);
Elasticsearch:
PUT {your_index_name}/MyTable/1
{
"field1" : value1,
"field2" : value2,
...
}
Retrieve operation:SQL:
SELECT * FROM MyTable;
Elasticsearch:
POST _all/MyTable/_search
{
"query" : {
"match_all" : {}
}
}
Update operation:SQL:
UPDATE MyTable SET field1=value1, field2=value2, ... WHERE primary_key_field = primary_key_value;
Elasticsearch:
PUT {your_index_name}/MyTable/primary_key_value
{
"field1" : value1,
"field2" : value2,
...
}
Delete operation:SQL:
DELETE FROM MyTable WHERE primary_key_field = primary_key_value;
Elasticsearch:
DELETE {your_index_name}/MyTable/primary_key_value

As you can see in the above examples, these statements between SQL and NoSQL databases are very different, but if you look closely, you will soon find out all required information for doing each of these operations are the same between both databases.

It seems we have an answer for our SQL developers! As long as we can somehow compile SQL statements into corresponding NoSQL statements, then we are good to go!. Unfortunately, SQL is much more powerful than the above simple operations, and developers are often required to write complex business logic in SQL. In those queries, you usually see that tables are related to each other in many different ways. There is no way to solve these complex queries in any search engine as it was not designed or not enabled to do so. Here, before we hit a dead end, let’s add a layer to the search engine and see if we can enable it to do these complex queries.

Since SQL databases are indeed all relational databases, if we can choose a relationship supported NoSQL database, then we should be able to make SQL-like complex queries into the NoSQL database. Databases that support relationship operations will be graph databases, as graphs have edges and vertices. Traversing vertices via corresponding edges is very similar to join in relational databases. There are some production-ready graph databases on the market. For example, Titan is a license-friendly graph database, which runs on top of Cassandra NoSQL database.

Let’s compare join queries in SQL vs Gremlin, which runs on Titan:

Inner-Join between tablesSQL:
select c.*, p.*
from mdmcustomer as c inner join mdmcompany as p on c.mdmtaxid = p.mdmtaxid
Gremlin:
g.V().hasLabel("mdmcustomer").as("customers")
.out("is Company").as("company")
.select("customers", "company")
Multi-table joins with filters
SQL:
SELECT customer.*
FROM mdmcustomer as customer, mdmcompany as company, mdmopportunity as opportunities, mdmticket as tickets, (SELECT tax_id, COUNT(*) as cnt num_of_invoices FROM mdminvoice GROUP BY tax_id) invoices
WHERE customer.tax_id = company.tax_id (+)
AND customer.tax_id = opportunities.tax_id (+)
AND customer.tax_id = invoices.tax_id (+)
AND customer.tax_id = tickets.tax_id (+)
AND invoices.num_of_invoices > 2
AND customer.mdmcompanycode = 'T09699'
AND rownum < 3;
Gremlin:
g.V().hasLabel("mdmcustomer").has('mdmcompanycode', 'T09699').as("customers")
.match(__.as("c").both("is Company").as("company"),
__.as("c").both("has Opportunity").count().as("opportunities"),
__.as("c").both("has Invoice").count().is(gt(2)).as("invoices"),
__.as("c").both("has_ticket").count().as("tickets"))
.select("customers", "company", "opportunities", "invoices", "tickets").limit(3)

At first you may see they are formatted very differently, but after looking closely, you will see again all required information is presented in different way, but all essential parts are the same. Now our engine looks more compatible with SQL, let’s find a way to convert Gremlin to SQL, then we can use SQL2Gremlin to convert SQL into Gremlin, so developers can use SQL on our engine and query it just like a SQL database.

Now that we know we can run SQL on a NoSQL database, this opens a lot of doors for development or migration of complex applications from SQL to NoSQL databases. This strategy reduces costs and maintenance of the database for customers. Not to mention that NoSQL databases are enabled for cloud operations and are easier to scale out using any cloud provider. There is no need to feel that your SQL skills are obsolete when people talk about NoSQL databases!

Photo: Jon Tyson

--

--