Lets go further with SQL

Or have we gone far enough?

Josh Berry
Learning SQL
5 min readAug 16, 2022

--

Photo by Sven Huls: https://www.pexels.com/photo/brown-wooden-bridge-over-green-trees-3801347/

An interesting point

Recently, I read an article on Benn Stancil’s substack titled, “Has SQL Gone Too Far?”. It was a thought-provoking piece that made a number of good points, which I mulled over in my mind as I spent the next 2 hours mowing my grass.

Benn begins his article by acknowledging how institutional good ol’ SQL has become over the last 50 years. He points out that while SQL is the dominant way of manipulating data and has been for some time, it is still being actively advocated and expanded. Benn is right. As recently as last week, I was experimenting with ARIMA forecasting using SQL on Snowpark/Snowflake (Thanks to Felipe’s tutorial, here).

As a hardcore SQL user, I know that SQL isn’t the best at everything, so Benn’s question made me wonder — have we gone too far?

First, some history

In case you haven’t noticed — recently, the term ‘universal semantic model’ has been making its way to the forefront of data conversations. Looker, Kyvos, dbt, AtScale, and others have aligned themselves with the concept. Nothing is actually new, other than the fact that we’re adding the word ‘universal’ or ‘cloud’ to the front of ‘semantic layer,’ which has existed for a long time.

In 2007 wikipedia defined a semantic layer as “a business representation of corporate data that helps end users access data autonomously using common business terms.” However, the term is older than that. In 1991, Business Objects applied for and was subsequently granted patent 5,555,403, which “allows information system end users to access (query) relational databases without knowing the relational structure or the structure query language (SQL)”.

If you can look past the 1990’s GUI, you can see the connection between the user’s semantic understanding of what they want, and the stitching together of SQL that happens behind the scenes:

U.S. Patent 5,555,403

The main point

Benn’s main point is what I spent the majority of my lawn-mowing time thinking about. At the foundational level, SQL is not a true model. We may build a semantic model with it, but at the end of the day, every line has to be defined and written in a way that mimics the semantic definitions of the business. Therefore, no matter how much we obfuscate the SQL code itself with newer technologies such as Looker, Transform Microstrategy, SQL still reigns king. Even if the world’s sexiest job is writing yaml syntax to define “metrics,” ultimately, it all relies on somebody, somehow, telling the layer what SQL-code to run.

Recent developments

Certainly, attempts have been made to re-imagine a new query language. You can find them all over the internet, and they usually end in QL. Personally, I never liked any of them. Possibly, because my brain is already wired to think in terms of SQL, and you can’t teach an old dog new tricks.

The most exciting new development I see is the combination of jinja with SQL, as dbt famously does. While dbt is the Analytics Engineer’s tool of choice, Rasgo focuses more on the business users. Both champion the “universal semantic layer,” and both take advantage of the magical combination of jinja and SQL.

In its simplest use, jinja provides a way to turn a repetitive SQL code into a reusable template. It also provides a way to accept arguments, which can dynamically determine the SQL code that gets rendered. In more advanced uses that I’ve seen, jinja and SQL can accept a few simple inputs from the user and dynamically create entire SQL pipelines without the user having any knowledge of the underlying tables.

I see this as a huge leap forward in the world of SQL. For example, let’s pretend that a user wants to create a heatmap by selecting 2 metrics.

Heatmap by Rasgo

To make a chart like this, SQL must be written. Even if the metrics from both the X and Y axis are from the same table, you can’t just build a heatmap from the raw data. Each axis is a bucket, and the combination of each bucket has a count of records in it.

Therefore, it is quite useful to have a heatmap “template” where the user only has to select the 2 metrics they want. The jinja can figure out the logic as to whether (and how) to join the tables, create the buckets, and return the data in a way that is ready to be graphed.

Conclusion

As I parked my lawn mower, my mind was racing with future possibilities. This templating could be extended into the semantic layer itself. Imagine a situation where instead of a business user asking the engineer to build a Summary Table, they could do it themselves. Perhaps the user provides a list of semantic concepts such as orders, returns, and inventory. Perhaps, based on those inputs, the template can search and find the relevant tables, and make the correct joins, dimensions, aggregations, and time windows.

Edit: the following note no longer applies. Rasgo has transitioned to using AI to write SQL, and has depreciated the SQL Generator.

In my role at Rasgo, we recently introduced a tool that allows you to generate SQL without having to attach your cloud warehouse. This SQL Generator (Broken Link — Rasgo has transitioned to using AI instead of these templates for queries) is a free online tool by Rasgo. It works by using jinja to render SQL statements, driven by the open source code in this Github repo. If you want to dig into the code, I recommend first looking at aggregate. It is a rather simple example of a single SQL operation. One of the most interesting ones was summarize_islands, which wraps much more complicated SQL logic used to solve the classic Islands and Gaps problem.

Though the SQL Generator only demonstrates this modular, templatized concept with 1 operation at a time, you can imagine the possibilities. I don’t even think jinja itself is the SQL templating engine of the future, either. It was mainly built to create HTML, XML, or other markup formats in web applications, and it is already 14 years old. I envision that someday, a specialized derivative will be created in the world of semantic layers and Analytics Engineers. For this reason, my personal belief is SQL has not gone far enough. We will continue to innovate in this arena. The best times are ahead.

Go from SELECT * to interview-worthy project. Get our free 5-page guide.

--

--

Josh Berry
Learning SQL

Data scientist @ Rasgo, DataRobot, Comcast. Passionate about teaching and helping others.