Embrace SQL Thinking

Dean Wampler
97 Things
Published in
2 min readJun 14, 2019

Look at this query:

SELECT c.id, c.name, c.address, o.items FROM customers c
JOIN orders o
ON o.customer_id = c.id
GROUP BY c.id

We acquire all the customers who have orders, including their names and addresses, along with the details of their orders. Four lines of code. Anyone with a little SQL experience, including non-programmers, can understand this query.

Now think about a Java implementation. We might declare classes for Customer and Order. I remember well-meaning consultants saying we should also create classes to encapsulate collections of them, rather than use “naked” Java collections. We still need to query the database, so we pull in an object-relational mapper (ORM) tool and write code for that. Four lines of code quickly turn into dozens or even hundreds of lines. The few minutes it took to write and refine the SQL query stretch into hours or days of editing, writing unit tests, code reviews, and so on.

Can’t we just implement the whole solution with only the SQL query? Are we sure we can’t? Even if we really can’t, can we eliminate waste and write only what’s essential? Consider the qualities of the SQL query:

  • We don’t need a new table for the join output, so we don’t create one: The biggest failing of applied object-oriented programming has been the belief that you should faithfully reproduce your domain model in code. In reality, a few core type definitions are useful for encapsulation and understanding, but tuples, sets, arrays, and so forth are all we need the rest of the time. Unnecessary classes become a burden as the code evolves.
  • The query is declarative: Nowhere does it tell the database how to do the query, it just states the relational constraints the database must satisfy. Java is an imperative language, so we tend to write code that says what to do. Instead, we should declare constraints and desired outcomes, and then isolate the how implementation in one place or delegate to a library that can implement it for us. Like functional programming, SQL is declarative. In functional programming, equivalent declarative implementations are achieved using composable primitives, such as map, filter, reduce, and so on.
  • The Domain Specific Language is well matched to the problem: DSLs can be somewhat controversial. It’s very hard to design a good one, and the implementations can be messy. SQL is a data DSL. It’s quirky, but its longevity is proof of how well it expresses typical data-processing needs.

All applications are really data applications. At the end of the day, everything we write is a data manipulation program, whether or not we think of it that way. Embrace that fact and the unnecessary boilerplate will reveal itself, allowing you to write only what’s essential.

--

--

Dean Wampler
97 Things

Streaming data and ML expert at Lightbend. Also on twitter, @deanwampler