How Granular Uses SQLAlchemy for Composable, Maintainable Business Intelligence Queries

Granular Engineering

Published in

Granular Engineering

5 min readAug 22, 2018

By Warren Henning

TL;DR summary: SQLAlchemy is extremely good.

Here at Granular, we use SQLAlchemy for all our object-relational mapping needs. All our APIs are powered by it.

A common pattern for new features in our web UI (as opposed to our mobile clients) is to have some sort of main table view that displays customer information in tabular format, plus a summary or aggregate view of that same data, along with some data entry mechanism, which varies depending on the feature. We generally implement these features in our backend using plain old SQLAlchemy queries.

SQLAlchemy lets us implement the aggregate query in a nice way by letting us reuse all the filter and join statements for the main table. That helps ensure correctness by making sure both queries have the same structure, and it lets us avoid duplicating query code. SQLAlchemy queries are composable Python objects, so we can return a query from a function, or swap out the columns we’re selecting for with aggregate statements (such as sums, counts, weighted averages, etc.) without repeating the entire query. The query is lazily executed, so we can just tack on additional query fragments at will, executing only when the data is actually needed.

Let’s consider a highly simplified hypothetical example of this implementation pattern, without getting into all the details and business logic that go into a typical Granular product feature. (The actual features and data model we work with are much richer.) Suppose we want a page in our application that lists a customer’s crop storage locations and all the storage bins at that location, along with the capacity of each bin, like this:

For the main table, we want to list out each storage container and the location it’s at, along with each storage bin’s capacity, which is just a numerical field. As each bin has a crop type associated with it, we can simply group all storage bins by crop type and add up the capacity. The main table, when fully developed as a realistic product feature, could have many filters and other query clauses applied (which we’re ignoring here for simplicity), as well as simply many more columns to display and manipulate, so how could we implement this as two queries without repeating ourselves?

Well, let’s first write the main table query:

Well, let’s first write the main table query:

…

# this is generated from request parameters

request_filters = […]

columns = [

Location.name,

StorageContainer.name,

StorageContainer.capacity

]

def location_bin_query(columns):

return session.query(

*columns

).select_from(

StorageContainer

).join(

Location

).filter(

*request_filters

)

main_table_content = location_bin_query(columns).all()

…

StorageContainer and Location are our SQLAlchemy models for storage bins and storage locations, respectively. We’re selecting only the specific columns we need for simplicity, wrapping the query in a function that lets us pass in whatever columns we want. session is a special SQLAlchemy object that can execute queries against a live database connection; that’s set up elsewhere in the application. The query is not actually executed until we call the all() function, allowing us to add readability to long queries by breaking things up over multiple statements if we wish. Notice how I alluded to dynamically generating a filter list based on request parameters; we can then pass this filter list around, return it from a method, or manipulate it like any other regular Python list, as every part of our query is a concrete, reified Python object, as opposed to a string. This means we can get things like code completion for database columns and ORM relationships even if our editor/IDE is not introspecting the actual database schema. Any Python refactoring tools you have will work on your queries, because SQLAlchemy queries are just regular Python code. Just having all your database interaction be symbolic rather than through strings is a huge win.

Assuming we have that query working correctly, how could we query the data needed to populate the aggregate table that follows? One way would be to just swap our column list for columns that group by the storage container’s crop type, like so:

aggregate_columns = [

StorageContainer.crop_type_id,

func.sum(StorageContainer.capacity)

]

aggregate_query = location_bin_query(aggregate_columns).group_by(

StorageContainer.crop_type_id

)

Our implementation of this query is quite concise, even though it could contain within it considerable complexity, depending on what location_bin_query returns. When executed, the generated SQL should resemble something like:

SELECT

storage_container.crop_type_id,

sum(storage_container.capacity)

JOIN

…

WHERE

…

GROUP BY

storage_container.crop_type_id

And all the joins and filters will be the same as those we specified in our main table query, reusing them without having to recapitulate them in our aggregate query. crop_type_id is a foreign key to our list of crop types we support in our application. Before this query’s results are rendered in HTML, some other code will have to look up crop types by this ID field in order to list the crop type’s name properly, and no doubt there will be other minor details to consider, but that’s all easy enough to accomplish; e.g., we could join to the crop type table and select the crop type name, adding that to the list of grouped expressions, as well.

This compositional querying technique demonstrated above is frequently used in our backend, and it’s an important part of how we stay DRY while moving fast and releasing frequently.

If you use relational databases and Python, you owe it to yourself to give SQLAlchemy a try.

How Granular Uses SQLAlchemy for Composable, Maintainable Business Intelligence Queries

Written by Granular Engineering