Automatically create a pivot table in Snowflake

John Mastro
Aug 7 · 3 min read

In February, we published Automatically create a pivot table in Redshift. This is an updated version of that article, now that we’ve switched to Snowflake, where we use the same technique.

We’ve open sourced the solution described in this article here.

Creating pivot tables is a relatively common need, as seen by its popularity in software like Microsoft Excel and Google Sheets. Some databases (including Snowflake) include a pivot function, but they’re too static — you have to provide an explicit list of values rather than deriving them dynamically from the data. We built a small tool to create pivoted database tables. It can be used as a standalone tool or integrated into ETL pipelines.

Here’s an example of what a simple pivot table transformation might look like:

Background

At Ro, our business and operations are growing quickly in terms of both scale and complexity. We launched at the end of October 2017 with a single brand (Roman) treating a single condition (erectile dysfunction), but since then we’ve added two new brands (Rory, for women’s health, and Zero, for smoking cessation) and several conditions under Roman (hair loss, cold sores, genital herpes, and premature ejaculation), with more on the way.

For the data team, this means our data warehouse needs to support business units and analysts who want to look at data through a variety of lenses. They may want to focus on the overall company, on a particular medical condition, a particular brand, or some cross-cutting category (all prescription medicines vs OTC products). Pivot tables are one important tool for making it but easy and efficient for them to ask and answer the questions they’re interested in.

For context, our data warehouse is in Snowflake, and our solution supports either Snowflake or Redshift (which we used previously). It would be straightforward to adapt it to most other databases.

The usual way

At the database level, the most common way to write pivot queries is probably to hand-write chains of CASE statements. This is unsatisfactory because it’s labor-intensive, error-prone, and needs to be updated by hand every time a new column is added or a new case becomes possible.

Alternatively, pivoting may be done outside the database, using a spreadsheet program or a business intelligence tool like Looker. This doesn’t help if it’s important to have the pivoted data in the database, whether for performance reasons or because it’s used further in your data pipeline.

An alternative

We built a simple way to generate pivot tables (or just pivot queries), which we’ve posted on GitHub here. The README discusses how to use it, including some examples.

The implementation strategy is to take the list of pivot columns, find every existing combination of values in those columns, and then generate appropriate pivoted columns. An aggregation function and default value can be specified individually for each pivoted column, or the script will attempt to guess reasonable defaults based on the column’s name (i.e. SUM will be used for a column whose name starts with `total_`).

The result is that we can declaratively specify pivot tables and have them automatically regenerated. The changes necessary to support new columns are either non-existent or minimal and localized.

Other alternatives

Snowflake has a `pivot` function, but it requires a static list of values, whereas our solution pulls them from the database dynamically as compilation time.

Spreadsheets and business intelligence tools can create pivot tables, but that’s not always a replacement for having an in-database pivot table.

Some databases include related functionality (like PostgreSQL’s crosstab table function or Microsoft SQL Server’s PIVOT feature).

The popular tool dbt also includes a macro for creating pivot tables in the dbt-utils package.

Ro Data Team Blog

Ro Data Team Blog: data analytics, data engineering, data science

John Mastro

Written by

Ro Data Team Blog

Ro Data Team Blog: data analytics, data engineering, data science

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade