946 Followers
·
Follow

Bump Charts in Pure PostgreSQL

In a previous post I wrote about creating charts with Postgres, but let’s go beyond simple time series charts. Bump charts are 🔥 🔥 🔥 and here’s how to create them in pure PostgreSQL without any post-query transformations!

Image for post
Image for post
Full Disclosure: These charts were made using awesome software built by my team at Magento Business Intelligence

What is a bump chart, exactly?

A bump chart plots a datapoint’s rank among values instead of the value itself. For example, if you have three sets of values:

A: [123, 200, 90]
B: [100, 221, 30]
C: [12, 279, 1560]

A bump chart would plot them with these values instead.

A: [1, 3, 2]
B: [2, 2, 3]
C: [3, 1, 1]

Not every chart tool can plot in reverse order (ie. 1 at the top), and it’s easier to make 3 the highest value instead of 1 so we’ll do that here.

A: [3, 1, 2]
B: [2, 2, 1]
C: [1, 3, 3]

Image for post
Image for post
The green series dwarfs the others in the 3rd data point. The bump chart shows the rank of each data point.

Bump charts are useful when you have a wide range of values, especially if one or two values dominate the entire chart.

To make them we need to define three temporary tables and use a few sub queries. In an earlier draft of this post I described each one, but it was impossible to follow. Instead, I’ll point out the information you need and then build it for you!

I assume you’re tracking an event and want to compare it across different groups.

EVENT_TABLE = The name of your event’s table

EVENT_DATE = The date column for when the event occurred

TIME_UNIT = The postgres date format for the time interval you’re using. For example, to use a daily time interval, use ‘YYYY-MM-DD’.

EVENT_GROUP = The column we’ll group by. This defines who or what we’re comparing in the bump chart (eg, user id, event type, etc,)

INNER_FILTERS = Any additional filters you want to tack on to the JOIN between the temporary date_cat table and the event table. This can help with performance by cutting down on the number of comparisons the DB needs to make.

OUTER_FILTERS = A set of WHERE clauses if you need to filter out other rows from this table. An example is j.type = ‘SQL Report Builder Usage’

SERIES_LIMIT = The limit on the number of series you want in your chart. If you want the top ten values, use 10.

ROW_LIMIT = The maximum rows you want this query to return.

Replace the bold text below with the names of tables, columns, etc. from your database.

with all_dates as 
(select distinct to_char(EVENT_DATE, TIME_UNIT) as time_unit from EVENT_TABLE),
categories as
(select distinct EVENT_GROUP from EVENT_TABLE), date_cat as (select time_unit, EVENT_GROUP from all_dates, categories)

And if you don’t want to do that yourself, I created a handy observablehq notebook to do this for you. Now get bumping!

Acknowledgements:
Huge thanks to my friend and colleague, Akash Agrawal of Nupanch for co-writing that SQL with me!

Written by

chart junkie @ Betterment http://bengarvey.com

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store