Bump Charts in Pure PostgreSQL

In a previous post I wrote about creating charts with Postgres, but let’s go beyond simple time series charts. Bump charts are 🔥 🔥 🔥 and here’s how to create them in pure PostgreSQL without any post-query transformations!

Full Disclosure: These charts were made using awesome software built by my team at Magento Business Intelligence

What is a bump chart, exactly?

A bump chart plots a datapoint’s rank among values instead of the value itself. For example, if you have three sets of values:

A: [123, 200, 90]
B: [100, 221, 30]
C: [12, 279, 1560]

A bump chart would plot them with these values instead.

A: [1, 3, 2]
B: [2, 2, 3]
C: [3, 1, 1]

Not every chart tool can plot in reverse order (ie. 1 at the top), and it’s easier to make 3 the highest value instead of 1 so we’ll do that here.

A: [3, 1, 2]
B: [2, 2, 1]
C: [1, 3, 3]

The green series dwarfs the others in the 3rd data point. The bump chart shows the rank of each data point.

Bump charts are useful when you have a wide range of values, especially if one or two values dominate the entire chart.

To make them we need to define three temporary tables and use a few sub queries. In an earlier draft of this post I described each one, but it was impossible to follow. Instead, I’ll point out the information you need and then build it for you!

I assume you’re tracking an event and want to compare it across different groups.

EVENT_TABLE = The name of your event’s table

EVENT_DATE = The date column for when the event occurred

TIME_UNIT = The postgres date format for the time interval you’re using. For example, to use a daily time interval, use ‘YYYY-MM-DD’.

EVENT_GROUP = The column we’ll group by. This defines who or what we’re comparing in the bump chart (eg, user id, event type, etc,)

INNER_FILTERS = Any additional filters you want to tack on to the JOIN between the temporary date_cat table and the event table. This can help with performance by cutting down on the number of comparisons the DB needs to make.

OUTER_FILTERS = A set of WHERE clauses if you need to filter out other rows from this table. An example is j.type = ‘SQL Report Builder Usage’

SERIES_LIMIT = The limit on the number of series you want in your chart. If you want the top ten values, use 10.

ROW_LIMIT = The maximum rows you want this query to return.

Replace the bold text below with the names of tables, columns, etc. from your database.

with all_dates as 
(select distinct to_char(EVENT_DATE, TIME_UNIT) as time_unit from EVENT_TABLE),
categories as
(select distinct EVENT_GROUP from EVENT_TABLE), date_cat as (select time_unit, EVENT_GROUP from all_dates, categories)
select "count", EVENT_GROUP, time_unit, rank() over (partition by time_unit order by "count" asc) as "rank"
from (
select sum("count") over (
partition by EVENT_GROUP
order by time_unit rows unbounded preceding) as "count", EVENT_GROUP, time_unit from (select count(EVENT_TABLE.*) as "count", date_cat.EVENT_GROUP, time_unit
from date_cat
left join EVENT_TABLE on date_cat.time_unit=(to_char(EVENT_DATE, TIME_UNIT)) and date_cat.EVENT_GROUP = EVENT_TABLE.EVENT_GROUP INNER_FILTERS
where date_cat.EVENT_GROUP in
(select EVENT_GROUP from
(select count(*) as "count", j.EVENT_GROUP
from EVENT_TABLE j OUTER_FILTERS
GROUP BY j.EVENT_GROUP
ORDER BY "count" desc SERIES_LIMIT) as "cat")
group by date_cat.EVENT_GROUP, time_unit) as "action"
) as "catcount"
order by time_unit ROW_LIMIT;

And if you don’t want to do that yourself, I created a handy observablehq notebook to do this for you. Now get bumping!

Acknowledgements:
Huge thanks to my friend and colleague, Akash Agrawal of Nupanch for co-writing that SQL with me!

chart junkie @ Betterment http://bengarvey.com

chart junkie @ Betterment http://bengarvey.com