Past projects: A Chartbeat bot

I’m catching up on some recent projects before moving on to the next wave of stuff. Today, a Chartbeat Slack bot, built for Pacific Standard.

Chartbeat is a go-to tool for realtime analytics. At any given moment, it lets you see how many people are on your site and what they’re reading. However, drilling down into the data takes time, realtime numbers disappear as the day goes on, and it’s hard to find insights beyond the site’s top-performing stories. I wanted to build something to track performance as we published and convey that information to the whole newsroom.

The problem: It’s hard to answer one important question with Chartbeat’s default tools: How are the stories we published today doing?

The solution: Pull raw data from Chartbeat, calculate story performance and represent it in a way that makes sense at a glance.

The implementation: With this bot, I wanted to compare the day’s stories to a three-week moving average of visitors, see what percentage of traffic overall was coming from stories published that day and show how story performance changed throughout the day. Because this was meant as a quick overview for everyone in the newsroom, I wanted to make everything visual and remove raw data entirely.
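As a rough illustration of those two calculations (not the bot's actual code), the sketch below computes a three-week moving average and the share of site traffic going to the day's stories. The function names, the `(published_date, visitors)` tuple layout and the 21-day default are all assumptions made for the example.

```python
from datetime import date, timedelta


def moving_average(history, today, window_days=21):
    """Mean visitors for stories published in the window before `today`.

    `history` is a list of (published_date, visitors) tuples; this layout
    is a hypothetical stand-in for whatever the real data looks like.
    """
    start = today - timedelta(days=window_days)
    recent = [visitors for published, visitors in history
              if start <= published < today]
    return sum(recent) / len(recent) if recent else 0.0


def share_of_traffic(todays_story_visitors, site_visitors):
    """Percentage of site-wide visitors who are on stories published today."""
    if site_visitors == 0:
        return 0.0
    return 100.0 * sum(todays_story_visitors) / site_visitors
```

A story can then be compared to the baseline by dividing its visitor count by the moving average, so anything above 1.0 is beating the recent norm.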

Getting the data starts with Pacific Standard’s RSS feed, where a backend script pulls recently published stories. It then gets traffic data for each story from the Chartbeat API and assigns each one a rating based on how it performed against an average. Since every story gets checked multiple times throughout the day, each reading is stored with a sample number, so that a story’s total traffic number is only compared against the average for every story at that same sample number. Those samples, all stored in a PostgreSQL database, also get used to build historical data for each story.
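The sample-number comparison might look something like the sketch below. It is an illustration under assumptions, not the script itself: the `(sample_number, visitors)` records, the function names and the use of a simple ratio as the rating are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def sample_averages(history):
    """Average visitors per sample number.

    `history` is an iterable of (sample_number, visitors) pairs from
    earlier stories (a hypothetical record layout).
    """
    by_sample = defaultdict(list)
    for sample_number, visitors in history:
        by_sample[sample_number].append(visitors)
    return {number: mean(values) for number, values in by_sample.items()}


def rate(visitors, average):
    """Rating as a ratio to the average: 1.0 is average, 2.0 is double."""
    if average <= 0:
        return None  # no usable baseline for this sample number yet
    return visitors / average
```

Grouping by sample number means a story's first check of the day is only measured against other stories' first checks, so a piece published an hour ago isn't penalized for having less total traffic than one that has been up all day.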

A report of this data gets sent to the newsroom’s main editorial Slack channel twice a day. Stories are ranked from best- to worst-performing, with a line drawn at the average. In-line sparklines show historical performance for each piece.
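One common way to draw in-line sparklines in plain text is with Unicode block characters. The sketch below shows that technique; whether the bot used these exact characters is an assumption, and `sparkline` is a name made up for the example.

```python
BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values):
    """Map a series of numbers onto block characters, scaled to its own range."""
    if not values:
        return ""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # A flat series has no shape to show, so draw it at the lowest bar.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[int((v - lowest) * top / span)] for v in values)
```

Because each line is scaled to its own minimum and maximum, it shows the shape of a story's day rather than its absolute traffic, which suits a report where the ranking already conveys the numbers.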

Challenges: Finding a balance between legibility and conveying enough data took some fine-tuning, as did making sure the symbolic representations of traffic made sense to a wide audience. Setting up the database in such a way that historical data and the most recent data points are both available in queries also took time, mostly to research how to select the row holding a column’s max value with SQL.
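For reference, one standard way to get each story's most recent row is to join the table to a subquery that finds the max sample number per story. The sketch below runs that query against an in-memory SQLite database so it is self-contained; the real bot used PostgreSQL, and the `samples` table, its columns and the demo rows are all hypothetical.

```python
import sqlite3


def demo_connection():
    """Build a throwaway in-memory database with a few made-up samples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples (story_url TEXT, sample_number INTEGER, visitors INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?)",
        [("/a", 1, 100), ("/a", 2, 150), ("/b", 1, 80)],
    )
    return conn


def latest_samples(conn):
    """Return the row with the highest sample_number for each story."""
    query = """
        SELECT s.story_url, s.sample_number, s.visitors
        FROM samples AS s
        JOIN (
            SELECT story_url, MAX(sample_number) AS max_sample
            FROM samples
            GROUP BY story_url
        ) AS latest
          ON latest.story_url = s.story_url
         AND latest.max_sample = s.sample_number
        ORDER BY s.story_url
    """
    return conn.execute(query).fetchall()
```

The older rows stay in the table, so the same data can still feed the historical view while this query supplies the current snapshot.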

Further work: Taking more samples per day would build out a much larger database, and limiting averages to previous stories from the same day of the week might give a more accurate picture of relative performance.

If you’d like to know more, email me at nicholasrhagar@gmail.com or find me on Twitter.