Here’s What “24 hours” of Running in Boston Looks Like

Published in ASICS Digital, Oct 26, 2016

My job as Runkeeper’s data analyst has me deep in our numbers all day long, but lately I’ve been experimenting with geovisualization of Runkeeper runs for fun. I chose to start with runs from Runkeeper employees in and around the Boston area throughout our company history, playing back a few thousand runs as if they occurred in just one day.

Here it is. (If you want to skip to the action, jump to about the 25-second mark; we like running at Runkeeper but not quite to the point that we’d go running before 5am.)

Runkeeper Employee Runs — Boston, MA

Each run leaves a faint trail behind it, and as hundreds of trails layer over each other, a map of Boston emerges. This isn’t a perfectly representative map, but it captures a number of the most popular running destinations around Boston, including the Charles River loop, the Chestnut Hill Reservoir, Castle Island, and more. Look closely and you’ll see the Boston Marathon route, with its famed “Right on Hereford, Left on Boylston” turns. All those years of running ultimately wind up looking like this:

The final map of Runkeeper employee runs in and around Boston

For the curious, here’s the process that went into creating this:

Identifying the Activities

I gathered metadata on the runs (“trips,” in internal Runkeeper parlance) that I was interested in looking at with a query to our Redshift data warehouse. At this stage I did some light filtering to exclude activities that were obviously erroneous (traveling significantly faster or slower than normal human running speeds, traveling very short distances, etc.) and bounded my search to trips in the Boston area based on latitude and longitude.
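As a rough illustration of that filtering step, here is a minimal Python sketch. The original filtering happened in a Redshift query, so this is a stand-in rather than the actual code, and the `Trip` fields, the speed and distance thresholds, and the bounding box are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# All thresholds and the bounding box are illustrative assumptions,
# not the values used in the original query.
BOSTON_BBOX = {"min_lat": 42.20, "max_lat": 42.50, "min_lon": -71.30, "max_lon": -70.90}
MIN_DISTANCE_M = 500.0  # drop very short trips
MIN_SPEED_MPS = 1.5     # slower than this is probably a walk or a stalled GPS
MAX_SPEED_MPS = 7.0     # faster than this is probably a bike, car, or train


@dataclass
class Trip:
    """Hypothetical trip metadata record."""
    trip_id: int
    distance_m: float
    duration_s: float
    start_lat: float
    start_lon: float


def is_plausible_boston_run(trip: Trip) -> bool:
    """Return True if the trip looks like a real run inside the bounding box."""
    if trip.duration_s <= 0 or trip.distance_m < MIN_DISTANCE_M:
        return False
    speed = trip.distance_m / trip.duration_s  # average speed in m/s
    if not (MIN_SPEED_MPS <= speed <= MAX_SPEED_MPS):
        return False
    in_lat = BOSTON_BBOX["min_lat"] <= trip.start_lat <= BOSTON_BBOX["max_lat"]
    in_lon = BOSTON_BBOX["min_lon"] <= trip.start_lon <= BOSTON_BBOX["max_lon"]
    return in_lat and in_lon
```

A 5 km trip finished in 25 minutes near downtown passes all three checks; the same distance covered in 5 minutes does not.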

Pre-processing Path Data

I then collected and processed the trip path data into a more usable format. Paths get stored as sets of latitude/longitude points by default, which is usually fine — but we need to project those coordinates or the results wind up vertically compressed, as they appear in this example:

Downtown Boston rendered with raw lat/long coordinates

In this case, I used the Python UTM package to project our lat/long coordinates into the Universal Transverse Mercator system. This is a good quick hack, but doesn’t work everywhere — cities like London that lie on the boundary of two UTM zones are hard to represent with this method.
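To see why raw coordinates distort the map, it helps to compare how much ground a degree of longitude and a degree of latitude cover in Boston. The sketch below uses a simple equirectangular approximation around a reference point. It is a swapped-in technique that needs only the standard library, not the UTM projection used for the animation; with the UTM package, its `from_latlon` function would fill the same role. The reference coordinates and the `project_local` helper are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def project_local(lat, lon, ref_lat, ref_lon):
    """Project (lat, lon) in degrees to (x, y) meters around a reference point.

    Equirectangular approximation: reasonable across one city,
    not a replacement for a real UTM projection.
    """
    x = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return x, y


# Reference point roughly in downtown Boston (illustrative).
REF_LAT, REF_LON = 42.36, -71.06

# One hundredth of a degree east versus one hundredth of a degree north.
east_x, _ = project_local(REF_LAT, REF_LON + 0.01, REF_LAT, REF_LON)
_, north_y = project_local(REF_LAT + 0.01, REF_LON, REF_LAT, REF_LON)

# Ratio of east-west to north-south ground distance per degree.
ratio = east_x / north_y
```

At Boston's latitude the ratio comes out near 0.74, so a plot that treats degrees as equal units in both directions stretches the city east to west, which reads as vertical compression.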

Alongside this conversion, I also calculated time offsets — seconds since midnight — for each point in each path. This data goes into creating the animation…
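A minimal sketch of the offset calculation follows. It assumes each trip has a start timestamp in the runner's local time and that path points carry elapsed seconds from the start of the trip; both are assumptions about the data layout, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def seconds_since_midnight(ts: datetime) -> int:
    """Seconds elapsed since local midnight for a timestamp."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second


def path_offsets(start: datetime, elapsed_seconds):
    """Map each point's elapsed time to a seconds-since-midnight offset.

    Runs that cross midnight wrap back around to small offsets.
    """
    return [
        seconds_since_midnight(start + timedelta(seconds=elapsed))
        for elapsed in elapsed_seconds
    ]
```

For a run starting at 6:30 a.m., points at 0, 30, and 90 seconds into the run map to offsets 23400, 23430, and 23490.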

Rendering the Results

Having processed the point data, I then needed to render everything. This involves stepping through the seconds in the day, with each 30-second increment becoming one frame in the final animation, and drawing all of the paths that have been traveled up to that point. The blue blobs that you see represent the last 30 seconds' worth of running: the longer the blob, the faster a given runner was traveling. (In a few cases, you might see really long, fast-moving blue blobs; those are typically moments where somebody forgot to turn off Runkeeper as they got into a car or onto a train.)
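The per-frame logic can be sketched without any drawing code. The example below assumes each path is a list of `(offset_s, x, y)` tuples sorted by time, with offsets in seconds since midnight and coordinates in projected meters. The tuple layout and the function name are assumptions for illustration, not the structure of the original notebook.

```python
FRAME_STEP_S = 30
SECONDS_PER_DAY = 24 * 60 * 60

# One frame per 30-second step: 86,400 / 30 = 2,880 frames for the full day.
FRAME_TIMES = range(FRAME_STEP_S, SECONDS_PER_DAY + 1, FRAME_STEP_S)


def split_path_for_frame(path, frame_time):
    """Split one path into the parts a frame needs.

    path: list of (offset_s, x, y) tuples sorted by offset_s.
    Returns (trail, head):
      trail - every point reached by frame_time (drawn faintly)
      head  - points from the most recent 30 seconds (the bright blob)
    """
    trail = [(x, y) for t, x, y in path if t <= frame_time]
    head = [
        (x, y)
        for t, x, y in path
        if frame_time - FRAME_STEP_S <= t <= frame_time
    ]
    return trail, head
```

A faster runner covers more ground in the same 30 seconds, so the head spans a longer stretch of the map. In a matplotlib version, the trail might be plotted with a low alpha value and the head in a solid color on top of it.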

As time marches forward in the animation, the resulting image gets more and more complex and takes longer for our server to draw. To speed up the process, I’m using the Python multiprocessing package to draw several frames of animation at once, netting a roughly 10x speedup.
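Because each frame can be drawn from the path data alone, frames are independent of one another and can be handed to separate worker processes. Here is a minimal sketch of that pattern with `multiprocessing.Pool`. The `render_frame` function is a hypothetical stand-in that only returns a file name; a real version would draw and save the image. The worker count and chunk size are arbitrary choices for the example.

```python
from multiprocessing import Pool


def render_frame(frame_index):
    """Hypothetical stand-in for the real drawing step.

    A real version would draw the frame with matplotlib and save it to disk;
    this one only returns the file name the frame would be written to.
    """
    return f"frame_{frame_index:05d}.png"


def render_all(frame_indices, workers=4):
    """Render frames in parallel; results come back in input order."""
    with Pool(processes=workers) as pool:
        return pool.map(render_frame, frame_indices, chunksize=16)
```

When run as a script, the call to `render_all(range(2880))` belongs under an `if __name__ == "__main__":` guard so that worker processes do not re-run it on import. The speedup depends on how many cores the machine has and on how much of each frame's cost is drawing rather than loading data.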

The Tools I Used

My final data set was runs from Runkeeper’s internal employees throughout our company history (up to mid-October 2016).
My visualization tool of choice was matplotlib in a Jupyter notebook.
I used Seaborn to tweak some of the visual stylings.
I did the map projections with the Python UTM package.
I used ffmpeg to encode the resulting frames into a video.
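For the encoding step, a typical ffmpeg invocation for a numbered image sequence can be assembled from Python as shown below. The file name pattern, output name, codec, and frame rate are all assumptions; the 24 frames per second default is a guess that happens to match 5 a.m. (frame 600) landing near the 25-second mark.

```python
import subprocess


def build_ffmpeg_command(frame_pattern="frame_%05d.png", fps=24, output="boston_runs.mp4"):
    """Build an ffmpeg command that encodes numbered PNG frames into an H.264 video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),   # input frame rate for the image sequence
        "-i", frame_pattern,      # printf-style pattern for the frame files
        "-c:v", "libx264",        # H.264 video codec
        "-pix_fmt", "yuv420p",    # pixel format most players can decode
        output,
    ]


def encode_video(**kwargs):
    """Run ffmpeg; requires ffmpeg on the PATH and the frames on disk."""
    subprocess.run(build_ffmpeg_command(**kwargs), check=True)
```

Calling `encode_video()` in the directory that holds the frames would produce the video file under these assumptions.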

What’s Next?

I plan to set these animation tools loose on other cities and countries around the world, but there’s an obvious problem: Runkeeper is based in Boston. Barring the occasional business trip or vacation, most of our internal employee runs stick to the Boston area, too. The obvious solution is to look at runs from Runkeeper users, but we want to do so in a way that respects user privacy.

My tentative solution to this is to follow a few key rules:

Use only runs from users that have marked their activities with publicly shared maps. By default, activity maps are set to be shared only among a user’s friends. If you’re a Runkeeper user and you want to learn more about managing your sharing settings, check out this article from our Support team.
Trim off a few hundred meters from the beginning and end of each run. This will help obfuscate where users are coming from in the case that they begin or end their runs at a sensitive location (for instance, their home or workplace).
Select a random sample of activities over a long time period, so that we don’t pull in too many runs from one user or one period of time.
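The trimming rule is the easiest of these to make concrete. Below is a minimal sketch that assumes a path is a list of `(x, y)` points already projected into meters. The 300-meter default is only an illustrative value for "a few hundred meters," and the function name is hypothetical.

```python
import math


def trim_path_ends(points, trim_m=300.0):
    """Drop points within trim_m of path distance from either end of a run.

    points: list of (x, y) tuples in projected meters, in travel order.
    Returns the remaining middle section, or an empty list if the run
    is too short to trim safely.
    """
    if len(points) < 2:
        return []

    # Cumulative distance traveled along the path at each point.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))

    total = cumulative[-1]
    if total <= 2 * trim_m:
        return []  # nothing would be left after trimming both ends

    return [
        point
        for point, dist in zip(points, cumulative)
        if trim_m <= dist <= total - trim_m
    ]
```

Measuring along the path rather than by straight-line radius keeps the rule simple: a 1 km straight run loses its first and last 300 meters and keeps the middle 400. Runs that are too short to trim are dropped entirely instead of being drawn in full.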

What cities are you hoping to see mapped out with runs? Let me know in the comments here!


Written by Chris Drouin