18 Million Steps: A Visual Journey
Exploring 5+ years of Fitbit, Part 1
“Most users only use their devices for about six months, then they end up in a drawer never to be seen again.”
I’m not a typical user.
I still remember where I was when I first read about Fitbit. I was in the first year of my PhD program and probably the only student in our class spending part of the day reading TechCrunch and Gizmodo. As a researcher in training with a focus on physical activity, I was immediately drawn to the Fitbit. You see, researchers have long been stuck with clunky tools to track and measure physical activity behavior. The Fitbit looked set to usher in a new era: built on the same simple measurement system (an accelerometer), it promised to track steps and activity and wirelessly update data. I know it seems quaint now, but you have to remember that this was late 2008. The iPhone was only a year and a half old when Fitbit published their first blog post announcing their participation at TechCrunch50. They shipped their first devices a year later in late 2009, and if you’ve been following their story, and the subsequent birth and explosion of the wearables industry, you know what happened next.
I was admittedly a bit late to the party. I ended up buying my first Fitbit in February of 2011; blame it on graduate student stipends or a researcher not quite wanting to buy into the hype. Anyway, there I was, walking out of Best Buy with a clear plastic box and a device that would guide much of my life for the following five years.
But this post isn’t really about that. It’s about the data, so let’s get to it.
From February 27, 2011 to June 21, 2016, I tracked a total of 18,293,229 steps with Fitbit. I wanted to work on my data analysis and visualization skills, so I started to explore this trove of data. What follows is a series of visualizations that start to give some context and insight into my physical activity and overall behavior.
All of the following visualizations were done in R using the popular ggplot2 package. If you’re interested in exploring the code and data, I’ve posted both on GitHub.
I’ve split this post into two sections: Daily Data and Minute Data. The first section deals only with the aggregate daily data set, that is, the total number of steps taken per day. The second section makes use of the intraday (minute-level) data set. This is the most “raw” data Fitbit provides through their API.
Digging into the Days
The first thing I wanted to do was break down the data by year. Pretty easy to do with some simple date extraction.
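For anyone following along without R, the yearly breakdown comes down to pulling the year out of each date and summing. Here’s a minimal Python sketch of that idea. It is not the script from the repo, and the rows are a made-up sample (only the two marathon-day counts come from this post):

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample of (date, steps) rows, NOT the real Fitbit export.
daily_steps = [
    (date(2011, 2, 27), 8500),
    (date(2011, 10, 30), 53904),
    (date(2013, 5, 1), 0),
    (date(2016, 2, 14), 42231),
]

def steps_by_year(rows):
    """Sum daily step counts into one total per calendar year."""
    totals = defaultdict(int)
    for day, steps in rows:
        totals[day.year] += steps  # the "date extraction" is just day.year
    return dict(sorted(totals.items()))

yearly_totals = steps_by_year(daily_steps)
print(yearly_totals)
```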
Looks like 2013 wasn’t a very good year. I remember losing my Fitbit in 2013 while I was shopping for clothes, but how many days did I actually lose? Let’s find out how good I am at actually wearing these things.
First, we can plot the number of days per year that I’ve worn the Fitbit. Any day with more than zero steps is considered “wearing” for this plot.
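The “more than zero steps” rule is simple enough to sketch in a few lines of Python. As before, the rows are invented for illustration and the function name is my own placeholder, not something from the original R code:

```python
from collections import Counter
from datetime import date

# Hypothetical rows; a day with 0 steps is treated as "not worn".
daily_steps = [
    (date(2013, 1, 1), 9120),
    (date(2013, 1, 2), 0),
    (date(2013, 1, 3), 11875),
    (date(2015, 6, 1), 10432),
]

def wear_days_by_year(rows):
    """Count days per year with more than zero steps."""
    return dict(Counter(day.year for day, steps in rows if steps > 0))

wear_days = wear_days_by_year(daily_steps)
print(wear_days)
```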
Turns out I only wore a Fitbit for 145 days in 2013. Every other year was pretty good though. I actually didn’t miss a single day in 2015!
I also wanted to explore how consistent I was at wearing the Fitbit so I created a simple plot that illustrates when I was and was not collecting any data. Again, you can see that big gap in 2013 into early 2014 pretty clearly.
Plots are great, but what about the numbers? If you add up the number of days I was wearing the Fitbit and compare it with the number of available days in this data set, you’ll find that I wore the Fitbit 86% of the time. That is, I did not collect any step data on 270 of the possible 1942 days in the data set. Considering that you have to remember to charge the device and manage not to lose or damage it, I feel like that’s a pretty good track record.
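If you want to check that arithmetic, here’s a quick Python sketch using only the numbers quoted in this post. The one assumption is that both the first and last dates count as possible days:

```python
from datetime import date

first_day = date(2011, 2, 27)
last_day = date(2016, 6, 21)

# Assumption: the range is inclusive, so add 1 to the difference.
possible_days = (last_day - first_day).days + 1
missing_days = 270  # days with no step data, as quoted in the post
worn_days = possible_days - missing_days
wear_rate = worn_days / possible_days

print(possible_days, worn_days, f"{wear_rate:.1%}")
```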
Let’s keep going.
Time series visualizations are always fun, and since this data is inherently a time series of my activity I wanted to explore a few different ways to represent it. The first and simplest way to do that is to plot every day and see what patterns or insights jump out.
It’s a bit hard to see here, but there is a real reason why the y-axis range is so large. There are actually two days in this data set with over 40,000 steps. The first was when I completed the Marine Corps Marathon (53,904 steps) on October 30, 2011. The second was the most recent Los Angeles Marathon (42,231 steps) on February 14, 2016.
With so much information on one plot, it’s hard to really see anything significant so let’s break it up by year.
How about plotting each day for every year?
The next thing I wanted to look at was whether there was a weekly pattern in my activity over the time frame. I aggregated the data across days of the week and plotted the sum totals per year.
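That aggregation is a group-and-sum on two keys, year and day of the week. A rough Python sketch with made-up rows (the real version was done in R, and the names here are hypothetical):

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical rows, not the real data.
daily_steps = [
    (date(2011, 3, 7), 12000),   # Monday
    (date(2011, 3, 12), 6000),   # Saturday
    (date(2011, 3, 14), 11000),  # Monday
    (date(2015, 3, 7), 15000),   # Saturday
]

def steps_by_year_and_weekday(rows):
    """Sum steps for each (year, weekday name) pair."""
    totals = defaultdict(int)
    for day, steps in rows:
        # date.weekday() returns 0 for Monday through 6 for Sunday
        totals[(day.year, WEEKDAYS[day.weekday()])] += steps
    return dict(totals)

weekday_totals = steps_by_year_and_weekday(daily_steps)
print(weekday_totals)
```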
Interesting patterns here. Looks like in 2011 and 2012 I was more active during the week, and since 2012 I’ve shifted to being more active on the weekends.
What about goals? Fitbit automatically sets your daily step goal to 10,000 steps. I’ve never changed it from that. It’s a nice round number, and provided me with ample motivation to be active. How good was I at meeting that goal? Let’s look at the data.
So, it turns out I’m not the best at hitting 10,000 steps. When you account for the days I didn’t wear a Fitbit at all, I still only hit 10,000 on 53.4% of days. This year looks to have been especially bad (look at all those black bars!), and I’m going to blame that on focusing on completing my dissertation — ironic as my research was on how people use Fitbits.
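A hit rate like this depends on which days go in the denominator, so it’s worth being explicit. The Python sketch below computes it both ways on made-up numbers. It assumes a day with exactly 10,000 steps counts as a hit and that zero-step days are non-wear days; neither detail is spelled out above:

```python
GOAL = 10_000

# Hypothetical daily totals; 0 means the device was not worn.
daily_steps = [12500, 0, 9800, 10000, 15200, 0, 7400, 11050]

def goal_hit_rate(steps, goal=GOAL, wear_only=True):
    """Fraction of days at or above the goal.

    wear_only=True drops zero-step days from the denominator.
    """
    days = [s for s in steps if s > 0] if wear_only else list(steps)
    if not days:
        return 0.0
    hits = sum(1 for s in days if s >= goal)
    return hits / len(days)

rate_wear_days = goal_hit_rate(daily_steps, wear_only=True)
rate_all_days = goal_hit_rate(daily_steps, wear_only=False)
print(f"{rate_wear_days:.1%} of wear days, {rate_all_days:.1%} of all days")
```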
Overall, the mean is 10,947 steps per day when you exclude the days I didn’t wear a Fitbit. If you add those back in it falls to 9,420 steps per day. Still pretty good I say.
The last plot I made to explore my daily data set was to create a visualization of the running cumulative total steps. Why? Why not?!
If you take this simple cumulative data and fit a linear regression of cumulative steps on date, the model predicts an increase of 8,412 steps per day. A linear regression most likely doesn’t capture the true variability and intricacies of daily behavior over nearly 2,000 days of data, but it’s a good start. With that model I should hit 20,000,000 steps on January 10, 2017.
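The projection itself is easy to reproduce from the numbers in this post. This Python sketch assumes the extrapolation starts from the final observed total and simply adds the fitted 8,412 steps per day; a full regression would also carry an intercept, which isn’t quoted here:

```python
from datetime import date, timedelta
from math import ceil

total_steps = 18_293_229      # cumulative total as of the last day
last_day = date(2016, 6, 21)
steps_per_day = 8_412         # slope of the fitted line
target = 20_000_000

# Assumption: project forward from the last observed total.
days_needed = ceil((target - total_steps) / steps_per_day)
projected_date = last_day + timedelta(days=days_needed)

print(days_needed, projected_date.isoformat())
```

Under that assumption the target lands 203 days after June 21, 2016, which is January 10, 2017.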
Munching the Minutes
Minute-level data is much more fun. It gives you a better look into when activity was happening and how much of it. Let’s get cracking!
The first thing I did was plot every single minute of every single day. The plot is much too large to fit on this page, but if you click the image below it will take you to the full size image I’ve posted on my personal website.
Putting every minute along the x-axis is fun, but it’s hard to tease out patterns when your plot has to be over 8,000 pixels wide in order to actually see it. How about we just collapse everything to one day, with the x-axis representing the full 24-hour clock?
You can definitely see some banding that indicates when I was and was not active. However, even with transparency applied, the sheer volume of lines makes it hard to see patterns across the day for the data.
This scatterplot makes it a lot easier to see some different activity patterns. Mostly, I tend to run (steps per minute > 150) in the evenings and my regular walking speed is right around 110–115 steps per minute. Pretty neat!
We can also run some aggregations on this data and start to break down different within day patterns across days. Let’s start with just the mean steps per minute across all days.
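Averaging across days means grouping every observation by its minute of the day (0 to 1439) and taking the mean of each group. A small Python sketch on invented observations, with hypothetical names throughout:

```python
from collections import defaultdict

# Hypothetical (minute_of_day, steps) observations pooled across days.
minute_rows = [
    (7 * 60, 100), (7 * 60, 60),        # 07:00 on two different days
    (18 * 60 + 30, 160),                # 18:30 on three different days
    (18 * 60 + 30, 0),
    (18 * 60 + 30, 140),
]

def mean_steps_by_minute(rows):
    """Average steps for each minute of the day across all days."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for minute, steps in rows:
        sums[minute] += steps
        counts[minute] += 1
    return {m: sums[m] / counts[m] for m in sorted(sums)}

mean_by_minute = mean_steps_by_minute(minute_rows)
print(mean_by_minute)
```

Note that zero-step minutes stay in the average here; dropping them would answer a different question (how fast I move when I’m moving at all).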
Five years of data and we see that once I get up in the morning I’m likely to be active, dip down in the afternoon, then achieve my highest levels of activity in the early evening, with a rapid decrease thereafter.
Let’s break it up a bit and see if that pattern holds across all days.
So, right off the bat we can see differences in the high level activity (usually running) between weekdays and weekends. I run in the morning on the weekends and I run in the evening on the weekdays (of course there are some exceptions).
What happens if we condense the data and plot the mean day per day of the week?
What I thought I saw in the scatterplot is actually borne out in this data: I tend to get much more activity earlier in the day on Saturdays than any other day. You can also see an interesting little set of bumps on Monday after 10PM. That’s most likely due to a fun group I’ve been running with for the past year or so.
That’s probably enough plotting for today. I’ll wrap this up with a plot I’ve been wanting to create ever since I read Stephen Wolfram’s post about his personal data. The plot below maps the presence of data (if a step was recorded) for each minute of the day for each day in the data set (again, click for the full size image).
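The data behind a presence plot like this is a grid of flags, one row per day and one column per minute. Here’s a minimal Python sketch of building a single row, using a made-up day and a hypothetical helper name:

```python
MINUTES_PER_DAY = 24 * 60

def presence_row(minute_steps):
    """Return 1440 flags: 1 where a step was recorded that minute, else 0."""
    return [1 if minute_steps.get(m, 0) > 0 else 0
            for m in range(MINUTES_PER_DAY)]

# Hypothetical day: steps logged at 07:00, 07:01 and 18:30 only.
one_day = {420: 95, 421: 102, 1110: 158}
row = presence_row(one_day)

print(sum(row), "minutes with data out of", len(row))
```

Stacking one such row per day gives the full grid, which any heatmap or tile plot can then draw.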
This has only scratched the surface of the data that is available. I haven’t even touched any of the intensity data, or matched data sets from other services like Moves or Strava. I’m going to continue plugging away and as I generate more visualizations I’ll update this post with links.
All data used in this post was gathered through Fitabase. Fitabase is a data management and analysis platform used by researchers around the world to incorporate wearables into their research projects. If you’re thinking about using wearables or consumer tools for research and don’t want to mess with APIs, give them a holler. They’re great. Many thanks to Aaron Coleman and his team.
Questions? Leave a response or get in touch! I’d love to hear from you.
Postscript: I also posted these visualizations to the dataisbeautiful subreddit, a typically nice community of data visualization practitioners and aficionados. Someone ran with the data and made some more visualizations. Very cool indeed.