The Lockdown Wheelie Project

David Colls
The Sports Scientist
7 min read · Aug 13, 2020

“It’s Strava for wheelies” is my lockdown project, combining hyper-local exercise with data analytics to track and guide improvement. Practising wheelies is a great way to stay positive; after all, it’s looking up, moving forward.

I’ve been at it for a month now, logged 1,331 attempts, and seen my maximum time improve from two to nearly six seconds. In the process, I’ve generated millions of data points to guide further improvement through machine learning.

Chart of duration of all wheelies

The Data Collection Rig

It doesn’t get much slicker than this. The hardware comprises:

  1. (A) 1 bike
  2. (B) 1 smartphone
  3. (C) 1 roll of duct tape

Using a generous amount of duct tape (C), affix the phone (B) to the top tube of the bike (A) near the head tube.

The bike in this instance is a Specialized Stumpjumper 2017 XL 29” FSR Comp, and the phone is an iPhone X, running the SensorLog app to collect data (see below for more setup details).

Building a Wheelie Training Diary

Goals are wheelie helpful for directing effort. My wheelie goals are directing what I build in the Training Diary, but I’m also discovering interesting things along the way, when I visualise the data and when I share it with others.

What do I want to achieve in lockdown? Two things related to wheelies:

  1. My longest wheelie exceeds 8 seconds. This benchmark is based on a friend whose mountain biking in general sets the standard for me, although they have now recorded over 11 seconds!
  2. I can reliably wheelie for 4 seconds. This seems like a reasonable measure of consistency given an 8-second target for the longest wheelie.

After processing session data from the phone, I have a list of every wheelie, with its duration, and a link back to the source for extracting further information, like location, speed, etc.

With these goals in mind and the available data, I’ve built a diary comprising:

  • Summaries of results and effort
  • Details of improvement over time
  • Play-by-play replays

Summaries

The measures for tracking my key goals are:

  1. Maximum wheelie duration, for the longest wheelie target
  2. Median wheelie duration, for the consistency target

I plot these by “Session”, which is any day with >0 wheelies.
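As a rough sketch of how these per-session measures could be computed, here is a standard-library version. The input shape (pairs of session date and duration in seconds) and the function name `summarise_sessions` are assumptions for illustration, not the layout of the actual diary data.

```python
from collections import defaultdict
from statistics import median

def summarise_sessions(wheelies):
    """Group (session_date, duration_s) pairs by date and summarise each day."""
    by_session = defaultdict(list)
    for session_date, duration_s in wheelies:
        by_session[session_date].append(duration_s)
    # A session only exists for a day that has at least one wheelie.
    return {
        day: {"count": len(d), "max_s": max(d), "median_s": median(d)}
        for day, d in sorted(by_session.items())
    }

# Made-up example data, not real measurements.
example = [
    ("2020-07-14", 1.2), ("2020-07-14", 2.0), ("2020-07-14", 0.8),
    ("2020-07-15", 2.5), ("2020-07-15", 1.5),
]
summary = summarise_sessions(example)
```

The maximum tracks the longest-wheelie goal and the median tracks the consistency goal, so both fall out of one pass over the event list.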

Chart of wheelie results per session

I’m also curious about distance, and I use it in the play-by-plays, so I include that too. By these measures, I feel like I’m on track to achieve my goal in a few weeks to months — great! I don’t know if I’m a fast or slow learner though; 1,331 wheelies seems like a lot…

When I shared this project with colleagues, a self-professed sports data nerd asked to see the cumulative effort over time, as they understood skills development to be a matter of elapsed time as well as effort. So I added cumulative effort over time…

Charts of effort analysis

… and a breakdown of effort per session. Seeing this made me more comfortable reducing the intensity of sessions, which I think improved the quality of my wheelies.
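For illustration, cumulative effort can be sketched as a running total over sessions. The article does not define how effort is measured, so this assumes it means the number of attempts per session; `cumulative_effort` and the example numbers are hypothetical.

```python
from itertools import accumulate

def cumulative_effort(session_counts):
    """Turn (session_label, attempts) pairs, in date order, into running totals."""
    labels = [label for label, _ in session_counts]
    totals = accumulate(count for _, count in session_counts)
    return list(zip(labels, totals))

# Made-up attempt counts for three sessions.
example_counts = [("day 1", 40), ("day 2", 55), ("day 3", 30)]
running = cumulative_effort(example_counts)
```

The same function works unchanged if effort is instead taken to be total seconds spent on the back wheel in each session.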

Improvement Over Time

The summaries in their current form show improvement over time, but I also wanted to get different and finer-grained views of improvement over time. For this, I’ve found plotting and comparing the distributions of wheelie durations in each session to be really helpful. I can gradually see the distribution moving right, even when I don’t set a new longest duration.

Charts of wheelie distributions

Visualising the change is great, and it gives insight into how to quantify change. When another colleague asked for video evidence of the wheelies (a perfectly reasonable request, now supplied above), I wondered to myself how I’d go about synthesising this data. Given the “data doping” issues on Zwift, I also thought about how I’d prove the synthesised data was me. Difficult, given that I want to be unrecognisable from the rider I was.

Any session recognisable as “me” should statistically come from “my” underlying duration distribution. But I don’t know what this is, and it’s changing — by design — as I improve. So now I’m quantifying change with a statistical technique called a two-sample Kolmogorov-Smirnov test, which tells us whether two samples were likely drawn from the same underlying distribution, or not. I want to know when I’ve improved enough from one session to the next that I look like a different rider to the KS test! So here I compare every pair of sessions in a matrix.

Comparisons of similarity of different sessions

If I change sufficiently every session, this matrix should be a diagonal line. If I don’t change at all, it will be a full square. It’s nice to see that, in my latest session, I am a different rider to every other session except my second-latest session. This contrasts with the extended period in the middle where, while the key metrics were going up, I wasn’t looking like a different rider. I’m still trying to find the best way to visualise this to tell the story…
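The pairwise comparison described above can be sketched as follows. The article does not say which implementation was used; a library routine such as `scipy.stats.ks_2samp` would be the usual choice. This version computes the KS statistic by hand with the standard library and compares it with the common large-sample critical value, so it is an approximation, and all function names and data are hypothetical.

```python
from bisect import bisect_right
from math import log, sqrt

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def looks_different(a, b, alpha=0.05):
    """True if the statistic exceeds the large-sample critical value."""
    n, m = len(a), len(b)
    critical = sqrt(-log(alpha / 2) / 2) * sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical

def comparison_matrix(sessions, alpha=0.05):
    """matrix[i][j] is True when sessions i and j look alike to the test."""
    return [[not looks_different(a, b, alpha) for b in sessions]
            for a in sessions]

# Made-up durations: an early session and a much-improved later one.
early = [0.5 + 0.05 * i for i in range(30)]
late = [3.0 + 0.05 * i for i in range(30)]
matrix = comparison_matrix([early, late])
```

With these two clearly separated sessions, only the diagonal entries are True, which is the "diagonal line" case. Sessions drawn from the same distribution would fill in the off-diagonal cells.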

Multiple views of comparing sessions

Play-by-Play

We all love reliving the highlights, and highlights don’t come better than wheelie highlights. For play-by-play review, it’s back to the time series data linked to each wheelie event.

Right now, I trace the pitch (rotation upwards of the front of the bike) against distance covered, with some visualisation of how steady I am using variation in the roll of the device. But in future, I’d also like to add more data visuals to these, like video, or location, speed, etc, to share as little wheelie postcards.
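One way to put a number on steadiness is a rolling standard deviation of the roll signal. That choice is an assumption on my part, since the article only says it uses "variation in the roll". The function names, window size, and sample values below are likewise hypothetical.

```python
from statistics import pstdev

def roll_steadiness(roll, window=5):
    """Rolling population standard deviation of roll (hypothetical metric).

    Lower values mean the bike wobbled less over the last `window` samples.
    """
    return [
        pstdev(roll[max(0, i - window + 1): i + 1])
        for i in range(len(roll))
    ]

def build_trace(distance_m, pitch, roll, window=5):
    """Zip distance, pitch, and steadiness into one row per sensor sample."""
    return list(zip(distance_m, pitch, roll_steadiness(roll, window)))

# Made-up samples for illustration only.
trace = build_trace(
    distance_m=[0.0, 1.0, 2.0, 3.0],
    pitch=[0.90, 1.00, 1.05, 0.95],
    roll=[0.0, 1.0, 0.0, 1.0],
    window=2,
)
```

Each row can then be plotted as pitch against distance, with the steadiness value driving colour or line width.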

Here’s my longest wheelie so far, at 5.8 s and 22.4 m.

Trace of wheelie motion

And here’s a gallery of top wheelies by duration — look at all those different shapes! — prompting the next round of quantifying improvement…

Many wheelie motion traces

Data Collection and Extracting Events

The Training Diary is built on top of a data processing pipeline. Here are some details of how that’s set up.

In addition to the hardware above, the software comprises:

  • SensorLog app for iOS (or consider AndroSensor for Android)
  • Google Drive account and app installed on the phone, for transferring data
  • Colaboratory (cloud-hosted Python notebook environment based on Jupyter), for processing sensor data and extracting events

More configuration details are provided in the wheelies resources README.

The data coming out of the sensor rig is in time series format. For each time that the phone’s sensors were sampled, we have a reading for each of the enabled sensors, as below:

Table of time series sensor data

As the bike rotates upwards into a wheelie, the motionPitch value increases. When it exceeds 0.88 radians, we determine that a wheelie is occurring.

def wheel_up(pitch):
    # motionPitch is in radians; above 0.88 we treat the front wheel as up
    return pitch > 0.88

In this way we convert a continuous sensor value to a binary true/false label that conveys meaning, and add it to the dataframe as the column wheel_up. Note from the plot below that there is occasional drift in the pitch value unrelated to wheelies (I think when I sharply changed direction by 180 degrees), so the labelling is not perfect: it may pick up the occasional false positive. It’s good enough for now, and I’ll discuss in future how it might be improved.

With run-length encoding, using the Python library python-rle, we can now transform the time series data into events. An event tells us that something meaningful happened at some time, with some parameters: in this case, a wheelie of a certain duration and distance. Each event links back to the time series source to support further analysis.
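To make the thresholding and run-length encoding steps concrete, here is a minimal sketch. It swaps in `itertools.groupby` from the standard library in place of python-rle, and the event fields (`start_index`, `end_index`, `duration_s`) and the duration convention (last up sample minus first) are assumptions rather than the notebook's actual schema.

```python
from itertools import groupby

PITCH_THRESHOLD = 0.88  # radians, the threshold quoted in the article

def wheel_up(pitch):
    return pitch > PITCH_THRESHOLD

def extract_events(times, pitches):
    """Return one event per consecutive run of wheel-up samples."""
    events = []
    index = 0
    # groupby yields consecutive runs of equal labels: a run-length encoding.
    for is_up, run in groupby(pitches, key=wheel_up):
        length = len(list(run))
        if is_up:
            start, end = index, index + length - 1
            events.append({
                "start_index": start,  # link back to the time series rows
                "end_index": end,
                "duration_s": times[end] - times[start],
            })
        index += length
    return events

# Made-up samples at 0.5 s intervals.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
pitches = [0.1, 0.9, 1.0, 0.95, 0.2, 0.9, 0.3]
events = extract_events(times, pitches)
```

Keeping the start and end indices on each event is what allows later steps to pull location, speed, or roll for the same stretch of samples. A single-sample run like the second one here has zero duration, and is the kind of false positive a minimum-duration filter could remove.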

Table of wheelie events

In preparing the event data, we do some further transformations on the source sensor data to:

  • Fill missing source data related to location, using the Pandas interpolate method
  • Convert GPS coordinates to distance with a simple approximate formula
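The two transformations above might look something like this. The article does not say which approximate formula it uses, so the equirectangular approximation here is an assumption. `fill_gaps` is a standard-library stand-in for linear interpolation, not the Pandas call itself, and it leaves gaps at either end of the series unfilled.

```python
from math import cos, radians, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def fill_gaps(values):
    """Linearly interpolate None entries that lie between two known values."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

def distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of the distance between two GPS fixes."""
    mean_lat = radians((lat1 + lat2) / 2)
    dx = radians(lon2 - lon1) * cos(mean_lat)  # east-west, shrunk by latitude
    dy = radians(lat2 - lat1)                  # north-south
    return EARTH_RADIUS_M * sqrt(dx * dx + dy * dy)

# Made-up latitude readings with two missing samples.
latitudes = fill_gaps([1.0, None, None, 4.0])
```

Over the tens of metres a wheelie covers, the flat-earth error of this kind of formula is negligible compared with GPS noise, which is presumably why a simple approximation is enough.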

I have shared a notebook for processing the uploaded time series sensor data and extracting events. This processed data feeds the training diary and can feed into further downstream analysis.

Future Plans

Of course I plan to use a proper phone mount, but there are many other directions to take this lockdown project.

I’ll be focussing on adding machine learning in the next stage, to better detect wheelies with noisier data, different bikes, different sensor set-ups, etc, to identify success factors, and maybe to coach the rider. I anticipate trialling custom solutions and MLaaS. I feel that I have the foundations for a virtuous data generation and labelling system through this initial work.

Other directions this could go:

  • Automating more of the data processing pipeline and standardising the tooling
  • Extending and polishing dashboard functionality
  • Real-time streaming for collection and sharing, for distanced but social training
  • Client app development or custom hardware prototyping
  • Video capture and rider pose analysis as another data stream
  • Finding another skill to improve with sensor data

Here’s hoping to be back soon with an ML-powered 8 second wheelie!
