“It’s Strava for wheelies” is how I describe my lockdown project, which combines hyper-local exercise with data analytics to track and guide improvement. Practising wheelies is a great way to stay positive; after all, it’s looking up, moving forward.
I’ve been at it for a month now, logged 1,331 attempts, and seen my maximum time improve from two to nearly six seconds. In the process, I’ve generated millions of data points to guide further improvement through machine learning.
The Data Collection Rig
It doesn’t get much slicker than this. The hardware comprises:
- (A) 1 bike
- (B) 1 smartphone
- (C) 1 roll of duct tape
Using a generous amount of duct tape (C), affix the phone (B) to the top tube of the bike (A) near the head.
Building a Wheelie Training Diary
Goals are wheelie helpful for directing effort. My wheelie goals are directing what I build in the Training Diary, but I’m also discovering interesting things along the way, when I visualise the data and when I share it with others.
What do I want to achieve in lockdown? Two things related to wheelies:
- My longest wheelie exceeds 8 seconds: this benchmark was set based on a friend whose mountain biking in general sets the standard for me, although they have now recorded over 11 seconds!
- I can reliably wheelie for 4 seconds: this seems like a reasonable measure of consistency given the 8-second target for my longest wheelie
After processing session data from the phone, I have a list of every wheelie, with its duration, and a link back to the source for extracting further information, like location, speed, etc.
With these goals in mind and the available data, I’ve built a diary comprising:
- Summaries of results and effort
- Details of improvement over time
- Play-by-play replays
The measures for tracking my key goals are:
- Maximum wheelie duration, for the longest wheelie target
- Median wheelie duration, for the consistency target
I plot these by “Session”, which is any day with >0 wheelies.
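Here’s a minimal sketch of how these two measures could be computed per session, assuming each wheelie is recorded as a (date, duration in seconds) pair. The function name `summarise_sessions` and the sample values are made up for illustration and are not taken from the diary itself.

```python
from collections import defaultdict
from statistics import median


def summarise_sessions(wheelies):
    """Group (session_date, duration_s) pairs by date and return
    {date: {"count", "max_s", "median_s"}} in date order."""
    by_session = defaultdict(list)
    for session_date, duration_s in wheelies:
        by_session[session_date].append(duration_s)
    return {
        day: {
            "count": len(durations),
            "max_s": max(durations),        # longest-wheelie measure
            "median_s": median(durations),  # consistency measure
        }
        for day, durations in sorted(by_session.items())
    }


# Illustrative values only, not real session data.
sample = [
    ("2020-04-01", 1.2), ("2020-04-01", 2.0), ("2020-04-01", 0.8),
    ("2020-04-03", 2.5), ("2020-04-03", 1.5),
]
summary = summarise_sessions(sample)
```

A day without wheelies never appears in the result, which matches the definition of a session as any day with more than zero wheelies.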
I’m also curious about distance, and I use it in the play-by-plays, so I include that too. By these measures, I feel like I’m on track to achieve my goal in a few weeks to months — great! I don’t know if I’m a fast or slow learner though; 1,331 wheelies seems like a lot…
Sharing this project with colleagues, a self-professed sports data nerd asked to see the cumulative effort over time, as they understood skills development was often a matter of time, as well as effort. So I added cumulative effort over time…
… and a breakdown of effort per session. This made me more comfortable reducing the intensity of sessions and, I think, improved the quality of my wheelies as a result.
Improvement Over Time
The summaries in their current form show improvement over time, but I also wanted to get different and finer-grained views of improvement over time. For this, I’ve found plotting and comparing the distributions of wheelie durations in each session to be really helpful. I can gradually see the distribution moving right, even when I don’t set a new longest duration.
Visualising the change is great, and it gives insight into how to quantify change. When another colleague asked for video evidence of the wheelies (a perfectly reasonable request, now supplied above), I wondered to myself how I’d go about synthesising this data. Given the “data doping” issues on Zwift, I also thought about how I’d prove the synthesised data was me. Difficult, given that I want to be unrecognisable from the rider I was.
Any session recognisable as “me” should statistically come from “my” underlying duration distribution. But I don’t know what this is, and it’s changing — by design — as I improve. So now I’m quantifying change with a statistical technique called a two-sample Kolmogorov-Smirnov test, which tells us whether two samples were likely drawn from the same underlying distribution, or not. I want to know when I’ve improved enough from one session to the next that I look like a different rider to the KS test! So here I compare every pair of sessions in a matrix.
If I change sufficiently every session, this matrix should be a diagonal line. If I don’t change at all, it will be a full square. It’s nice to see that, in my latest session, I am a different rider to every other session except my second-latest session. This contrasts with the extended period in the middle where, while the key metrics were going up, I wasn’t looking like a different rider. I’m still trying to find the best way to visualise this to tell the story…
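The pairwise comparison can be sketched roughly as follows, using `ks_2samp` from SciPy. The significance level of 0.05, the function name `ks_same_rider_matrix` and the synthetic exponential durations are all assumptions for illustration; the real matrix is built from recorded session durations.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_same_rider_matrix(sessions, alpha=0.05):
    """Return a boolean matrix that is True where the two-sample KS test
    cannot distinguish the two sessions at significance level alpha."""
    n = len(sessions)
    same = np.ones((n, n), dtype=bool)  # a session always matches itself
    for i in range(n):
        for j in range(i + 1, n):
            p_value = ks_2samp(sessions[i], sessions[j]).pvalue
            same[i, j] = same[j, i] = (p_value >= alpha)
    return same


# Synthetic durations in seconds, for illustration only.
rng = np.random.default_rng(0)
sessions = [
    rng.exponential(scale=1.0, size=200),  # an early session
    rng.exponential(scale=1.0, size=200),  # drawn from the same distribution
    rng.exponential(scale=3.0, size=200),  # a much-improved session
]
same = ks_same_rider_matrix(sessions)
```

With many sessions, some pairs will differ by chance alone at any fixed threshold, so the matrix is better read as a picture of overall change than as a set of independent verdicts.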
We all love reliving the highlights, and highlights don’t come better than wheelie highlights. For play-by-play review, it’s back to the time series data linked to each wheelie event.
Right now, I trace the pitch (rotation upwards of the front of the bike) against distance covered, with some visualisation of how steady I am using variation in the roll of the device. But in future, I’d also like to add more data visuals to these, like video, or location, speed, etc, to share as little wheelie postcards.
Here’s my longest wheelie so far, at 5.8 s and 22.4 m.
And here’s a gallery of top wheelies by duration — look at all those different shapes! — prompting the next round of quantifying improvement…
Data Collection and Extracting Events
The Training Diary is built on top of a data processing pipeline. Here are some details of how that’s set up.
In addition to the hardware above, the software comprises:
- SensorLog app for iOS (or consider AndroSensor for Android)
- Google Drive account and app installed on the phone, for transferring data
- Colaboratory (cloud-hosted Python notebook environment based on Jupyter), for processing sensor data and extracting events
More configuration details are provided in the wheelies resources README.
The data coming out of the sensor rig is in time series format. For each time that the phone’s sensors were sampled, we have a reading for each of the enabled sensors, as below:
As the bike rotates upwards into a wheelie, the motionPitch value increases. When it exceeds 0.88 radians, we determine that a wheelie is occurring.
def wheel_up(pitch):
    return pitch > 0.88  # True once pitch exceeds 0.88 rad
In this way we convert a continuous sensor value into a binary true/false label that conveys meaning, and add it to the dataframe as the column wheel_up. Note from the plot below that there is occasional drift in the pitch value unrelated to wheelies (I think when I sharply changed direction by 180 degrees), so the labelling is not perfect and may pick up the occasional false positive. It’s good enough for now, and I’ll discuss in future how it might be improved.
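As a small sketch of the labelling step on a dataframe: the motionPitch column name and the 0.88 radian threshold come from the text above, while the `time_s` column, the constant name and the readings themselves are made up for illustration.

```python
import pandas as pd

PITCH_THRESHOLD_RAD = 0.88  # threshold from the text, roughly 50 degrees

# Made-up readings; a real sensor export has many more columns and rows.
df = pd.DataFrame({
    "time_s": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "motionPitch": [0.10, 0.50, 0.90, 1.05, 0.95, 0.30],
})

# Vectorised comparison: one boolean label per sample.
df["wheel_up"] = df["motionPitch"] > PITCH_THRESHOLD_RAD
```

Comparing the whole column at once avoids a Python-level loop over samples, which matters when a session contains a very large number of readings.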
With run-length encoding, using the Python library python-rle, we can now transform the time series data into events, which tell us that something meaningful, with some parameters, happened at some time. In this case, that is a wheelie of a certain duration and distance. Each event links back to the time series source to support further analysis.
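To illustrate the idea, here is a minimal sketch of run-length encoding a label sequence into events. It uses `itertools.groupby` from the standard library in place of python-rle, and the function name `extract_events`, the event field names and the sample data are hypothetical.

```python
from itertools import groupby


def extract_events(times, labels):
    """Collapse each run of True labels into one event holding its start
    time, duration and the index range of the source samples."""
    events = []
    index = 0
    for value, run in groupby(labels):
        length = len(list(run))  # run length, as in run-length encoding
        if value:
            start, end = index, index + length - 1
            events.append({
                "start_index": start,  # link back to the time series
                "end_index": end,
                "start_time": times[start],
                "duration": times[end] - times[start],
            })
        index += length
    return events


# Made-up samples at 10 Hz, for illustration only.
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
labels = [False, True, True, True, False, False, True, False]
events = extract_events(times, labels)
```

Here duration is measured from the first to the last labelled sample, so a run of a single sample has zero duration. Filtering out such very short runs would be one simple way to discard some false positives.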
In preparing the event data, we do some further transformations on the source sensor data to:
- Fill missing source data related to location, using the Pandas interpolate method
- Convert GPS coordinates to distance with a simple approximate formula
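These two transformations could look something like the sketch below. The equirectangular approximation is an assumption, since the exact formula used is not stated here; the `lat` and `lon` column names, the function name `approx_distance_m` and the coordinates are likewise made up for illustration.

```python
import math

import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of the distance between two points
    given in degrees; adequate over the short distance of a wheelie."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(dx, dy)


# Made-up track: rows between location fixes have no coordinates.
track = pd.DataFrame({
    "lat": [45.00000, math.nan, math.nan, 45.00003],
    "lon": [10.00000, math.nan, math.nan, 10.00000],
})
track = track.interpolate(method="linear")  # fill the gaps linearly

distance_m = approx_distance_m(
    track["lat"].iloc[0], track["lon"].iloc[0],
    track["lat"].iloc[-1], track["lon"].iloc[-1],
)
```

Linear interpolation assumes roughly constant speed and direction between fixes, which seems reasonable over the fraction of a second between location updates.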
I have shared a notebook for processing the uploaded time series sensor data and extracting events. This processed data feeds the training diary and can feed into further downstream analysis.
Of course I plan to use a proper phone mount, but there are many other directions to take this lockdown project.
I’ll be focussing on adding machine learning in the next stage, to better detect wheelies with noisier data, different bikes, different sensor set-ups, etc, to identify success factors, and maybe to coach the rider. I anticipate trialling custom solutions and MLaaS. I feel that I have the foundations for a virtuous data generation and labelling system through this initial work.
Other directions this could go:
- Automating more of the data processing pipeline and standardising the tooling
- Extending and polishing dashboard functionality
- Real-time streaming for collection and sharing, for distanced but social training
- Client app development or custom hardware prototyping
- Video capture and rider pose analysis as another data stream
- Finding another skill to improve with sensor data
Here’s hoping to be back soon with an ML-powered 8 second wheelie!