How to Assess Your Fitness Without Any Test

Using Detailed Activity Data to Continuously Track Your Performance

When I started running several years ago, I was surprised by how little relevant feedback about my fitness the existing applications could extract from my activity data. Supposedly (according to a tweet from an authority in exercise physiology), this is not a trivial problem.

Let’s begin by drafting some requirements for performance metrics from the perspective of the typical “moderately trained (amateur) athlete.” These athletes generally do not have routine access to laboratories with fancy testing equipment, so fitness should be determined using commodity products such as GPS watches and heart rate monitors. Being amateurs, they also cannot perform a time-consuming testing protocol on a regular basis. They would rather spend their limited time training for an event, which favors implicit testing procedures. Finally, they want their fitness assessment to be up to date, based where possible on their latest workouts. It would also be nice if the performance metric could accurately predict the finish time for that big race they are preparing for.

These requirements are not just made up, but based on interviews with several runners, cyclists, and triathletes. Given all the current fitness metrics used in science and commercial products, why would we need something different?

Is One Dimension Enough?

Most current metrics, such as VO2 Max[1], Functional Threshold Power (FTP)[2], or any of the proprietary fitness indices used by existing applications, have one thing in common: they have only one dimension. Although these measurements may be highly correlated with performance in general, they are usually poor at predicting an individual’s performance in a specific event. Let’s clarify this using an example.

This chart uses data that is random but based on statistical properties reported by Souza et al. [3].

If 100 moderately trained athletes with varying VO2 Max values ran a 10k race, the results would look roughly like the chart above. For the group as a whole, the data show a highly significant (p < 1.0E-16) relation between VO2 Max and finishing time. However, for an individual with a VO2 Max of 67, the predicted time lies somewhere between 34 and 41 minutes, which is the difference between actually winning the race and ending in the bottom 20% of contestants.
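The gap between a strong group-level relation and a wide individual prediction can be reproduced with a small simulation. This is only a sketch: the slope, intercept, noise level, and VO2 Max range below are made-up values chosen to resemble the example, not the statistics reported by Souza et al., and the function names are hypothetical.

```python
import math
import random

def simulate_race(n=100, seed=1):
    """Synthetic (VO2 Max, 10k minutes) pairs. All parameters are assumptions."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        vo2 = rng.uniform(50.0, 75.0)
        minutes = 61.0 - 0.35 * vo2 + rng.gauss(0.0, 1.75)
        data.append((vo2, minutes))
    return data

def fit_line(data):
    """Least-squares fit: slope, intercept, r, residual sd, mean x, Sxx."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    syy = sum((y - mean_y) ** 2 for _, y in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in data)
    resid_sd = math.sqrt(sse / (n - 2))
    return slope, intercept, r, resid_sd, mean_x, sxx

def prediction_interval(data, x0, t_crit=1.98):
    """Approximate 95% prediction interval for a single new athlete at x0.

    t_crit=1.98 approximates the two-sided 95% t quantile for about
    98 degrees of freedom (an assumption tied to n=100).
    """
    slope, intercept, _, resid_sd, mean_x, sxx = fit_line(data)
    n = len(data)
    centre = intercept + slope * x0
    half = t_crit * resid_sd * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return centre - half, centre + half

data = simulate_race()
r = fit_line(data)[2]
low, high = prediction_interval(data, 67.0)
print(f"r = {r:.2f}; 10k time at VO2 Max 67: {low:.1f} to {high:.1f} min")
```

The correlation describes the whole group, while the prediction interval describes one runner. The interval stays several minutes wide no matter how small the p-value gets.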

The problem with VO2 Max and other single-dimensional metrics is that they can only partially explain performance. Besides, as shown by Souza et al. [3], for example, how well these metrics predict performance depends on the type of race or event. FTP represents the maximum amount of power a cyclist can produce for one hour. Obviously, this will accurately predict performance in a 1-hour time trial on a track. On the other hand, it is questionable whether FTP can predict an individual’s result in a long endurance event such as a cyclosportive or Gran Fondo, or her sprinting capacity when participating in criteriums.

Recently, our “colleagues” at The Sufferfest also recognized these shortcomings of single-dimensional metrics and changed their product to use power thresholds in four dimensions[4].

With or Without Relation to Effort?

The use of power meters in cycling gave rise to the popularity of several power-based performance measures, such as the previously mentioned FTP. The commonly used method for determining FTP is to ride a 20-minute time trial as fast as possible[2]. Besides the already stated issue of FTP being single-dimensional, a test based on a person’s perceived maximum effort is not very reliable, especially for beginning athletes. Given that you “improved” your FTP between two such tests, was that improvement because:

  • you managed to distribute your effort more efficiently over these 20 minutes?
  • you mentally adapted to endure more pain and just pushed a little harder?
  • you became physically fitter?

Another limitation of such maximum-power-for-a-given-time metrics is that they only measure performance at maximum efforts. It is doubtful whether the average amateur athlete is willing to regularly assess his fitness by going through these highly uncomfortable tests.

Testing Fitness in Multiple Dimensions and Relative to Effort

There is a test that relates performance to the level of effort and uses multiple dimensions to express fitness[5]. It assumes that an athlete’s maximum heart rate (HRmax) is known. The protocol involves running 6-minute intervals on a track with distance markers, with 2 minutes of rest between intervals. The athlete runs every interval at a fixed level of effort: the first at a heart rate of HRmax − 50, the second at HRmax − 40, and so on until the last, at HRmax − 10. Because 6 minutes is one tenth of an hour, the distance covered in an interval (in km) is translated to a speed in km/h simply by multiplying it by 10. Successive test scores can then be compared over time to get an overview of the athlete’s fitness improvement:

Example results from successive tests

It is important to know that the heart rate zones used for this test are just for testing the fitness at levels of effort relative to a person’s maximum heart rate. They do not represent any physiological trait of the athlete (such as aerobic and anaerobic thresholds), and they are useless as target zones in workouts.
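The arithmetic of the track protocol is simple enough to sketch in a few lines. The function names are hypothetical, and the 10-beat steps between HRmax − 50 and HRmax − 10 are my reading of the description above rather than a quoted specification.

```python
def target_heart_rates(hr_max, offsets=(50, 40, 30, 20, 10)):
    """Target heart rate (bpm) for each 6-minute interval of the test."""
    return [hr_max - offset for offset in offsets]

def speed_kmh(distance_m, minutes=6.0):
    """Average speed in km/h for a distance in meters covered in `minutes`."""
    return distance_m * 60.0 / (minutes * 1000.0)

# Example: an athlete with HRmax 190 who covers 1200 m in the first interval.
print(target_heart_rates(190))          # [140, 150, 160, 170, 180]
print(f"{speed_kmh(1200):.1f} km/h")    # 12.0 km/h
```

One result per interval gives a speed for every level of effort, which is what makes the outcome multi-dimensional.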

Given that we still need to go to a track and follow this specific protocol to measure our fitness, would there be a way to measure similar properties (as in speed per heart rate zone) without performing an explicit test? It would be great if we could get the same knowledge about our fitness progress from our daily training.

Continuous Fitness Testing

Modern sports tracking devices usually measure your heart rate, speed, elevation, and power (if available) every few seconds. As a consequence, every hour of activity data contains up to a few thousand such measurements, each of them relating a heart rate to a speed or power at a given time. Can we directly reconstruct our performance (expressed in speed or power) from this data? Let’s take a look at one of my favorite workouts, 200-meter repeats at relatively high speed:

Assuming I did this training on a relatively flat course (filtering out climbs and descents is not too hard when elevation data is present), it is easy to see that most of this data is useless for relating heart rate to speed. The main problem is that heart rate reacts to changes in effort with a delay. At the beginning of every interval (for example point (1) in the chart), it takes just a few seconds to reach the target pace, while the heart rate is still low. Moreover, during the brief walk while recovering from the effort (point (2)), the heart rate is still higher than during parts of the interval where I was running at nearly full speed.

However, there are plenty of opportunities to find measurements that resemble the conditions of the test protocol mentioned above. Every fragment of a workout that has a relatively steady heart rate (again, assuming climbs and descents are filtered out) is usable, as shown in the example of a moderate tempo run with some accelerations:

Combining such “steady state” fragments from multiple recent workouts, and grouping the observations by heart rate zone, allows us to take the average speed or power for each zone. The resulting overview of heart rate zone versus speed/power is quite similar to the protocol test results shown before. The first version of this analysis is now available at Fithaxx.
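As a rough illustration of this idea, the sketch below cuts a workout into fixed windows, keeps the windows in which heart rate barely moves, and averages speed per zone. It is not the Fithaxx implementation. The window length, the allowed heart rate range, the 10-beat zone width, and all function names are assumptions, and the samples are assumed to be evenly spaced and already free of climbs and descents.

```python
from collections import defaultdict
from statistics import mean

def steady_fragments(samples, window=30, max_hr_range=4):
    """Return (mean_hr, mean_speed) for each window with a steady heart rate.

    samples: list of (heart_rate_bpm, speed_kmh) tuples, evenly spaced in time.
    A window counts as steady when its heart rate varies by at most
    max_hr_range beats (an assumed threshold).
    """
    fragments = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rates = [hr for hr, _ in chunk]
        if max(rates) - min(rates) <= max_hr_range:
            fragments.append((mean(rates), mean(speed for _, speed in chunk)))
    return fragments

def speed_per_zone(fragments, hr_max, zone_width=10):
    """Average speed per zone, keyed by beats below HRmax (50, 40, ...)."""
    zones = defaultdict(list)
    for hr, speed in fragments:
        offset = zone_width * round((hr_max - hr) / zone_width)
        zones[offset].append(speed)
    return {offset: mean(speeds) for offset, speeds in zones.items()}

# Example: easy running, an acceleration (heart rate still climbing), tempo.
workout = ([(150, 11.0)] * 30
           + [(150 + i, 15.0) for i in range(30)]
           + [(170, 13.0)] * 30)
fragments = steady_fragments(workout)
print(speed_per_zone(fragments, hr_max=190))   # {40: 11.0, 20: 13.0}
```

Fragments from several recent workouts can simply be concatenated before calling `speed_per_zone`. The acceleration in the middle is dropped because heart rate is still catching up with the effort.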

Accuracy

During the second half of a recent 10k race my heart rate strap somehow loosened up so much that it slid down to my belly. As a consequence, it started to record heart rates of about 230. After the data from my device was synchronized, I got a congratulatory message from one of the most popular training apps for reaching a new peak heart rate.

Heart rate straps sliding down, shirts flapping in the wind, dry skin, or electrical interference can all cause misreadings from a heart rate monitor, ranging from the spectacular to the very subtle. The “continuous testing” approach of reconstructing performance from activity data can only work if the following types of heart rate noise are carefully filtered out:

  • obvious out-of-range heart rates (where the “normal” range, of course, is personal to a specific athlete),
  • sharp spikes and drops,
  • “flat line” measurements: no variation at all for periods longer than 1 minute.
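The three filters listed above could look roughly like the sketch below. It assumes one heart rate reading per second. The thresholds (`low`, `high`, `max_jump`) and the function name are hypothetical and would have to be tuned per athlete and per device. Suspect readings are replaced by `None` instead of being corrected.

```python
def clean_heart_rates(rates, low=40, high=200, max_jump=15, flat_len=60):
    """Return a copy of `rates` with suspect readings replaced by None.

    rates: heart rates in bpm, assumed to be sampled once per second.
    """
    cleaned = list(rates)

    # 1. Out-of-range readings (the range is personal; defaults are guesses).
    for i, hr in enumerate(rates):
        if not low <= hr <= high:
            cleaned[i] = None

    # 2. Flat lines: runs of identical values longer than flat_len samples.
    start = 0
    for i in range(1, len(rates) + 1):
        if i == len(rates) or rates[i] != rates[start]:
            if i - start > flat_len:
                cleaned[start:i] = [None] * (i - start)
            start = i

    # 3. Sharp spikes and drops: a reading that differs too much from the
    #    last accepted reading is rejected.
    last_good = None
    for i, hr in enumerate(cleaned):
        if hr is None:
            continue
        if last_good is not None and abs(hr - last_good) > max_jump:
            cleaned[i] = None
        else:
            last_good = hr
    return cleaned

# Example: a slipped strap reporting 230 and a sudden drop to 95.
print(clean_heart_rates([150, 152, 151, 230, 153, 95, 154]))
# [150, 152, 151, None, 153, None, 154]
```

The spike check is deliberately naive. A bad first reading, or a long gap followed by a genuinely different heart rate, would cause valid data to be rejected, so a real filter would need to reset after a gap.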

Conclusion

We can reconstruct a fitness performance measurement from recent activity data using only computational means and no intrusion into an athlete’s training schedule. Its accuracy and power to predict race times still need to be validated, so stay tuned if you are interested in the follow-ups.

If you are an athlete and are using Strava to record your activities, please consider signing up at fithaxx.com to try out our continuous fitness testing.

Notes

  1. VO2Max definition at Wikipedia
  2. Functional Threshold Power explanation at BikeRadar
  3. Souza, Kristopher, Ricardo Dantas de Lucas, Talita Grossl, Vitor Pereira Costa, Luiz Guilherme Antonacci Guglielmo, & Benedito Sérgio Denadai. “Performance prediction of endurance runners through laboratory and track tests.” Brazilian Journal of Kinanthropometry and Human Performance [Online], 16.4 (2014): 465–474. Web. 18 Jan. 2018
  4. The Sufferfest 4DP
  5. Such a test is mentioned in:
    Zoladz, J.A., Sargeant, A.J., Emmerich, J. et al. Europ. J. Appl. Physiol. (1993) 67: 71. https://doi.org/10.1007/BF00377708