Data Interpretation Issues in Wearables
tips to make good use of your data
More and more wearables have started capturing Heart Rate Variability (HRV) data overnight and combining it with other parameters (e.g. heart rate, sleep data, physical activity) to provide readiness or recovery advice to the user.
Despite some inconsistencies over the past years, as of the end of 2022, Oura, Whoop, and Garmin all work in a very similar way when it comes to HRV measurement, which I discussed in depth here. However, the way the data is used in all of these tools when building readiness or recovery scores or when providing advice to the user is often problematic and inconsistent.
In this blog, I will cover data analysis and interpretation to provide you with some useful tips and tools that should allow you to make the most of the collected data and ignore inaccurate interpretations provided by most tools out there.
Thank you for reading.
Please feel free to reach me (@altini_marco) on Twitter for any questions or follow-ups on this blog.
What are the main issues?
In this blog, I want to try to keep my criticism broad and use this space to help you better understand why the typical interpretations provided together with these wearables are problematic. While these might be well-meaning companies (?), there are some common flaws in how the data is used, which are key if we want to make use of it either personally or for the athletes we coach. Especially in athletic contexts, readiness or recovery scores hide the information you want to see (the actual response).
My goal is to get you to think critically about the tools you are using and how the data is interpreted so that you can use the data more effectively.
In particular, these are some of the most common issues:
- Some tools rely on the naive “higher is better” interpretation, always treating a high HRV as a good sign and a sign that you should be smashing it on a given day (Whoop is the prime example here). This is not how physiology works.
- Some tools do not have a way to represent your normal range and to allow you to understand if a given daily change is irrelevant (a bit lower or higher than yesterday but within your normal range) or if it is a more meaningful change that should be taken differently (neither Oura nor Whoop meaningfully represents your data).
- Some tools provide cumulative readiness or recovery scores, which give you the false impression that the score can better reflect your recovery or readiness since more data is aggregated. In fact, these scores confound how your body has responded with your behavior, and make the data less useful (all wearables went down this problematic route).
Below I look at all of these problems in more detail.
Higher is better is not how physiology works
Some tools rely on the naive “higher is better” interpretation when it comes to HRV. In these tools, a high (or higher) HRV is always interpreted as a good sign and a sign that you should be going hard. This is not how physiology works. In fact, quite the contrary.
Even when looking at exercise data, an acute drop in heart rate often signals fatigue, despite the fact that a chronic reduction, over weeks or months, signals increased fitness. This is the case also for resting heart rate, and similarly, for HRV. A very high HRV can happen in situations in which your parasympathetic system is active not as a sign of readiness, but as an attempt to recover from a prior, large effort. Acute and chronic responses in physiology differ, and fatigue states can often resemble optimal states, but on a different time scale (i.e. acutely, not chronically).
Below is an example. As Gene mentions, he had a rough day and an accident, which triggered an acute response resulting in a very suppressed heart rate and high HRV. HRV4Training flags it in yellow, advising caution. Another tool would have said the opposite. Use a good tool.
This is why it is nonsense to interpret HRV (or any other physiological signal) just in one direction: blood pressure, blood glucose, etc. — all have a normal range. Similarly, HRV has a normal range, which is specific to you, as I discuss below.
Some tools (one in particular) even provide you with a yellow score when your data is within your normal range, and give you a green only when your HRV is higher than ever. This is the worst possible use of HRV data I have seen on the market.
Beware of tools reporting naive, higher is better, interpretations
Lack of a normal range: detecting meaningful changes
On top of the issue discussed above, HRV data has an inherently high day-to-day variability. This means that there can be large fluctuations between consecutive days, which is different from parameters that you might be more familiar with.
What are the implications? To make effective use of the data, we need to be able to determine what changes are trivial, or just part of normal day-to-day fluctuations, and what changes do matter and might require more attention or simply truly represent a positive (or negative) adaptation to training and other stressors.
Here is where pretty much any software out there fails. They show you a number for today, and you can look at your previous numbers, but then what? Is my HRV lower because of a serious stressor, or is it a bit lower just because of normal day-to-day variability? We need to determine (and show you) your normal range. This is something we have spent a lot of time researching and designing in HRV4Training, starting with the way the daily advice is built.
Below is an example showing my last three months of HRV and load data, characterized by good stability, with a few major events:
- food poisoning ↓↓↓
- heat response ↓↓
- positive response to high load, cooler weather ↑↑
- taper ↓
- post-race sickness ↓↓
- life stress (travel, talks) ↓↓
The normal range is the light blue band you see across the first plot (screenshot from HRV4Training Pro), and makes it easy to capture acute suppressions and long-term changes. Without this level of context, there would be very little readability:
Software that interprets any HRV increase as a good sign, or any HRV decrease as a bad sign (or simply cannot interpret the change for you), fails to represent the fact that there are normal variations in physiology, and that only variations outside of this normal range should trigger concern or more attention, or be interpreted as actual changes.
Using a normal range solves many problems at the same time, avoiding the pitfalls of the higher is better interpretations as well, since a particularly high score is also flagged as something to be cautious about.
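To make the idea concrete, here is a minimal Python sketch of a normal-range check. It assumes the range is the mean plus or minus one standard deviation of your recent daily values. That definition, the function name classify_against_normal_range, and the sample numbers are illustrative assumptions, not a description of how HRV4Training computes its range.

```python
from statistics import mean, stdev


def classify_against_normal_range(today, history, width=1.0):
    """Label today's HRV relative to a range built from past daily values.

    Assumption (for illustration only): the normal range is
    mean +/- width * standard deviation of `history`.
    """
    if len(history) < 2:
        raise ValueError("need at least two past values to build a range")
    center = mean(history)
    spread = stdev(history)
    lower = center - width * spread
    upper = center + width * spread
    if today < lower:
        return "below normal range"
    if today > upper:
        return "above normal range"
    return "within normal range"


# Made-up daily HRV values (ms), for illustration only.
history = [60, 62, 58, 61, 59, 63, 60, 57]

print(classify_against_normal_range(59, history))  # a bit lower, still normal
print(classify_against_normal_range(48, history))  # suppressed
print(classify_against_normal_range(75, history))  # unusually high
```

Note that both out-of-range labels call for a closer look. A higher-is-better tool would treat the last value as the best day of the three.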
Issues with cumulative scores (readiness, recovery, etc.)
When providing daily advice (color-coding and message) in HRV4Training we combine your physiology and your subjective feel (outputs). However, we do not use or include your behavior, for example your activity / training (input). This is a key difference from what you get in terms of readiness or recovery scores in wearables.
Why is that? The whole point of assessing your state, either objectively via heart rate variability (HRV) or subjectively by feel, is to determine how you responded to your given circumstances. You already know the input (behavior) and are assessing the output (physiology or feel). In other words, if I train hard or more for a few days, I want to assess how I responded (output). Including activity (input) in my assessment would mean penalizing me regardless of my body’s response.
For athletes (of any level), this method is particularly ineffective: it hides information. If you train, there is no point looking at readiness or recovery scores to assess how you are responding to a given training stimulus, as these scores confound your response with your behavior. Is the score low because I responded poorly, or just because I did more? (check out this example here). This approach not only provides you with poor information about your actual response, but also fools you into believing the tool works. You go hard or do more, and they tell you that you need to recover. In fact, you might be doing very well and be ready for another big training block.
Let’s look at another example: below is the night before an important race, characterized by terrible sleep. However, an “acute sleep fuckup” has no impact in most circumstances, and we can still perform at our best (my readiness score was 56). Interestingly, my morning measurement resulted in a very high heart rate (this is the sympathetic system kicking in pre-race), but with good HRV (high modulation despite high heart rate). Check out this blog if you are interested in learning more about the differences between morning and night data.
This is not to say that your behavior does not matter: it is key context you can use to understand what could be driving changes. However, it should not be used to determine your response (output). You want to learn about the output of the system (physiological or subjective response) given the input (behavior and other).
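The confounding problem can be shown with a toy example. The sketch below compares an output-only assessment with a made-up composite score that subtracts a penalty for training load. The formula, weights, and function names are hypothetical and chosen only to illustrate the point; they are not any vendor's actual algorithm.

```python
def output_only_status(hrv_today, lower, upper):
    """Assess the response (output) only: is HRV inside the normal range?"""
    if lower <= hrv_today <= upper:
        return "normal response"
    return "outside normal range"


def hypothetical_composite_score(hrv_today, lower, upper, training_load, usual_load):
    """A made-up 0-100 score mixing response (HRV) with behavior (load).

    Hypothetical formula with arbitrary weights, for illustration only.
    """
    if upper <= lower or usual_load <= 0:
        raise ValueError("need upper > lower and usual_load > 0")
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    # 100 in the middle of the range, dropping as HRV moves away from it.
    hrv_part = max(0.0, 100.0 - 50.0 * abs(hrv_today - midpoint) / half_width)
    # The penalty grows with load above usual, whatever the body's response was.
    load_penalty = max(0.0, 40.0 * (training_load / usual_load - 1.0))
    return max(0.0, hrv_part - load_penalty)


# Same physiology on both days (HRV of 60 inside a 58-62 range),
# but twice the usual training load on the second day.
easy_day = hypothetical_composite_score(60, 58, 62, training_load=100, usual_load=100)
hard_day = hypothetical_composite_score(60, 58, 62, training_load=200, usual_load=100)

print(output_only_status(60, 58, 62), easy_day)
print(output_only_status(60, 58, 62), hard_day)
```

The response is identical on both days, yet the composite score drops on the hard day purely because of the input. Looking at the score alone, you cannot tell a poor response from a big training day.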
There are many nuances that are worth understanding a bit better if we want to make good use of available technology. Hopefully, this explains a bit why it is worth assessing your physiology and feel, while you can ignore most (all?) made-up scores.
Check out the blog and podcast below, for a more in-depth analysis of these aspects.
How to get a proper interpretation of your physiological data for any wearable
Given all the points above, we have designed HRV4Training to work differently from most tools out there.
In particular, data in HRV4Training is always interpreted with respect to your normal range, which automatically solves two issues:
- an abnormally high value can be flagged as something to be more cautious about
- day to day differences (e.g. a value a bit lower or higher than yesterday) are easily put into context so that you know when a change is meaningful and when a change is just part of day-to-day variability and nothing to worry about
In the homescreen of the app you can easily see for example your daily HRV with respect to your weekly baseline and normal range (and we do the same also for heart rate). The normal ranges are built using the previous 2 months of data, and allow you to quickly understand your current physiological response.
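As a rough sketch of this comparison, the Python below computes a 7-day baseline and checks it against a range built from the last 60 daily values. The 60-value window stands in for "the previous 2 months", and the mean plus or minus one standard deviation definition is an assumption for illustration; the app's exact calculation is not specified here.

```python
from statistics import mean, stdev


def baseline_and_range(daily_values, baseline_days=7, range_days=60):
    """Return (weekly baseline, lower bound, upper bound).

    daily_values: daily HRV values ordered oldest to newest.
    Assumption: normal range = mean +/- 1 SD of the last `range_days` values.
    """
    recent = daily_values[-range_days:]
    if len(recent) < 2:
        raise ValueError("need at least two values to build a range")
    baseline = mean(daily_values[-baseline_days:])
    center = mean(recent)
    spread = stdev(recent)
    return baseline, center - spread, center + spread


def baseline_status(baseline, lower, upper):
    """Describe the chronic (weekly) state relative to the normal range."""
    if baseline < lower:
        return "baseline below normal range"
    if baseline > upper:
        return "baseline above normal range"
    return "baseline within normal range"


# Made-up data: about two months of stable values, then a suppressed week.
daily = [58, 62] * 27 + [50] * 7
baseline, lower, upper = baseline_and_range(daily)
print(baseline_status(baseline, lower, upper))
```

A single low day may sit inside the range, while a full week of low values pulls the baseline below it. That is the difference between an acute and a chronic change.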
Simply put, HRV4Training is the only platform that provides you with an analysis of your physiology that matches how this data is used in state-of-the-art research and applied practice.
This means analyzing your resting physiology with respect to your normal range and providing you with feedback regarding your acute (daily) and chronic (weekly) physiological state, in response to the various stressors you face.
On top of this, the messaging (how the numbers are translated into words or advice) also accounts for how HRV should be used, combining outputs (physiology and feel) and not including inputs (behavior).
How to use HRV4Training’s interpretation with your wearable
The easiest way to track your night HRV in HRV4Training is to use Manual Input and enter it as part of the morning questionnaire, together with your subjective feel and other contextual data.
You can learn more here: https://www.hrv4training.com/blog/manual-input-in-hrv4training
If you are already using HRV4Training, before you go and buy a wearable, read the next section, as you most likely do not need one.
Do you need a wearable?
I have discussed in depth here and here how you can capture acute and long-term stress responses either in the morning or during the night.
Hence, my recommendation when someone is interested in measuring their resting physiology is the following: pick a sensor and routine that works for you.
If you prefer to wear something overnight, by all means, get a device that does so. If you prefer not to wear something during the night, charge it, pay extra, etc., and have a morning routine that allows you to take 1 minute to measure your resting physiology, then go that way. If you are not sure this is for you, you can use your phone camera and invest as little as $10 in measuring your physiology daily and accurately.
There is no clear advantage in using one method or the other. However, there can be some differences based on stressors’ timing and other aspects that I discuss in depth in this blog post. Make sure you understand these differences before picking one.
Personally, I keep measuring in the morning while sitting, and also collect night data using the Oura ring. Among the wearables, I trust Oura more in terms of data quality and also find the ring more comfortable in the night, since it does not need to be worn tight like other wrist-based wearables.
In terms of actionability, I normally rely on the morning measurement, as the orthostatic stressor provides a better assessment of my readiness for the day, while I keep an eye on night data as an assessment of the previous day’s behavior. On most occasions, the two are well aligned.
Recap
Despite the improved accuracy in measuring HRV in most wearables, the way the data is used in the software provided with these sensors often fails to address some of the most important issues.
Naive interpretations (higher is better), lack of a normal range (what’s a meaningful change?), and confounding your physiological response with your behavior are all common issues that limit the utility of the data.
In this blog, I have covered how we address these issues in HRV4Training so that the data is more meaningful and actionable.
I hope you have found this blog useful. Take it easy.
Marco holds a PhD cum laude in applied machine learning, an M.Sc. cum laude in computer science engineering, and an M.Sc. cum laude in human movement sciences and high-performance coaching.
He has published more than 50 papers and patents at the intersection between physiology, health, technology, and human performance.
Marco is the founder of HRV4Training, a data science advisor at Oura, an Editor at IEEE Pervasive Computing (Wearables), and a guest lecturer at VU Amsterdam.
He loves running.
Twitter: @altini_marco