On Heart Rate Variability (HRV) and readiness
The goal of this post is to provide some clarity and general considerations on heart rate variability (HRV), readiness, recovery, or stress scores (just readiness from now on, for simplicity), and wearables. I will try to clarify why comparing HRV and readiness scores is of little use and what you should be comparing instead (if anything) for a more meaningful assessment of how these devices work. Most importantly, we will see how you can benefit from the data for both HRV and readiness. They both serve a purpose, but should not be mixed up.
What are we talking about?
HRV is a measure of physiological stress. For today’s wearables and apps, it typically represents parasympathetic activity because of how it is measured (at rest, while sleeping, or first thing in the morning) and computed (relying on high-frequency changes captured by rMSSD). This means that a lower HRV with respect to your historical data is associated with higher stress.
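For readers who want to see what rMSSD is, here is a minimal sketch in Python. It uses the standard definition (the root mean square of successive differences between beat-to-beat intervals). The function name and the sample intervals are made up for illustration, and real apps also clean the data of artifacts first, which is not shown here.

```python
import math

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Difference between each interval and the one before it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical beat-to-beat intervals in milliseconds (illustrative only).
rr = [812, 845, 790, 860, 805]
print(round(rmssd(rr), 1))
```

Because only beat-to-beat differences enter the calculation, rMSSD reflects fast changes in heart rhythm, which is why it is typically read as a marker of parasympathetic activity.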
Readiness is a made-up construct that most apps or wearables provide. The goal of readiness is to combine multiple parameters (one of them typically is HRV), to determine your level of recovery or ability to tackle the day (whatever that means in your case).
I’ve discussed the issues covered in this post more broadly in the podcast below with Jason Koop and Corrine Malcom. Check it out.
Why does this matter?
Due to the novelty of some of these metrics for consumers, issues in science communication, and whatnot, there is much confusion about both of them, to the point that I often see people comparing HRV from one wearable with readiness from another. While understandable (the tools are supposed to do the same thing, measure our recovery), this is like comparing apples with pears: it does not make much sense.
This is an important aspect to address because wearables and apps can be extremely helpful in better understanding physiological responses to the various stressors we face. However, not all devices are equal, nor do differences between the output of one device and another necessarily mean that they cannot be trusted.
Why do some wearables and apps provide a readiness score?
Wearables provide a readiness score for a simple reason: they track many parameters and try to break down that information into something more digestible for the consumer, which means generating a single readiness score (and normally some form of color-coding).
Obviously, they all mean well. Oura, Whoop, and others try to use the available input to give you an overview of what the data shows. For example, your HRV is low, you were very active for a few days, and your sleep was poor, thus your readiness is low. The basic principle makes sense.
Comparison between tools 🍎🍐
The first limitation of readiness scores typically highlighted by people using more than one tool is that they are inconsistent. Note that this is not a problem per se if we understand what the tool is doing. Not only the algorithm but also the inputs can differ (e.g. Oura has temperature data).
If we use a certain tool, we can see how readiness over time changes based on various stressors and behaviors, and we might benefit from the feedback. However, the black-box nature of these scores makes it more difficult to make informed decisions.
If you are comparing tools, you can go two ways:
- Compare readiness scores from different systems: as mentioned above, there might be inconsistencies due to the type of input (having or not having a certain sensor), how the same inputs are measured (using the whole night of HRV vs noisier segments of the night), and the weight given to certain inputs (e.g. sleep or activity being more important in one tool or the other). For the reasons I discuss below in the limitations section, this comparison is in my view of little use. Readiness can work for you, but it does not have to match another device's output, and there is no gold standard.
- Compare physiological data collected from different systems: this is where we want to have fewer inconsistencies and ideally the same trends. Resting heart rate, HRV, and temperature need to show the same acute changes and chronic trends across wearables and apps if we want to trust these tools and rely on them to assess individual stress responses and better manage our health and performance. This is why it is important to understand what we are measuring, and also what the nuances are of doing it a bit differently (e.g. morning vs full night vs a few minutes of the night for HRV). Even in this case, it is fine for the data to differ if we understand why that is the case (e.g. a late stressor having a different impact on morning vs night data, or 5 minutes of data collected during the night being a noisier version of the full night).
To summarize, if you are comparing different tools, please compare the actual physiological signals, ideally with respect to your normal values and in relation to acute and chronic stressors (learn more here). Otherwise, you might just be comparing meaningless oscillations (normal day-to-day variability, which can be high for some of these physiological parameters).
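As a rough illustration of comparing physiological signals rather than scores, the sketch below expresses each device's HRV relative to its own normal (as z-scores) and then checks whether the two devices move together. The device names and values are hypothetical, and z-scoring plus a Pearson correlation is only one simple way to do this, not a description of how any specific tool works.

```python
from statistics import mean, stdev

def z_scores(values):
    """Express each value as a deviation from the series' own mean, in SD units."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson(xs, ys):
    """Pearson correlation between two equally long series."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical nightly HRV (ms) from two devices over the same week.
# Absolute values differ, but both dip on the same day (index 3).
device_a = [62, 65, 60, 48, 55, 63, 64]
device_b = [45, 47, 44, 35, 40, 46, 46]

za, zb = z_scores(device_a), z_scores(device_b)
print("correlation:", round(pearson(device_a, device_b), 2))
print("lowest day, device A:", za.index(min(za)))
print("lowest day, device B:", zb.index(min(zb)))
```

In this made-up example the absolute numbers never match, yet both devices flag the same day as the lowest relative to normal, which is the kind of agreement worth looking for.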
Limitations of a readiness score
I’ve mentioned above how inconsistencies between readiness scores are often reported. Let’s try to touch on some of the more nuanced issues. For example, the most important question for me is always the following: did my behavior or my physiology trigger a reduced readiness? Did I get a low score because my HRV was low or because yesterday I was very active? In my opinion, what we should care about is our body’s physiological response, and that is what HRV captures. This is why in HRV4Training we don’t create a score combining multiple parameters: the score is HRV, which is then contextualized with respect to your historical data and other parameters.
If, for example, your readiness score includes sleep, then a poor night of sleep that is also reflected in your lower-than-normal HRV ends up penalizing your readiness score twice (low sleep quality and low HRV). If, on the other hand, your HRV is fine, it means your body did not respond poorly to a disruption in sleep, and therefore we might not need to penalize your readiness because of sleep. This is just an example, but you can think of any stressor: if you had a hard workout yesterday and your body assimilated the stressor well, meaning your HRV is within your normal, then it would be a poor choice to penalize your readiness score using activity data. Certainly, this method can give you the perception that readiness scores are meaningful (you went hard and readiness is low), but if there is a systematic impact of physical activity (or other stressors) on readiness scores, unrelated to your physiological response, then why do you even measure your physiology? Once more: if I go for a hard run, the day after I want to see how my body dealt with it (my HRV); I do not want to get a readiness score that penalizes me for having done a hard effort or having slept a bit less. Obviously, even if I responded well, I will still take my day off training: that’s what periodization is about.
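To make the double-penalty argument concrete, here is a toy example. The scoring function and its weights are entirely hypothetical and do not represent any vendor's algorithm. Inputs are deviations from your personal normal in standard deviation units, and yesterday's activity load counts against the score.

```python
def composite_readiness(hrv_z, sleep_z, activity_z, weights=(0.5, 0.3, 0.2)):
    """Hypothetical weighted score (illustration only, not a real algorithm).

    hrv_z and sleep_z: deviation from personal normal (negative = worse).
    activity_z: yesterday's load relative to normal (positive = more load).
    """
    w_hrv, w_sleep, w_act = weights
    return w_hrv * hrv_z + w_sleep * sleep_z - w_act * activity_z

# Scenario A: poor sleep that is already reflected in a suppressed HRV.
# The same bad night lowers the score twice (HRV term and sleep term).
a = composite_readiness(hrv_z=-1.5, sleep_z=-1.5, activity_z=0.0)
a_hrv_only = 0.5 * -1.5

# Scenario B: hard workout yesterday, but HRV is within normal.
# The score still drops, even though physiology shows a good response.
b = composite_readiness(hrv_z=0.0, sleep_z=0.0, activity_z=2.0)

print("A composite:", round(a, 2), "vs HRV term alone:", a_hrv_only)
print("B composite:", round(b, 2), "with HRV at normal (0.0)")
```

In scenario B the score is driven purely by behavior, which is exactly the situation in which I would rather look at HRV on its own.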
In other words, we measure physiological parameters representative of recovery (HRV), but then we estimate the effect of other parameters (e.g. activity, “sleep quality”, etc.) on recovery to compute a readiness score that is supposed to account for everything. The reality of course is that this can never be accurate. Even in an ideal world where activity, sleep, and other parameters are correctly quantified, there are so many other factors that will have an impact beyond what a wearable can measure (environmental factors, medication, diet, personal relationships, global pandemics, just to name a few).
Finally, I often feel like there is a mismatch between the good intentions of a readiness score and the target audience. If the user is an elite athlete, it is even more meaningless to spend time looking at made-up metrics. As a professional (athlete or coach), it should be obvious that the physiological response is what matters. If you do not know where to start, check out the guides I link at the end of this article. I can assure you that you are not alone, and dismissing technology or data is hardly the smarter approach.
In defense of readiness
Alright, I just gave the whole concept of readiness a bit of a hard time. I wanted to make it clear how it is constructed, how it differs from measuring HRV, and what its limitations are. It is of course not all bad, and readiness can certainly be used for guidance in certain cases.
An important assumption I make throughout this post is that we know what HRV is and how to use it. In other words, we understand that when measured correctly, HRV reflects physiological responses to stressors. This means that a good HRV is defined as a stable value with respect to our history, not a “higher value” (see this). Similarly, a good HRV reflects a positive response to the training and lifestyle stressors currently present in our life. Most importantly, a positive response does not mean “train hard every day”; it means “proceed as planned”, because you do need a plan. This is not trivial, as only recently have we better understood how to collect and interpret the data meaningfully, and how to better communicate these aspects.
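A simple way to picture “stable with respect to our history” is a normal range built from your own recent data. The sketch below assumes a range of the mean plus or minus one standard deviation of past values. That width, the function names, and the sample data are assumptions for illustration, not a description of how any specific app computes it.

```python
from statistics import mean, stdev

def normal_range(history, width_sd=1.0):
    """Personal normal range: mean +/- width_sd standard deviations (assumed rule)."""
    m, s = mean(history), stdev(history)
    return m - width_sd * s, m + width_sd * s

def classify(today, history, width_sd=1.0):
    """Label today's HRV relative to the personal normal range."""
    low, high = normal_range(history, width_sd)
    if today < low:
        return "below normal"
    if today > high:
        return "above normal"
    return "within normal"

# Hypothetical morning HRV values (ms) from the previous days.
history = [58, 61, 60, 63, 59, 62, 60, 61, 57, 59]

for value in (60, 52, 70):
    print(value, "->", classify(value, history))
```

Note that a value above the range is labeled separately rather than treated as “better”: what we are looking for is stability, and a value within your normal is the one that says “proceed as planned”.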
Blind guidance from a wearable or app without a plan is why readiness exists. The score tries to include various aspects of your life (activity, sleep, HRV, etc.) so that the app can do the decision-making for you. This of course makes sense as we do not all have a coach or the time or energy or knowledge to look at the raw data and interpret it. It is easier to look at a cumulative readiness score than to look at physiological data (heart rate and HRV) and at how physiology changes in response to the various stressors you face.
If you are able to link an app’s or wearable’s readiness score to how you feel subjectively and/or the stressors you face, over time, by all means, that is a useful way to use the data. It could be that in your case the inputs and weights used by the algorithms reflect your responses well. Below I discuss why we do not do this and possible alternatives to make good use of your physiological data so that you can interpret deviations from your normal that signal periods of higher stress.
Alternative approaches to readiness: just use HRV
While I understand the reasoning behind the readiness score, in my opinion, these scores are flawed because starting from a true physiological response (HR, HRV, etc.) we then confound it with behaviors and estimates (e.g. activity or sleep quality as well as other parameters “we think” might be relevant). These parameters are not all equal.
If you adjust your activity or sleep duration on some of today’s wearables, your readiness will change. To me, this is a red flag given the inability of such wearables to accurately measure either sleep or activity. Note that even if they were perfectly measured, physiology already reflects what you need to know.
All other parameters (sleep, activity, etc.) are essential as contextual information, to understand how your physiology changes in relation to sleep or exercise habits for example, but this is different from using them directly to determine your ability to perform on a given day.
The holistic view (provided by a wearable) is a myth. A wearable has no idea of muscle damage and context. Sleep quality is inherently linked to night physiological data, etc. Not only is the wearable or app missing information, but aggregating information also gives the false expectation that the data becomes somehow more insightful, while it is simply diluting the insight.
In my opinion, looking at actual physiological data and how it deviates from your normal, and contextualizing such data with your subjective feeling, training data, etc. separately can be more helpful.
This is the beauty of HRV: as an overall marker of stress, you do not need to be able to quantify all sorts of things to estimate it (the way readiness works). HRV already reflects how you are responding to whatever you do.
For these reasons, if you have to pick a parameter, pick HRV.
Wrap-up and resources
In this post, I tried to provide an overview of the differences between HRV and readiness. If you use different tools, pay attention to how the actual physiology (output) changes across tools, and worry less about readiness or scores that include behavior (input), especially if you use these devices with clients or to manage your health or performance.
To learn more about how to use HRV and how to interpret the data with respect to your historical measurements and various stressors, check out my guide here.
To learn more about the differences between resting heart rate and HRV, why HRV is a more sensitive metric of stress, and what the implications are in terms of the technology used to measure it, check out my other guide here.
The Ultimate Guide to Heart Rate Variability (HRV): Part 1
Measurement setup, best practices, and metrics.
Resting Heart Rate and Heart Rate Variability (HRV): What’s the Difference? — Part 1
Marco holds a PhD cum laude in applied machine learning, an M.Sc. cum laude in computer science engineering, and an M.Sc. cum laude in human movement sciences and high-performance coaching.
He has published more than 50 papers and patents at the intersection between physiology, health, technology, and human performance.
Marco is the founder of HRV4Training, data science advisor at Oura, and guest lecturer at VU Amsterdam. He loves running.