On Heart Rate Variability and the Apple Watch

Got one?

EDIT 3: Nov 10th, 2018: while what follows still applies, there is now a reliable workaround to use your Apple Watch to collect meaningful HRV data points first thing in the morning, and interpret them. Please refer to this article to learn more.


If you are new to the world of Heart Rate Variability (HRV), you might want to read this post first, or browse through this deck.

As pretty much anything affects the autonomic nervous system, collecting and analyzing longitudinal data representative of vagal tone can provide insights into many complex mechanisms taking place in health and disease. However, historically HRV analysis has been poorly standardized, leading to difficulties in properly designing and implementing studies as well as in comparing study outcomes.

The ease of access to HRV data today often obscures how complicated it is to understand and correctly interpret the information provided and the underlying physiological processes. The need for accurate RR intervals, artifact removal, context and best practices is often overlooked, especially in the consumer space. Therefore, the very nature of HRV itself may have led (or might lead in the future) to confusion about its use in applied research and among consumers who have access to these technologies today.

Unfortunately, the Apple Watch’s latest features seem to have added a little to the confusion. This post is mainly motivated by countless emails that I have received regarding the possibility of integrating the Apple Watch in HRV4Training, which clearly highlight some misinformation around the topic of both HRV and the Apple Watch itself.

Simply put, at this stage, the Apple Watch cannot be used for reliable and meaningful HRV analysis using third party apps.

Interested in learning why? Read on.

“Can I use my Apple Watch for HRV analysis as I do with a Polar or other sensor?”

Unfortunately, the Apple Watch cannot be used as a regular Bluetooth sensor the way you would use, for example, a Polar chest strap. Sure, the Apple Watch talks to your phone using Bluetooth, but that is not sufficient, because the Apple Watch does not comply with the standard Bluetooth low energy heart rate profile. What does that mean? Standards are introduced so that we can have interoperability between sensors, phones and apps manufactured or developed by different companies. When a sensor is released on the market, if the company implemented the standard Bluetooth low energy heart rate profile in the sensor, any app can talk to the sensor. This is the case for Polar, Wahoo and many other manufacturers out there. Others, for example Fitbit or Apple, don’t comply with the standard, which means there is no way for third party apps to talk directly to their sensors outside of the ecosystem they create.
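To make the idea of a standard profile a bit more concrete, here is a minimal Python sketch of how an app might decode one notification of the Heart Rate Measurement characteristic (0x2A37) that compliant sensors expose. The function name and the sample payload are hypothetical and only for illustration; the field layout follows my reading of the Bluetooth Heart Rate Service specification, and this is not the code used in HRV4Training.

```python
import struct


def parse_heart_rate_measurement(payload: bytes) -> dict:
    """Decode one Heart Rate Measurement notification (characteristic 0x2A37)."""
    flags = payload[0]
    offset = 1

    # Bit 0 of the flags: heart rate is a uint16 if set, a uint8 otherwise.
    if flags & 0x01:
        (heart_rate,) = struct.unpack_from("<H", payload, offset)
        offset += 2
    else:
        heart_rate = payload[offset]
        offset += 1

    # Bit 3: a 2-byte Energy Expended field is present; skip over it.
    if flags & 0x08:
        offset += 2

    # Bit 4: RR intervals follow, each a little-endian uint16 in units of 1/1024 s.
    rr_intervals_ms = []
    if flags & 0x10:
        while offset + 2 <= len(payload):
            (raw,) = struct.unpack_from("<H", payload, offset)
            rr_intervals_ms.append(raw * 1000.0 / 1024.0)
            offset += 2

    return {"heart_rate_bpm": heart_rate, "rr_intervals_ms": rr_intervals_ms}


# Hypothetical payload: flags 0x10 (uint8 heart rate, RR intervals present),
# heart rate 60 bpm, then two RR intervals of 1024 and 512 ticks.
sample = bytes([0x10, 60]) + struct.pack("<HH", 1024, 512)
result = parse_heart_rate_measurement(sample)
print(result)
```

The point is that the RR intervals are part of the standard payload, so any app can read them from a compliant sensor without knowing who built it.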

In the case of Apple for example, any interaction must go through the Health app, and it is not possible for any third party app to talk directly to the watch. This limitation has severe implications, as the watch writes information to Health in a way that is outside of third party apps’ control.

1st issue: even if the watch were to compute HRV accurately and write it to Health, it would still be a problem in terms of user experience as a third party app cannot trigger an HRV measurement. Similarly, the Health app might update at a different time, with respect to when you’d like to use your third party app.

That being said, there are sensors that do not comply with standards but still provide a way to acquire either raw data (PPG) or beat to beat data (RR intervals) that could then be used by a third party app such as HRV4Training to compute HRV. This is not the case with the Apple Watch either, as no raw data or beat to beat data is accessible. Even when the sensor is in workout mode or using the ‘Breathe’ app (more on this later), and data is written to Health at higher frequency (a sample every few seconds instead of one every few minutes), this data is absolutely insufficient for HRV computation: it is not accurate beat to beat data, but simply averaged heart rate.

2nd issue: the basic unit of information needed to compute HRV is not available at all.

Simultaneous recording using a Polar H7 chest strap linked to the HRV Logger app (our reference data) and the ‘Breathe’ app in the Apple Watch. In the screenshots on the right hand side we can see how, despite the manual HRV measurement triggered when using the ‘Breathe’ app, only two heart rates are reported for a minute of data (in the past I’ve seen a number of different outcomes, however, clearly no beat to beat data is present). The top plot shows what beat to beat data looks like during a breathing exercise, with clear oscillations in heart rate due to breathing in / out. This is the kind of data we would need if we were to compute other HRV features from RR intervals, which are however not present in Health.
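A quick way to see why averaged heart rate is not enough: the small Python sketch below builds two made-up RR series with exactly the same average heart rate but very different beat to beat variability. The numbers are synthetic and chosen only to illustrate the point; they are not taken from the recording above.

```python
import statistics


def mean_heart_rate(rr_ms):
    """Average heart rate in beats per minute from RR intervals in milliseconds."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))


# Synthetic data: a perfectly steady heart and one that oscillates with breathing.
steady = [1000.0] * 8
oscillating = [900.0, 1100.0] * 4

for name, rr in (("steady", steady), ("oscillating", oscillating)):
    spread = statistics.pstdev(rr)  # spread of the intervals around their mean
    print(f"{name}: {mean_heart_rate(rr):.0f} bpm, spread {spread:.0f} ms")
```

Both series average 60 beats per minute, so an averaged heart rate sample cannot tell them apart. Only the individual intervals carry the variability.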

“But my Apple Watch writes HRV to Health”

True, the Apple Watch started writing HRV data to Health when the ‘Breathe’ app is used and also at some other times during the day or night. A few things that should be discussed here are:

1. What is the value written by the Apple Watch to Health (HRV is not a single number and can be quantified in many different ways, some more meaningful than others).

3rd issue: when looking at short HRV measurements (1–5 minutes), the only thing we can really capture is parasympathetic activity, which is quantified by features different from the one computed by the Apple Watch and available in Health. Check out the links at the beginning of this post if you don’t know what I’m talking about: HRV analysis is simply a way to capture well known processes representative of physiological stress, when done properly.

2. What are the criteria, and what are the implications of the lack of contextualization of this measurement (read: data is written at random times during the day instead of in a clear context, first thing in the morning, as I highly recommend if we want to make any sense of the data; more on best practices and guidelines for HRV analysis also available here).

4th issue: context is everything when we talk about interpreting physiological data, and here we are taking a step back due to lack of context.

Let’s dig a little deeper into these two points.

1. What is the value written by the Apple Watch to Health

Currently, the SDNN value is what is supported by Health (and computed by the Apple Watch as well). I personally believe this decision is mainly due to the fact that the medical community historically relied on the SDNN value when using HRV in the context of clinical diagnosis / medical applications or simply to stratify the population (e.g. for mortality risk). However, as always, we should first understand how and why SDNN was used, and how and why we use a different metric.

First of all, SDNN was used in the context of 24 hour measurements, so that we would get an understanding of cardiac variability changes throughout the day, as a response to circadian rhythm and acute stressors. It was mainly about distinguishing no variability at all (the inability of the system to react to any stressor, as can happen in case of severe chronic conditions / disease) from a healthy cardiovascular system, as SDNN mathematically computes the amount of variability in our 24 hours of RR intervals (the time between consecutive beats). This method allows us to quantify macro-differences in physiology between specific medical conditions and healthy controls (between-individual studies). This method is also highly dependent on physical activity and other confounding factors that affect physiology during the day. Personally, I would speculate that most differences between groups detectable by SDNN over 24 hours are also captured by morning or night measurements (well contextualized resting physiology) in terms of clear markers of parasympathetic activity such as rMSSD or HF.

Example of use of the SDNN feature in the medical literature. Event-free survival of hemodialysis patients with higher and lower heart rate variability. captions: a) SDNN higher group showed significantly higher major adverse cardiac and cerebrovascular event-free survival than SDNN lower group. b) SDANN higher group showed significantly higher major adverse cardiac and cerebrovascular event-free survival than SDANN lower group. From this paper.

Things have changed a lot since those times. We can now acquire data in a known context (first thing in the morning) outside of laboratory conditions, so that the effect of confounding factors / external stressors is limited, and compliance improves as well. We finally moved from between-subject cross-sectional analysis (read: differences between disease and health in 24 hours of Holter measurements from different people), to within-subject longitudinal analysis (read: more powerful analysis that allows us to track changes in physiology together with changes in health, a specific disease, physical performance, for an individual).

In the context of our assessment of baseline chronic stress using a well contextualized morning measurement of short duration — which is in my opinion the place where we should start, if we want to learn a little more about stress and physiology using HRV — it makes less sense to look at SDNN, and we should be looking at HRV features representative of parasympathetic activity such as rMSSD or HF.

In HRV4Training we use rMSSD as it is well established that it is a marker of parasympathetic activity, and therefore the lower the value, the higher the level of stress, relative to your baseline / past data (obviously, an oversimplification). From a human physiology point of view, this links to the fact that parasympathetic activity is mainly the activity of the vagus nerve. The vagus nerve acts on receptors signaling nodes to modulate pulse on a beat to beat basis, while sympathetic activity has different pathways with slower signaling; hence beat to beat changes reflect parasympathetic activity and can be quantified using rMSSD or HF (see Nunan et al.).
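For readers who prefer code to words, here is a minimal Python sketch of the two features discussed so far, using their textbook definitions: SDNN is the standard deviation of the RR intervals, and rMSSD is the root mean square of the differences between consecutive RR intervals. The drifting series is synthetic, and using the sample standard deviation (dividing by n - 1) for SDNN is an assumption on my side, as conventions differ between tools.

```python
import math


def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals, in milliseconds."""
    mean = sum(rr_ms) / len(rr_ms)
    return math.sqrt(sum((x - mean) ** 2 for x in rr_ms) / (len(rr_ms) - 1))


def rmssd(rr_ms):
    """Root mean square of successive RR interval differences, in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Synthetic example: heart rate slowly drifting down over 30 beats
# (RR intervals growing by 10 ms per beat), with no other fluctuation.
drift = [800.0 + 10.0 * i for i in range(30)]

print(f"SDNN:  {sdnn(drift):.1f} ms")
print(f"rMSSD: {rmssd(drift):.1f} ms")
```

In this example the slow drift inflates SDNN, while rMSSD stays small because it only looks at the change from one beat to the next. That is the practical reason why rMSSD is a better fit for short, beat to beat (parasympathetic) changes.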

Unfortunately Health right now does not allow developers to write features other than SDNN, because that is what the Apple Watch computes and reports. As Apple has been improving Health and the Watch in the past few months, hopefully more HRV features will also be added in the future.

Please see Edit 2 at the bottom of the page for additional considerations on this point.

2. What are the criteria, and what are the implications of the lack of contextualization of this measurement

As discussed, the Apple Watch does write HRV (SDNN) values to Health from time to time, but it is unclear what the criteria are (outside of the usage of the ‘Breathe’ app, which seems to trigger an SDNN reading consistently) and it is also not clear what the accuracy is.

In general, I would discourage spot checks / measurements during the day, as they have very little repeatability and normally simply reflect some transient acute stressors and the effect of physical activity (even just walking to another room will increase your heart rate and reduce your HRV for a few seconds / minutes depending on your fitness level and health condition, let alone drinking coffee, getting upset about your Facebook feed, etc.). There is a lot of variability in HRV, way more than in heart rate, and therefore context becomes even more important.

Ideally, what you’d like to capture is baseline chronic stress, which is what you measure when taking your readings in a known context with limited impact of external factors, meaning first thing in the morning before eating, drinking, doing sports, reading your email, etc.

Baseline chronic stress will reflect major stressors such as ‘life stress’, hard workouts, travel, a few too many drinks last night, etc. Quantifying it will allow you to better understand how your body is responding to your lifestyle, hopefully leading to the implementation of meaningful changes. This is the most meaningful practical application of HRV for the consumer today.

HRV4Training’s camera based measurement.

The case for Less is More

Even when beat to beat data can be acquired with high accuracy in free living, one single measurement in a well known context (first thing in the morning) is more valuable than recording more data at random times during the day or continuously.

Why is that? Consider that what we try to measure is parasympathetic activity, that is, the branch of the autonomic nervous system (ANS) in charge of rest, recovery and relaxation. The ANS is affected by pretty much anything (food, alcohol, coffee, stress; just think about reading something online and getting some emotional reaction in no time). Measuring throughout the day can be useful if properly contextualized, meaning that you can link the physiological responses to what is happening. Similar responses are triggered by both physical (training or just walking around) and psychological (emotional or other) stressors, so physiology alone, without context, is really not useful, and such data is typically just a reflection of all the acute stressors we encounter during the day. If your interest is to measure your response to food intake or a session of meditation, measuring during the day makes a lot of sense (even though in that case too, a better experiment design is to measure pre/post stressor, and analyze relative changes).

However, if you are interested in measuring underlying / baseline physiological stress to potentially make adjustments to your lifestyle or training plan, then you would end up missing that information or confounding it with whatever is happening in your day. HRV is highly influenced by acute stressors, hence the importance of the ‘morning routine’, measuring as soon as you wake up while trying to stay relaxed.

The inability of daily measurements to reflect underlying physiological stress was also shown recently in a paper authored by Ricardo Mesquita, where the authors concluded that ‘Analyzing RMSSD from daily routine activities was not reliable, and therefore validity cannot be assumed. RMSSD should therefore be calculated from RR intervals recorded in standardized conditions, such as during the OT upon awakening.’ This is even assuming that rMSSD can be captured correctly during daily activities, which cannot be taken for granted when using PPG based devices (more on artifacts later). Paper here.

Alright, a few more words on technology and artifacts.

“It is not possible that the Apple Watch can’t measure HRV accurately and you can do it with the phone camera”

The Apple Watch might be measuring HRV accurately. I really hope so, and we would love to use it with the technologies we have developed. In some of our comparisons with Polar chest straps, the Apple Watch as a matter of fact showed great agreement in terms of measured SDNN while using the ‘Breathe’ app (in the example above, the Apple Watch reported SDNN 131 ms, while the Polar + HRV Logger reported 137 ms; considering the two streams are not even properly synchronized, this looks very good. See Edit 1 at the bottom of the page).

However, at this point it should be clear that the data is not in the required format for us to compute the features we use, at the time we need them. The problem is not ‘being a watch’ or ‘using optical measurements’, but simply being locked in a way that prevents the type of usage we would need, if we want to provide value based on these data, and not just numbers or pretty plots.

At this stage HRV4Training allows you to measure with the camera (a validated measurement) as well as regular sensors that comply with standard protocols, including the only wristband we could find that can actually report accurate RR intervals under certain circumstances.

Simultaneous R-R interval of an individual subject during 60 seconds of recording for photoplethysmographic (PPG), Polar chest strap (H7) and electrocardiogram (ECG). It is possible. Paper here.

As the app is used for a morning measurement in which the user is completely still, it is by definition used in the optimal setting for optical readings, while any wrist movement or other issue while the Apple Watch is measuring would potentially create artifacts.

Artifacts?

Yeah, artifacts.

Two consecutive minutes of ECG data. The second minute includes one single ectopic beat. rMSSD for the first minute of data is 79 ms, for the second one it’s 201 ms, a huge difference considering that nothing has changed in terms of parasympathetic activity. Note that this is ECG, and all sensor modalities are affected by artifacts, sometimes of different nature, that need to be dealt with, if we want to make sense of the data.

HRV data is highly affected by artifacts, either in the measurement device (wrong beat detected, movement for PPG sensors), or in the actual data (ectopic beats, arrhythmias), and these need to be handled properly.
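To get a feel for how much a single ectopic beat matters, here is a small Python sketch on synthetic RR intervals (not the ECG recording shown above). It computes rMSSD on a clean series, on the same series with one premature beat followed by a compensatory pause, and again after a very simple correction. The correction rule, which drops any interval that differs by more than 20% from the last accepted one, is a common heuristic used here only as an illustrative assumption; the function names are hypothetical and this is not the artifact handling used in HRV4Training.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive RR interval differences, in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def drop_outliers(rr_ms, threshold=0.20):
    """Keep an interval only if it is within `threshold` of the last kept one.

    Assumes the first interval is valid, which a real pipeline would need to check.
    """
    kept = [rr_ms[0]]
    for rr in rr_ms[1:]:
        if abs(rr - kept[-1]) <= threshold * kept[-1]:
            kept.append(rr)
    return kept


# Synthetic resting data, in milliseconds.
clean = [1000.0, 1040.0, 980.0, 1020.0, 990.0, 1030.0, 1000.0, 1010.0, 985.0, 1025.0]

# Same data with one ectopic beat: a short interval followed by a long one.
with_ectopic = list(clean)
with_ectopic[5] = 600.0   # premature beat
with_ectopic[6] = 1400.0  # compensatory pause

corrected = drop_outliers(with_ectopic)

print(f"clean:        {rmssd(clean):.0f} ms")
print(f"with ectopic: {rmssd(with_ectopic):.0f} ms")
print(f"corrected:    {rmssd(corrected):.0f} ms")
```

A single bad beat dominates the result because rMSSD squares the beat to beat differences. Note also that simply dropping intervals and computing differences across the gap is itself a simplification, which is one more reason why knowing how a device handles artifacts matters.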

It is unclear how the Apple Watch deals with artifacts and at this stage there is no way to understand if there were any, as no data quality metric is reported. I’ve argued elsewhere (here) for more transparency especially when dealing with physiological data, as almost no sensors out there provide indications of the confidence level they have in the quality of the data reported, which has implications for anyone relying on such data for higher level analytics.

A clever company, Cardiogram, exploited the fact that the Apple Watch cannot report heart rate correctly when an arrhythmia is present (we are talking about average heart rate here, a much simpler metric to measure, not variability), to identify healthy individuals and individuals with arrhythmias.

While the Watch’s abilities and limitations can both be exploited by higher level applications aiming at clustering parts of the population in different groups based on health related metrics / risk stratification, this is an issue if we are in the business of trying to reliably and accurately measure the HRV of an individual, using the Watch as it is today.

Apple Watch data for a person with Atrial Fibrillation, taken from the Cardiogram Blog, at this link. As you can see, there are entire minutes with no heart rate data or erratic data, as the Apple Watch fails to measure correctly under these circumstances. This is obviously not what Atrial Fibrillation looks like, however this is an issue if we want to measure HRV reliably using this device, as not only is the data incorrect, but it is also not reported as such.

So what?

I believe Apple is heading in the right direction, and at HRV4Training we keep looking at the data as well as at other possibilities to use the Watch, so hopefully this is just a matter of time. However, right now it is clearly not possible to integrate the watch in the app.

I hope you found this read somewhat useful and it will help you make the most of available technologies.

There is a lot to learn from being a little more aware of our physiology and of how we respond to different life stressors. However, it is really important that these measurements are accurate and properly contextualized.

Get measuring. No sensors needed.

EDIT 1: Sept 28th, 2018: Hernando et al., in a recent paper titled “Validation of the Apple Watch for Heart Rate Variability Measurements during Relax and Mental Stress in Healthy Subjects”, show that RR intervals from the Apple Watch are indeed very accurate, which is fantastic news. What is discussed in this post however still applies: the only way to get the RR intervals is via the Breathe app, and no other app can access these data or take a measurement, unfortunately.

From the paper: “Apple does not include any programming method for developers to directly access the values. This app (Breathe) stores the raw RR values, with a precision of centiseconds, in the user’s Personal Health Record, accessible to be exported in XML format using Apple’s Health App”
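For the curious, below is a rough Python sketch of how beat timestamps might be pulled out of such an XML export. Please treat the file layout as an assumption: the embedded sample is hypothetical, and the element and attribute names (HeartRateVariabilityMetadataList, InstantaneousBeatsPerMinute, time) as well as the time format are based on exports I have seen and may differ between iOS versions and locales. Treating the gap between consecutive timestamps as an RR interval is also an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Hypothetical excerpt of an export file; names and formats are assumptions.
SAMPLE_EXPORT = """<HealthData>
  <Record type="HKQuantityTypeIdentifierHeartRateVariabilitySDNN" unit="ms" value="50.0">
    <HeartRateVariabilityMetadataList>
      <InstantaneousBeatsPerMinute bpm="60" time="7:00:00.00 AM"/>
      <InstantaneousBeatsPerMinute bpm="63" time="7:00:00.95 AM"/>
      <InstantaneousBeatsPerMinute bpm="57" time="7:00:02.00 AM"/>
    </HeartRateVariabilityMetadataList>
  </Record>
</HealthData>"""

SDNN_TYPE = "HKQuantityTypeIdentifierHeartRateVariabilitySDNN"


def beat_intervals_ms(record):
    """Gaps between consecutive beat timestamps of one SDNN record, in ms.

    Timestamps carry no date, so a recording crossing midnight is not handled.
    """
    times = [
        datetime.strptime(beat.get("time"), "%I:%M:%S.%f %p")
        for beat in record.iter("InstantaneousBeatsPerMinute")
    ]
    return [(b - a) / timedelta(milliseconds=1) for a, b in zip(times, times[1:])]


root = ET.fromstring(SAMPLE_EXPORT)
records = [r for r in root.iter("Record") if r.get("type") == SDNN_TYPE]
intervals = beat_intervals_ms(records[0])
print(intervals)
```

This only works after the fact, on a file the user exports manually, which is exactly why it does not solve the problem for an app that needs to take a measurement at a specific time.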

EDIT 2: Oct 15th, 2018: due to the fact that things do not seem to be changing any time soon in terms of access to raw PPG data or even just RR intervals, which means we (as developers) are stuck with the SDNN value written in Health, I have done quite some work to better understand its effectiveness in the context of short term measurements taken first thing in the morning and capturing chronic stress the way we would do with rMSSD. Results are quite positive, and a broader discussion on this aspect can be found in this blog post: https://www.hrv4training.com/blog/heart-rate-variability-hrv-features-can-we-use-sdnn-instead-of-rmssd-a-data-driven-perspective-on-short-term-variability-analysis

Marco holds a PhD cum laude in applied machine learning, an M.Sc. cum laude in computer science engineering and has published more than 50 papers and patents at the intersection between physiology, health, technology and human performance.