Artifact Removal for PPG-Based Heart Rate Variability (HRV) Analysis

A step-by-step guide

Marco Altini
The Startup
9 min read · Sep 26, 2020


HRV4Training is the first validated platform able to provide reliable HRV measurements using the phone camera.

Artifact removal is probably the most important and (unfortunately) most overlooked step of the signal processing pipeline required to compute HRV features.

While all beat-to-beat data should go through artifact removal (even data collected with ECG or chest straps, as ectopic beats can still be present under these circumstances, see an example here), the issue becomes particularly important for PPG measurements, which are more prone to noise (meaning it is easier to corrupt the signal just by moving).

The impact on HRV analysis in particular is substantial: even a single artifact over a 5-minute window can have very large consequences for the derived HRV features (we'll see an example in a minute). Camera-based apps, watches, rings, and wristbands are all affected by these issues as soon as you move.

Thus, if our goal is to correctly compute HRV features in healthy individuals (i.e. individuals without cardiac issues), we need to do our best to clean the beat-to-beat intervals of any artifacts, regardless of their origin (actual ectopic beats, or issues caused by the user moving).

How bad is the problem?

Really bad.

This is why 99% of smartwatches on the market do not even bother with HRV analysis and target only heart rate estimation, which still doesn't really work consistently when exercising (if you are serious about your exercise heart rate, please get a chest strap).

The few devices that do go through the trouble of doing HRV analysis, normally do so while you sleep (e.g. an Oura ring or Fitbit), or using a specific sensing modality (e.g. the Scosche Rhythm24 HRV mode or the Apple Watch using the Breathe app). This choice makes a lot of sense as if you are sleeping or doing a breathing exercise, you are not moving that much.

Additionally, given the limited utility of HRV analysis during exercise, as long as you are able to collect high-quality data at rest or during the night, you are good to go (you can learn more about heart rate variability and when to measure in our guide here).

For phone or camera-based measurements, similar to the ones we use in HRV4Training or HRV4Biofeedback, issues are typically caused by finger movement, as the apps are used while at rest and therefore there is no body movement.

Let’s look at one example:

Above we have one minute of PPG data, including detected peaks. In general, the data shown here is of good quality; however, there are some clear artifacts (e.g. in the second row, causing a spike and an abnormal gap between beats).

During this test, ECG data was collected simultaneously and used to extract reference RR intervals and compute rMSSD, which was 163ms. If we use the PPG data and the detected peaks shown here to compute rMSSD, we get 229ms, which is a large difference for this metric (repeated measures typically differ by only 5–15ms).

The few artifacts present have a large effect on our output metric, and therefore we need to address the issue or the data collected will be rather useless. Note that this problem normally does not affect resting heart rate (60 beats over a minute are still 60 beats even if a couple of them are out of place), hence it matters mainly in the context of HRV analysis.
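To make this sensitivity concrete, here is a minimal sketch of how rMSSD (the root mean square of successive differences between beat-to-beat intervals) can be computed, and how a single missed beat inflates it. The interval series below is made up for illustration; it is not the data from the recording above.

```python
import numpy as np

def rmssd(intervals_ms):
    """Root mean square of successive differences between beat-to-beat intervals (ms)."""
    diffs = np.diff(intervals_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

# Illustrative clean series of beat-to-beat intervals (in ms).
clean = [1000, 980, 1020, 990, 1010, 1000, 995, 1005]

# The same series with a single missed beat: two ~1000 ms intervals
# merge into one ~2000 ms gap.
artifacted = [1000, 980, 1020, 2000, 1000, 995, 1005]

print(rmssd(clean))       # ~22 ms, reflecting the true variability
print(rmssd(artifacted))  # ~572 ms, dominated by the single artifact
```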

Basic artifact removal in three steps

There are many different methods that can be used to remove artifacts. Something that I found to be effective when looking at data over a broad range of HRV values and PPG-related issues is the following three-step procedure:

  1. Remove extreme values (range filter): typically anything that does not result in an instantaneous heart rate between 20 and 200+ bpm, depending on the application (e.g. resting physiology or exercise).
  2. Remove beat-to-beat abnormalities. This means removing intervals that differ from the previous beat by more than X%, a change that is not physiologically plausible. X should change based on the person's actual baseline HRV, as the common thresholds (20–25%) can overcorrect. Overcorrection tends to be a minor problem for non-athletes but should be accounted for in a population with particularly high HRV values.
  3. Remove remaining outliers. After the previous steps, we could still have some outliers, especially if we are less strict with the abnormalities filter (say we use 50–70% for athletes; then there will be more artifacts that we still need to remove). For this filter, I found (empirically) the following thresholds to work well: 0.10–0.25 * the 25th and 75th percentiles of the clean data (see the sketch after this list).
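Below is a minimal sketch of these three filters in Python. The function name, the default thresholds, and the exact quartile-based outlier rule are illustrative choices based on the description above, not the code or the tuned parameters used in our apps.

```python
import numpy as np

def remove_artifacts(intervals_ms, hr_range=(20, 220), max_rel_diff=0.25, quartile_k=0.25):
    """Three-step artifact removal for beat-to-beat intervals (in ms).

    All thresholds are illustrative and should be tuned to the application
    (resting physiology vs. exercise) and to the person's baseline HRV.
    """
    x = np.asarray(intervals_ms, dtype=float)

    # Step 1: range filter. Keep intervals whose instantaneous heart rate
    # falls within a physiologically plausible range (in bpm).
    hr = 60000.0 / x
    x = x[(hr >= hr_range[0]) & (hr <= hr_range[1])]
    if x.size == 0:
        return x

    # Step 2: beat-to-beat abnormality filter. Drop intervals that differ from
    # the previously accepted interval by more than max_rel_diff
    # (e.g. 0.20-0.25, possibly 0.50-0.70 for athletes with very high HRV).
    accepted = [x[0]]
    for value in x[1:]:
        if abs(value - accepted[-1]) / accepted[-1] <= max_rel_diff:
            accepted.append(value)
    x = np.asarray(accepted)

    # Step 3: outlier filter. One way to read the "0.10-0.25 * 25th and 75th
    # percentiles" rule: widen the band between the quartiles by that factor.
    q25, q75 = np.percentile(x, [25, 75])
    return x[(x >= q25 - quartile_k * q25) & (x <= q75 + quartile_k * q75)]
```

Called on the raw PP intervals, this returns a cleaned series on which downstream features such as rMSSD can be computed.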

In our apps, we use these methods plus a few extra steps that can be feature-dependent or person-dependent, as well as optimized thresholds based on the person's historical data and group-level parameters. However, in almost all cases, what is reported above is already sufficient, as we will see in the validation below.

Let's first look at our example. The valid peaks after artifact removal are shown here in yellow:

Let's now look at the PP (and RR) intervals. PP intervals are the times between consecutive beats detected in our PPG data (when detected from ECG, they are called RR intervals). When we visualize PP intervals over time, we can normally spot any artifacts (spikes) easily, as well as any other issues, since the time series should look very similar between sensing modalities (phone camera, chest strap, or ECG).

In the figure below, we have in the top plot our camera-based PP intervals (in dark blue before artifact correction, in light blue after artifact correction), as well as RR intervals reported by a Polar chest strap (second row) and computed from reference ECG data (third row). We can also see the participant's breathing pattern (about 10 oscillations per minute).

As previously discussed, rMSSD for the artifacted data in this example was 229ms. On the other hand, after artifact removal, rMSSD for the camera-based algorithm is 166ms (hence very close to the 163ms of our ECG reference). Again, differences between consecutive measurements, even using ECG, are in the 5–15ms range, hence our difference here is negligible: we were able to effectively remove all artifacts and estimate HRV correctly (you can find more information on repeated measures for PPG, chest strap, and ECG data here).

Group level validation

It is of course key to develop a method that works over a broad range of HRV values (and not only for the person shown in the figures above). Typical values we see for rMSSD in healthy individuals are between 10 and 250ms.

Let's look at the results of the method described above for about 100 recordings. In the figure, I also report the correlation and root mean square error (RMSE) between rMSSD computed from ECG and PPG. We want the correlation to be very high (close to 1) and the RMSE to be very low (realistically, below 10ms).
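For reference, both agreement metrics are straightforward to compute. Here is a small sketch, assuming two arrays of rMSSD values (one entry per recording), one derived from ECG and one from PPG:

```python
import numpy as np

def agreement_metrics(rmssd_ecg, rmssd_ppg):
    """Pearson correlation and RMSE between reference (ECG) and PPG-derived rMSSD."""
    ecg = np.asarray(rmssd_ecg, dtype=float)
    ppg = np.asarray(rmssd_ppg, dtype=float)
    r = np.corrcoef(ecg, ppg)[0, 1]            # want this close to 1
    rmse = np.sqrt(np.mean((ecg - ppg) ** 2))  # in ms, want this below ~10 ms
    return r, rmse
```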

First, let’s look at the results without any artifact removal:

We can see quite clearly that for many recordings, artifacts are a big problem. We still have quite a few recordings on the identity line, which is probably very high-quality data, but rMSSD without artifact removal is highly overestimated in many cases, leading to poor correlation and large error.

The three-step method I described above is similar to the one we previously published in Plews, D. J., Scott, B., Altini, M., Wood, M., Kilding, A. E., & Laursen, P. B. (2017). Comparison of heart-rate-variability recording with smartphone photoplethysmography, Polar H7 chest strap, and electrocardiography. International Journal of Sports Physiology and Performance. Let's look at the results when we apply it for artifact removal:

Much better. We now have an almost perfect correlation between ECG-derived rMSSD and camera-based rMSSD after applying these simple filters. We can see that we no longer have all those highly overestimated rMSSD values.

Any optical measurement (phone-based, wristband, ring, etc.) can benefit from this approach in order to provide high-quality HRV data.

Signal quality estimation

Based on the artifact removal method just covered, we can also determine signal quality. A simple method I've developed to determine noise level is to rely on the ratio between the number of removed beats (according to the various filters) and the number of beats originally detected.

Intuitively, if we remove zero or only a few artifacts, we will have high-quality data, while if we remove many artifacts, we will tend to have poor-quality data. While it is possible that all artifacts are removed correctly even when there are many, in general this is rare. The reason is that detecting many artifacts in PPG data is typically associated with movement, and therefore with large disruptions in signal quality (rather than actual ectopic beats), which cannot easily be recovered.
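A minimal sketch of this idea follows; the quality labels and the cutoff values are illustrative examples, not the exact thresholds used in our apps.

```python
def signal_quality(n_detected, n_removed, optimal_max=0.02, acceptable_max=0.10):
    """Classify signal quality from the fraction of detected beats removed as artifacts.

    The cutoffs here are illustrative; in practice they should be tuned on reference data.
    """
    if n_detected == 0:
        return "no signal"
    ratio = n_removed / n_detected
    if ratio <= optimal_max:
        return "optimal"      # no or very few artifacts were removed
    if ratio <= acceptable_max:
        return "acceptable"   # some artifacts, results likely still usable
    return "poor"             # too many artifacts: discard and measure again
```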

Below are two screenshots taken during a measurement in which I intentionally moved my finger a lot while recording. You can see how the app is able to detect the issue and report the problem back to the user. While hardly any system out there reports quality (your smartwatch will most likely keep providing random heart rate data while you exercise, even if it might be able to detect the issue), I think this is a key feature that can help users gain confidence in the tools they use.

We recommend storing only Optimal measurements, meaning that no or very few artifacts were present during the measurement and that they were removed.

In my view, it is quite pointless to pretend that a sensor will always provide high-quality data (no matter how much you pay for it), especially when it comes to optical sensors (watches, wristbands, etc.). Motion will always be an issue, and sometimes data might need to be discarded.

Implementing effective artifact removal methods, as well as being transparent about any potential issues, should make it easier to make effective use of these technologies, which can be extremely helpful in tracking individual responses to physical and psychological stress (check out a few examples here).

That's a wrap for this article. I hope you've found it useful.

Marco holds a PhD cum laude in applied machine learning, a M.Sc. cum laude in computer science engineering, and a M.Sc. cum laude in human movement sciences and high-performance coaching.

He has published more than 50 papers and patents at the intersection between physiology, health, technology and human performance.

He is the founder of HRV4Training and loves running.

Twitter: @altini_marco
