Artifact removal is probably the most important and, unfortunately, most overlooked step of the signal processing pipeline required to compute HRV features.
While all beat-to-beat data should go through artifact removal (even data collected with an ECG or chest strap, since ectopic beats can still be present under these circumstances, see an example here), the issue becomes particularly important for PPG measurements, which are more prone to noise: it is easy to corrupt the signal just by moving.
Artifacts matter especially for HRV analysis: even a single artifact over a 5-minute window can have very large consequences for the derived HRV features (we'll see an example in a minute). Camera-based apps, watches, rings, and wristbands are all affected by these issues as soon as you move.
Thus, if our goal is to correctly compute HRV features in healthy individuals (i.e. individuals without cardiac issues), we need to do our best to clean the beat-to-beat intervals of any artifacts, regardless of their origin (actual ectopic beats, or issues caused by the user moving).
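To make the point concrete before looking at real data, here is a small synthetic sketch (the numbers are made up for illustration, not taken from any recording): a single missed beat in an otherwise clean 5-minute window is enough to noticeably inflate rMSSD.

```python
import numpy as np

def rmssd(intervals_ms):
    """rMSSD: root mean square of successive differences (ms)."""
    diffs = np.diff(np.asarray(intervals_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

# A clean 5-minute window: ~300 beats around 1000 ms with mild variability
rng = np.random.default_rng(42)
clean = 1000 + rng.normal(0, 30, 300)

# Inject a single artifact: a missed beat merges two intervals into one
corrupted = clean.copy()
corrupted[150] += 1000

print(rmssd(clean), rmssd(corrupted))  # the corrupted value is much larger
```

One out-of-place interval creates two large successive differences, and because rMSSD squares those differences, the metric is disproportionately sensitive to them.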
How bad is the problem?
This is why the vast majority of smartwatches on the market do not even bother with HRV analysis and target only heart rate estimation, which still doesn't work consistently during exercise (if you are serious about your exercise heart rate, please get a chest strap).
The few devices that do go through the trouble of doing HRV analysis normally do so while you sleep (e.g. an Oura ring or Fitbit), or using a specific sensing modality (e.g. the Scosche Rhythm24 HRV mode or the Apple Watch using the Breathe app). This choice makes a lot of sense: if you are sleeping or doing a breathing exercise, you are not moving that much.
Additionally, given the limited utility of HRV analysis during exercise, as long as you are able to collect high-quality data at rest or during the night, you are good to go (you can learn more about heart rate variability and when to measure in our guide here).
For phone or camera-based measurements, similar to the ones we use in HRV4Training or HRV4Biofeedback, issues are typically caused by finger movement, since the apps are used at rest and there is therefore no body movement.
Let’s look at one example:
Above we have one minute of PPG data, including detected peaks. In general, the data shown here is of good quality; however, there are some clear artifacts (e.g. in the second row, causing a spike and an abnormal gap between beats).
During this test, ECG data was collected simultaneously and used to extract reference RR intervals and compute rMSSD, which was 163 ms. If we use the PPG data and the detected peaks shown here to compute rMSSD, we get 229 ms, a large difference for this metric: repeated measures typically differ by only 5–15 ms.
The few artifacts present have a large effect on our output metric, so we need to address the issue or the collected data will be rather useless. Note that this problem normally does not affect resting heart rate (60 beats over a minute are still 60 beats even if a couple of them are out of place), hence artifact removal is key mainly in the context of HRV analysis.
Basic artifact removal in three steps
There are many different methods that can be used to remove artifacts. Something that I have found to be effective when looking at data over a broad range of HRV values and PPG-related issues is the following:
- Remove extreme values (range filter, typically anything that does not result in an instantaneous heart rate between 20 and 200+ bpm, depending on the application, e.g. resting physiology or exercise)
- Remove beat to beat abnormalities, i.e. successive differences larger than X%, which are not physiologically plausible. X should change based on the person's actual baseline HRV, as the common thresholds (20–25%) can overcorrect. Overcorrection tends to be a minor problem for non-athletes but should be accounted for in populations with particularly high HRV values
- Remove remaining outliers. After the previous steps, we could still have some outliers, especially if we were less strict with the abnormalities filter (say we use 50–70% for athletes; then there will be more artifacts left to remove). For this filter, I found (empirically) the following thresholds to work well: 0.10–0.25 × the 25th and 75th percentiles of the clean data.
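The three steps above can be sketched as follows. This is a minimal illustration, not the apps' actual implementation: the exact thresholds, and the reading of the percentile rule as a margin of `scale × q25` below the 25th percentile and `scale × q75` above the 75th, are my assumptions for the sketch.

```python
import numpy as np

def filter_intervals(intervals_ms, hr_range=(20, 200), max_rel_diff=0.25,
                     percentile_scale=0.25):
    """Three-step artifact filter sketch (thresholds are illustrative)."""
    x = np.asarray(intervals_ms, dtype=float)

    # Step 1: range filter on instantaneous heart rate (60000 / interval in ms)
    hr = 60000.0 / x
    x = x[(hr >= hr_range[0]) & (hr <= hr_range[1])]
    if len(x) == 0:
        return x

    # Step 2: drop beats whose change vs. the last accepted beat exceeds max_rel_diff
    keep = [0]
    for i in range(1, len(x)):
        prev = x[keep[-1]]
        if abs(x[i] - prev) / prev <= max_rel_diff:
            keep.append(i)
    x = x[keep]

    # Step 3: percentile-based outlier removal on the remaining data
    q25, q75 = np.percentile(x, [25, 75])
    lo = q25 - percentile_scale * q25
    hi = q75 + percentile_scale * q75
    return x[(x >= lo) & (x <= hi)]
```

For example, a window of 1000 ms intervals with a single 2000 ms interval in the middle (a missed beat) passes the range filter (30 bpm is plausible) but is caught by the beat-to-beat filter.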
In our apps, we use these methods plus a few extra steps that can be feature-dependent or person-dependent, as well as thresholds optimized based on the person's historical data and group-level parameters. However, in almost all cases, what is reported above is already sufficient, as we will see in the validation below.
Let's first look at our example; we can see here in yellow the valid peaks after artifact removal:
Let's now look at the PP (and RR) intervals. PP intervals are the beat-to-beat intervals computed after detecting individual beats in our PPG data (or ECG data, in which case they are called RR intervals). When we visualize PP intervals over time, we can normally spot artifacts (spikes) easily, as well as any other issues, since the time series should look very similar across sensing modalities (phone camera, chest strap, or ECG).
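Computing PP intervals from detected peaks is straightforward: they are simply the time differences between consecutive peak timestamps. A minimal sketch, assuming peak times are given in seconds:

```python
import numpy as np

def peaks_to_intervals_ms(peak_times_s):
    """Convert detected peak timestamps (in seconds) into PP intervals (ms)."""
    return np.diff(np.asarray(peak_times_s, dtype=float)) * 1000.0

# e.g. four peaks detected in a PPG segment yield three PP intervals
intervals = peaks_to_intervals_ms([0.00, 0.98, 1.97, 2.99])
```

The same function applies to R-peak times from an ECG, in which case the output is the RR interval series.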
In the figure below, we have in the top plot our camera-based PP intervals (in dark blue before artifact correction, in light blue after), as well as RR intervals reported by a Polar chest strap (second row) and computed from reference ECG data (third row). We can also see the participant's breathing pattern (about 10 oscillations per minute).
As previously discussed, rMSSD for the artifacted data in this example was 229 ms. After artifact removal, on the other hand, rMSSD for the camera-based algorithm is 166 ms, hence very close to the 163 ms of our ECG reference. Again, differences between consecutive measurements, even using ECG, are in the 5–15 ms range, so our difference here is negligible: we were able to effectively remove all artifacts and estimate HRV correctly (you can find more information on repeated measures for PPG, chest strap, and ECG data here).
Group level validation
It is of course key to develop a method that works over a broad range of HRV values (and not only for the person shown in the figures above). Typical rMSSD values we see in healthy individuals are between 10 and 250 ms.
Let's look at the results of the method described above for about 100 recordings. In the figure, I also report the correlation and the root mean square error (RMSE) between rMSSD computed from ECG and from PPG. We want the correlation to be very high (close to 1) and the RMSE to be very low (realistically, below 10 ms).
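These two agreement metrics are standard and easy to compute from the paired rMSSD values; here is a small sketch (the example values are hypothetical, not from the dataset):

```python
import numpy as np

def agreement(rmssd_ref, rmssd_ppg):
    """Pearson correlation and RMSE between reference and PPG-derived rMSSD."""
    a = np.asarray(rmssd_ref, dtype=float)
    b = np.asarray(rmssd_ppg, dtype=float)
    r = np.corrcoef(a, b)[0, 1]
    rmse = float(np.sqrt(np.mean((a - b) ** 2)))
    return r, rmse

# hypothetical paired recordings (ms), one pair per participant
r, rmse = agreement([40, 80, 120, 160], [42, 78, 125, 158])
```

Correlation alone is not enough (a consistently biased estimator can still correlate well), which is why reporting RMSE alongside it matters.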
First, let’s look at the results without any artifact removal:
We can see quite clearly that, for many recordings, artifacts are a big problem. We still have quite a few recordings on the identity line (probably very high-quality data), but rMSSD without artifact removal is highly overestimated in many cases, leading to poor correlation and large error.
The three-step method I described above is similar to the one we previously published in Plews, D. J., Scott, B., Altini, M., Wood, M., Kilding, A. E., & Laursen, P. B. (2017). Comparison of heart-rate-variability recording with smartphone photoplethysmography, Polar H7 chest strap, and electrocardiography. International Journal of Sports Physiology and Performance. Let's look at the results when we apply it for artifact removal:
Much better. We now have an almost perfect correlation between ECG-derived and camera-based rMSSD after applying these simple filters, and we no longer see all those highly overestimated rMSSD values.
Any optical measurement (phone-based, wristband, ring, etc.) can benefit from this approach to provide high-quality HRV data.
Signal quality estimation
Based on the artifact removal method just covered, we can also determine signal quality. A simple method I've developed to estimate noise level is to rely on the ratio between the number of beats removed (by the various filters) and the number of beats originally detected.
Intuitively, if we remove zero or only a few artifacts, we have high-quality data, while if we remove many artifacts, we tend to have poor-quality data. While it is possible for all artifacts to be removed correctly even when there are many, in general this is rare: detecting many artifacts in PPG data is typically associated with movement, and therefore with large disruptions in signal quality (more so than actual ectopic beats), which cannot easily be recovered.
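The quality estimate described above reduces to a single ratio. A minimal sketch, where the 5% cutoff separating "good" from "poor" is an illustrative assumption, not the apps' actual threshold:

```python
def signal_quality(n_detected, n_removed, good_threshold=0.05):
    """Noise level as the fraction of detected beats removed by the filters.
    The good_threshold cutoff is illustrative, not a published value."""
    ratio = n_removed / n_detected
    label = "good" if ratio <= good_threshold else "poor"
    return ratio, label

# e.g. 4 beats removed out of 300 detected: a clean recording
ratio, label = signal_quality(n_detected=300, n_removed=4)
```

A nice property of this metric is that it comes for free: the filters have already been run, so no additional signal processing is needed to report quality to the user.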
Below are two screenshots taken during a measurement in which I intentionally moved the finger a lot while recording. You can see how the app is able to detect the issue and report the problem back to the user. While hardly any system out there reports quality (your smartwatch will most likely keep providing random heart rate data while you exercise, even if it might be able to detect the issue), I think this is a key feature that can help build confidence in the tools we use.
In my view, it is quite pointless to pretend that a sensor will always provide high-quality data (no matter how much you pay for it), especially when it comes to optical sensors (watches, wristbands, etc.). Motion will always be an issue, and sometimes data might need to be discarded.
Implementing effective artifact removal methods, as well as being transparent about any potential issues, should make it easier to make effective use of these technologies, which can be extremely helpful in tracking individual responses to physical and psychological forms of stress (check out a few examples here).
That's a wrap for this article; I hope you've found it useful.
Marco holds a PhD cum laude in applied machine learning, an M.Sc. cum laude in computer science engineering, and an M.Sc. cum laude in human movement sciences and high-performance coaching.
He has published more than 50 papers and patents at the intersection of physiology, health, technology, and human performance.