Exploring Heart Rate Variability using Python

Salomon Tetelepta
Orikami blog
Published in
11 min readAug 28, 2018

Since a few months our Friday afternoons got a new vibe. Here at Orikami, we started hosting ‘Open Space’ style knowledge sessions, where we dive into interesting topics and share our findings with each other.

One of those topics that got my interest was “Heart Rate Variability” or in short HRV. It is a popular biomarker that is used in many clinical trials and research for many years and is associated with a wide range of illnesses like diabetes, cardiovascular disease, obesity, chronic pain and stress-related diseases.

Today, I am particularly interested in how you can calculate HRV manually from a raw ECG signal. In this blogpost I will share my findings with you by going through a step-by-step derivation of HRV using python.

Note: in this post, I’ll highlight key aspects of the code. For the complete source, please check the Jupyter notebook on Github or kernel on Kaggle.

So what’s the plan?

1. Collect ECG data
Luckily we have some Bobbi sensors laying around, so I’ll stick some electrodes on my chest and put my heart to work to collect some raw ECG data.

2. Extract RR-intervals
Access to raw ECG data is very important for a lot of applications, for example if you want to diagnose Myocardial Infarction using LSTM’s. For today’s use case we’re most interested in the so called RR-intervals, so this part is about how we get from ECG data to those intervals using peak detection.

3. Calculate HRV
It appears that there are a wide range of different methods that are used to capture different aspects of the Heart Rate Variability. In this section I’ll explore Time Domain methods and throw some code at it.

1. Collect ECG data

For this experiment I want to go through every step myself and learn about the details you’ll encounter when acquiring (noisy) ECG data yourself using low-cost sensors.

Having said that, note that here are excellent ECG datasets available, like the MIT-BIH Arrhythmia Database which is used in thousands of scientific papers. This dataset contains medical grade recordings, which are very useful if you want to experiment with ECG data yourself.

The Bobbi sensor

Bobbi Sensor

The Bobbi sensor is developed by Totem Open Health as a prototyping tool for Health Care. Unfortunately the sensor is no longer available.

If you want to measure your own ECG data, you might use an alternative, like the low-cost AD8232 ECG Sensor Module Kit for Arduino. I’m not sure what quality that will give you, but.. I’ve ordered one myself, and as soon as I have some results I’ll give you an update.

My own ECG data

Using the Bobbi, I recorded 5 minutes of my own heart rate, while laying down in a park next to our office ☀️, and it worked! Let’s have a look:

5 seconds of ECG data

This seems like a clean ECG signal (and I surely hope it’s a healthy one too). The whole signal is quite clean, and I guess finding peaks won’t be much of a problem. This ECG sensor returns values as a arbitrary units, so these are not real voltages. We should take just interest in the relative values.

Let’s find out how we can find the peaks.

2. Extract RR-intervals

Before we dive into the code, first a bit of background.

The QRS complex

A normal ECG heart signal consists of a typical periodic pattern. Different points in a signal period of this pattern are labelled with the letters P, Q, R, S & T.
The most striking part in the signal is the QRS complex. This part corresponds to the depolarisation of the right and left ventricles of the human heart. The peak in the middle of the QRS complex is the R-value, and that point is what we need in order to calculate the HRV.

QRS complex

Fun fact: the choice for these PQRST labels apparently dates back to Descartes who suggested a mathematical convention to use the letters at the beginning of the alphabet to denote known quantities, and those at the end of the alphabet to denote unknown quantities.

RR-intervals

Our main goal here is to find a robust way to find the exact locations of the R-points in the ECG signal, also known as ‘R-peak detection’. Using these points we can get the subsequent distances between those points, the interbeat intervals (IBI) also known as RR-intervals. Sometimes these are called NN-intervals to explicitly denote normal beats.

To robustly find these intervals we will first explore a powerful technique for peak detection.

We’re interested in the distances between subsequent R-points of an ECG signal

Peak detection using template matching

There are a myriad of different QRS detectors. Most of those consists of two stages:

  • Stage 1 — Signal transformation
    Construct a signal that maximises the features of interest, in our case this is the QRS-complex.
  • Stage 2 — Decision rule
    Use a threshold to separate the desired features from the rest of the signal

A lot of peak detectors use a very neat trick for transforming the signal (stage 1) and I found it worth diving into that. It’s called template matching.

Template matching is widely used in pattern recognition, for example in particle analysis, cryptanalysis, computer vision and computational neuroscience. It’s an elegant and powerful technique that uses a filter (also known as template or kernel), that contains a specific feature and use that filter to find this feature in a larger signal.

Cross-correlation is a measure of the similarity between two signals (image taken from Giphy)

Basic idea is to slide the filter along a signal and compute the cross-correlation between the filter and the signal. If the filter matches a part of the signal closely, there is a strong correlation, which is another way of saying this part of the signal looks a lot like the feature in the filter. By setting a threshold (stage 2), you are able to find features you’re interested in.

Cross-correlation is often referred to as convolution, which is a slightly different operation where the filter is reversed f(t) -› f(-t) before sliding it along the signal. Convolutional Neural Networks should actually be named Cross-correlation Networks, but i guess that didn’t make it through the marketing department :-)

In images you could use this technique to find horizontal or vertical edges by designing specific filters. Convolutional Networks take this approach even further and learn filters that are relevant for certain classes automatically using back propagation.

Allright, so template matching seems great. Lets see how we can use it to find some RR-intervals!

Template matching in python

A filter to find patterns in ECG data, is nothing more than a list with numbers. For example, a list like [-1, 1] finds positive slopes in a peak, and [1, -1] will find negative slopes.

I’ve experimented with a number of different filters and lengths of filters and found that a using a part of a sine wave has very good results. Let’s see it in action:

# linear spaced vector between 0.5 pi and 1.5 pi 
t = np.linspace(0.5 * np.pi, 1.5 * np.pi, 15)
# use sine to approximate QRS feature
qrs_filter = np.sin(t)
# stage 1: compute cross correlation between ecg and qrs filter
similarity = np.correlate(ecg_signal, qrs_filter, mode="same")
# stage 2: find peaks using a threshold
peaks = ecg_signal[similarity > threshold].index
QRS filter: sine ranging from 0.5π - 1.5 π over 15 samples

In the image above you see part of the ECG signal (top) and the cross-correlation between the signal and the sinewave filter (bottom).

What’s interesting, is that there are some rather suppressed R-peaks that still have a large similarity. If we would just use thresholding on the original signal, we’d definitely miss those peaks. Template matching amplifies the peaks, so it separates the features from the rest. In stage 2 we can pick these up with a threshold.

Now, a sine wave is a very simplified model for a QRS segment. Ideally, we come up with a filter that generalises well over all sorts of variants we might encounter in the world.

This makes me wonder if we could find such a “mother of all (normal) QRS segments” using labeled ECG data automatically. As I mentioned earlier, this should be possible using Convolutional Networks. I’m eager to try that too, but I’ll save that for another blogpost.

After all, in my humble dataset this simplified QRS filter works quite well, so I’ll just use it to extract the RR-intervals.

# detect peaks 
peaks, similarity = detect_peaks(df.heartrate, threshold=0.3, qrs_filter=qrs_filter)
# group peaks so we get a single peak per beat
grouped_peaks = group_peaks(peaks)
# RR-intervals are the differences between successive peaks
rr = np.diff(grouped_peaks)
RR-intervals based on the peaks

From this plot, we can see that the average RR-intervals is around 850 ms, which is about 70 BPM. We also see a couple of outliers, so I guess the peak detection is not flawless after all. When the detection algorithm misses a peak, some intervals are very large (around twice the mean). Some peaks are found very near two each other, and in that case the RR-intervals are very short.

Error correction

If there is only a small amount of errors, we can correct them. There are different ways to correct the errors, and as a first attempt I replaced the outliers with the median value of the RR-intervals.

Distribution of RR-intervals

For this, I use the zscore, which is a metric for the distance between a value and the mean of a distribution, measured in standard deviations.

By selecting RR-intervals with an absolute zscore larger than 2, we find the outliers and we can correct these by setting it to the median value.

from scipy.stats import zscore
rr_corrected[np.abs(zscore(rr)) > 2] = np.median(rr)
RR-intervals with and without correction

Downside of this approach is that the timings are no longer precise. For the final HRV analysis, this doesn’t have to be a big issue, but because I want to plot the RR-intervals on top of the ECG data, the alignment needs to be exact. Therefore, I’ll manually correct the errors.

Manually correction of the errors to get maximum accuracy

Export RR-intervals

Now that we can confirm that all RR-intervals align nicely with the ECG data, I’ll export the intervals to a standard RR-interval file. This is just a plain text file, with an RR-interval in milliseconds (as integer) on each line.

3. Calculate HRV

I will explore time domain methods, calculate them and compare the results with Kubios HRV, a popular HRV analysis software package.

Time domain analysis

Time domain methods use the RR-intervals and measure a whole range of metrics, that have something to say about the variability. These metrics were standardised in a special report of the Task Force of ESC/NASPE in 1996.

Differences between successive RR-intervals

RMSSD
The most popular HRV metric is the Root Mean Square of Successive Differences or RMSSD.

It’s a measure for how much variation there exists in the heart rate. In a healthy heart, there is a natural variation, which is due to a balance between the sympathetic nervous system (SNS) and parasympathetic parts (PSNS) of the Autonomous Nervous System.

If your body experiences stress, then the sympathetic system will activate, to prepare for fight or flight behaviour, and your heartrate will increase.

The parasympathetic controls your body’s “rest and digest” responses and is associated with recovery. Parasympathetic activation conserves energy, constricts pupils, aids digestion, and slows your heart rate.

These two parts of the nervous system are normally in a healthy balance, causing a natural variation in heart. If this balance is disturbed for any reason, this variance will change. A lower RMSSD is associated with stress and various illnesses.

Other metrics

RMSSD is often used as the score that represents your “HRV”. It’s the most important one and it’s used in a lot of research.

Here’s a list of other metrics, that are used for time domain analysis:

  • Mean RR: mean of RR-interval
  • SDNN: standard deviation of the RR-intervals
  • Mean HR: the well-known mean heartrate, measured in Beats Per Minute
  • STD HR: standard deviation of the heartrate
  • Min HR: lowest heartrate
  • Max HR: highest heartrate
  • NN50: The number of pairs of successive RR-intervals that differ by more than 50 ms. (normal RR-intervals are often called NN-intervals)
  • PNN50: The proportion of NN50 divided by the total number of RR-intervals.

Calculate and compare with Kubios

# RMSSD: take the square root of the mean square of the differences
rmssd = np.sqrt(np.mean(np.square(np.diff(rr))))
# Mean RR
mean_rr = np.mean(rr)
# SDNN
sdnn = np.std(rr)
# Mean HR
mean_hr = 60 * 1000/np.mean(rr)
# STD HR
std_hr = np.std(hr)
# Min HR
min_hr = np.min(hr)
# Max HR
max_hr = np.max(hr)
# NNxx: sum absolute differences that are larger than 50ms
nnxx = np.sum(np.abs(np.diff(rr)) > 50)*1
# pNNx: fraction of nnxx of all rr-intervals
pnnx = 100 * nnxx / len(rr)

Lets compare our results with the report from Kubios:

Distribution of Heartrate (reported by Kubios)

Most metrics are similar, which is as hoped. However, the reported value for standard deviation of the heart rate (STD HR) reported by Kubios (version 3.1.0.1) seems too low. When I look at the distribution of heart rates as reported by Kubios, a standard deviation of around 5 seems much more plausible. Not sure what’s going on there

Let’s call it a day!

Today, I focussed on deriving HRV from a raw ECG signal. I’ve learned a great deal:

  • Diving into template matching was great fun. Previous learnings when studying convolutional networks just got new meaning, and I learned how to create my own personal peak detection algorithm, yay!
  • I just love plotting with matplotlib and seaborn. You always learn a great deal when you try to create meaningful plots.
  • I learned that there are still many things left to explore, like advanced methods to analyse the frequency domain, geometric and non-linear features. And perhaps, most importantly, what are we able to do with all these numbers? What can we say about the physiological state if we measure HRV? How can we use HRV in practice as an indicator for stress or illnesses? These are all topics that I’ll save for future exploration.

If you like this post, please follow me and my colleagues on this Orikami publication.

If you want to know about what we do or if you would be interested in what data can do for your company, feel free to send me an e-mail at salomon@orikami.nl or call +31 24 3010100.

--

--