Principal Components of Electrocardiograms

Andrew Tan
Nov 29, 2016

An electrocardiogram, or ECG, is a recording of the heart’s electrical activity that provides useful information about how well the heart is functioning. Medical devices called Holter monitors are worn by patients to provide a continuous ECG, which can be used to diagnose various cardiac diseases. Related devices, such as implantable cardioverter-defibrillators, can sense a failing heart rhythm and deliver an electrical jolt to defibrillate a dying patient.

A patient’s ECG can be examined by both doctors and ECG technicians. By noting the speed and variability of the heart rate as well as the individual waveform morphology of the heart beat, they can make a diagnosis.

Typical waveform of a heartbeat from an ECG. The vertical axis represents voltage and the horizontal axis represents time.

A typical waveform can be broken down into several waves and segments, each of which represents the depolarization or repolarization of cardiac muscle as the heart pumps blood through its chambers.

P-wave: The depolarization of the atria, the upper chambers of the heart. Blood now begins to flow from the atria into the ventricles, the lower chambers of the heart.

PR Segment: Represents the time delay to allow the ventricles to fill.

QRS Complex: The main contraction of the heart where the ventricles depolarize.

ST Segment: The time delay between depolarization and repolarization of the ventricles.

T-wave: Repolarization of the ventricles.

Abnormalities in the function of the heart can be found by closely examining the appearance of these features.

Waveform from an actual ECG recording.

In practice, however, it is hard to gather ECG recordings clean enough to view these features properly. Noise filtering is a must in any ECG setup, as a patient’s breathing, muscle movement, perspiration and nearby transmission lines all contribute noise to the signal. The figure above shows an actual ECG recording; although it is cleaner than most, some small artifacts are still present.

In the next section I’ll outline a popular machine learning algorithm called Principal Component Analysis and then use it to extract a cleaner signal.

Principal Component Analysis

Principal Component Analysis, or PCA, is a linear transformation technique. Typically used for dimensionality reduction and compression, it has applications in data analysis, finance and bioinformatics.

PCA identifies patterns in data by exploiting the correlation between features. It finds the direction of maximum variance in a high-dimensional dataset, then builds a set of mutually orthogonal directions, each capturing the most remaining variance. The subspace they span has equal or fewer dimensions than the original, and the user chooses the number of vectors k to keep. We can think of the first vector in the subspace as the most “principal” component and the following vectors as the next most “principal.”

The PCA algorithm can be summarized in the following steps:

  1. Construct the covariance matrix of your n-dimensional dataset X.
  2. Find the eigen-decomposition of the covariance matrix.
  3. Select the k eigenvectors that correspond to the k largest eigenvalues; k will be the dimensionality of the transformed dataset (k ≤ n).
  4. Construct a projection matrix W from the top k eigenvectors.
  5. Transform the n-dimensional input dataset X using the projection matrix W to obtain the new transformation.
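
As a rough sketch, the five steps above might look like this in plain NumPy. This is an illustration under my own assumptions rather than the code behind the figures: the function name `pca_fit_transform` and the toy data are made up, and the data is centered before projecting, as most PCA implementations do.

```python
import numpy as np

def pca_fit_transform(X, k):
    """Project the rows of X (samples x n features) onto the top-k principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean                            # center each feature
    cov = np.cov(Xc, rowvar=False)           # step 1: n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 2: eigh, because cov is symmetric
    order = np.argsort(eigvals)[::-1][:k]    # step 3: indices of the k largest eigenvalues
    W = eigvecs[:, order]                    # step 4: n x k projection matrix
    Z = Xc @ W                               # step 5: samples x k transformed data
    return Z, W, mean, eigvals[order]

# Toy data (hypothetical): 500 samples, 5 features, most variance along one direction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1)) @ rng.normal(size=(1, 5)) * 3.0
X = X + 0.1 * rng.normal(size=(500, 5))

Z, W, mean, top_eigvals = pca_fit_transform(X, k=2)
```

The variance of each column of `Z` equals the corresponding eigenvalue, which is why sorting by eigenvalue orders the components from most to least “principal.”
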

Various PCA transformations using k = 1, 20 and 100.

In the example above, I generated a dataset of PQRST waveforms from a 30-minute ECG record and ran the PCA algorithm with 1, 20 and 100 components. The example shows the original signal alongside several reconstructed signals, where the original was reduced to a smaller number of components and then projected back onto the original signal space. A few key points:

  1. Each reconstructed signal is the dataset mean plus a weighted sum of the first k principal components, the directions of maximum variance that span the new subspace. In the special case of k=1, only the single direction with the largest variance across the dataset is used. Visually, we can see how well it captures the waveform by comparing it with the original signal.
  2. In the k=100 case, where the original dataset has n=200, the signal is very well preserved: it is almost identical with only half the dimensions. This nicely illustrates PCA’s application in compressing information.
  3. We can use this technique to extract the main waveforms of our ECG and remove the “fuzziness” plaguing the baseline of the signal. The morphologies of the waves and segments in the signal are preserved, which still allows a technician or doctor to properly assess the patient’s condition.
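
To make the reduce-then-reproject idea concrete, here is a minimal sketch on synthetic data. Everything in it is an assumption for illustration: the Gaussian “beat” template stands in for real PQRST waveforms, `pca_reconstruct` is a made-up helper, and the components come from NumPy’s SVD rather than scikit-learn.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Reduce the rows of X to k principal components, then map back to the original space."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                   # n x k projection matrix
    return (Xc @ W) @ W.T + mean   # project down, then re-project

# Synthetic stand-in for a beat dataset (NOT real ECG): 300 beats, n = 200 samples each.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
template = np.exp(-((t - 0.5) ** 2) / (2 * 0.02 ** 2))   # a single sharp peak
clean = (1.0 + 0.1 * rng.normal(size=(300, 1))) * template
beats = clean + 0.05 * rng.normal(size=(300, 200))

# Mean squared difference between each reconstruction and the noisy input.
errors = {k: float(np.mean((beats - pca_reconstruct(beats, k)) ** 2)) for k in (1, 20, 100)}
```

The difference from the noisy input shrinks as k grows, because more of the input, noise included, is retained. In this toy setup the k=1 reconstruction is the one that sits closest to the clean waveform, which is the property the de-noising relies on.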

De-noising using PCA

I used Python to collect, preprocess and de-noise the signal: NumPy and SciPy for numerical preprocessing, scikit-learn for the PCA algorithm, and the WFDB-python and BioSPPy packages for retrieving and processing the ECG signals. Plotting was done with Matplotlib and Seaborn.

To begin, I pulled a 1-minute ECG from the MIT-BIH Arrhythmia Database along with the locations of each heartbeat.
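
A download along these lines could be sketched with the WFDB-python package as shown below. The record name `"100"` is only a placeholder, since the post does not say which record was used, and the helper name `load_ecg_segment` is made up. The 360 Hz default is the sampling rate of the MIT-BIH Arrhythmia Database, and calling the function requires network access to PhysioNet.

```python
def load_ecg_segment(record_name="100", seconds=60, fs=360):
    """Fetch the start of an MIT-BIH record and its annotations from PhysioNet.

    record_name is a placeholder; substitute whichever record you want to study.
    """
    import wfdb  # imported lazily so the function can be defined without the package

    sampto = int(seconds * fs)
    record = wfdb.rdrecord(record_name, pn_dir="mitdb", sampto=sampto, channels=[0])
    annotation = wfdb.rdann(record_name, "atr", pn_dir="mitdb", sampto=sampto)

    signal = record.p_signal[:, 0]       # first lead, in physical units
    # Sample indices of the annotations; these may include non-beat markers
    # (for example rhythm changes), which can be filtered via annotation.symbol.
    locations = annotation.sample
    return signal, locations, record.fs
```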

The dataset is created by cutting the signal at each QRS complex, so that every entry spans the interval between two consecutive beats (the RR interval), and resampling each interval to a common length, which I chose to be the overall average interval length.
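
One way to build such a dataset is sketched below. It is an assumption-laden illustration: the helper names are hypothetical, the toy signal and peak positions are invented, and resampling is done by simple linear interpolation with `np.interp` rather than whichever SciPy routine the original code used.

```python
import numpy as np

def resample_to_length(segment, length):
    """Linearly interpolate a 1-D segment onto `length` evenly spaced points."""
    old_x = np.linspace(0.0, 1.0, len(segment))
    new_x = np.linspace(0.0, 1.0, length)
    return np.interp(new_x, old_x, segment)

def build_beat_matrix(signal, r_peaks):
    """Cut the signal at consecutive R peaks and resample each interval to the mean length."""
    segments = [signal[a:b] for a, b in zip(r_peaks[:-1], r_peaks[1:])]
    lengths = [len(s) for s in segments]
    common = int(round(np.mean(lengths)))    # the common size: average interval length
    X = np.vstack([resample_to_length(s, common) for s in segments])
    return X, lengths

# Toy signal with hypothetical R-peak positions (not real ECG data).
signal = np.sin(np.linspace(0.0, 20.0, 1000))
r_peaks = np.array([0, 180, 390, 600, 790, 1000])
X, lengths = build_beat_matrix(signal, r_peaks)
```

Each row of `X` is one RR interval, and `lengths` keeps the original interval sizes so they can be restored later.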

Next I applied the PCA algorithm. After transforming the data to remove the non-principal components and then inverse-transforming it, we are left with an approximated signal. The final processing step is to resample the intervals back to their original sizes and stitch them together again.
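
Putting the pieces together, the whole transform, inverse-transform and restitch sequence might look like the sketch below. It is not the original code: the helpers are hypothetical, the PCA is done with NumPy’s SVD instead of scikit-learn, and the test signal is a noisy sine wave cut into “beats” of varying length rather than a real ECG.

```python
import numpy as np

def resample(segment, length):
    """Linearly interpolate a 1-D segment onto `length` evenly spaced points."""
    return np.interp(np.linspace(0, 1, length), np.linspace(0, 1, len(segment)), segment)

def pca_denoise(signal, r_peaks, k=1):
    """De-noise the span between the first and last R peak, keeping k principal components."""
    segments = [signal[a:b] for a, b in zip(r_peaks[:-1], r_peaks[1:])]
    lengths = [len(s) for s in segments]
    common = int(round(np.mean(lengths)))
    X = np.vstack([resample(s, common) for s in segments])

    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:k].T
    X_hat = (X - mean) @ W @ W.T + mean   # transform, then inverse-transform

    # Resample each interval back to its original length and restitch.
    return np.concatenate([resample(row, n) for row, n in zip(X_hat, lengths)])

# Synthetic test signal (not real ECG): 40 identical "beats" of varying length, plus noise.
rng = np.random.default_rng(2)
beat_lengths = rng.integers(180, 221, size=40)
r_peaks = np.concatenate([[0], np.cumsum(beat_lengths)])
clean = np.concatenate(
    [np.sin(2 * np.pi * np.linspace(0, 1, n, endpoint=False)) for n in beat_lengths]
)
noisy = clean + 0.2 * rng.normal(size=clean.size)

denoised = pca_denoise(noisy, r_peaks, k=1)
mse_before = float(np.mean((noisy - clean) ** 2))
mse_after = float(np.mean((denoised - clean) ** 2))
```

Because every interval is resampled back to its own length, the output has exactly as many samples as the input, so the timing of each beat is unchanged.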

The de-noised signal certainly looks much better than the original and retains its temporal resolution. There are some caveats to my method that I must address. First, the data I used is already relatively clean: there is not much noise in the low to mid frequency range. Second, something I haven’t yet mentioned is that this patient is in normal sinus rhythm for the entire record, which means that each heartbeat follows a similar morphology. This often isn’t the case, especially for people who actually need an ECG test.

For the PCA algorithm, this means that the entries in the dataset were strongly correlated, which allowed a single principal component to provide a very good representation; that is why I chose this record. The general method I outlined can still be applied to noisier signals, or to signals where other beat morphologies are present, but several additional preprocessing and filtering steps would be needed.
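
The effect of beat similarity on PCA can be illustrated with a toy comparison. In the sketch below, which uses invented Gaussian “beats” and hypothetical helper names rather than ECG data, one dataset shares a single shape and the other varies the peak position and width from beat to beat. The quantity compared is the fraction of total variance captured by the first principal component.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of the total variance captured by each principal component of X."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    return s ** 2 / np.sum(s ** 2)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 200)

def make_beats(centers, widths, noise=0.01):
    """Gaussian 'beats' with the given peak centers and widths, random amplitude, small noise."""
    c = np.asarray(centers)[:, None]
    w = np.asarray(widths)[:, None]
    amp = 1.0 + 0.2 * rng.normal(size=c.shape)
    shapes = np.exp(-((t - c) ** 2) / (2 * w ** 2))
    return amp * shapes + noise * rng.normal(size=shapes.shape)

# Every beat shares one morphology; only the amplitude changes.
uniform = make_beats(np.full(300, 0.5), np.full(300, 0.02))
# Peak position and width change from beat to beat.
varied = make_beats(rng.uniform(0.3, 0.7, 300), rng.uniform(0.02, 0.08, 300))

ratio_uniform = float(explained_variance_ratio(uniform)[0])
ratio_varied = float(explained_variance_ratio(varied)[0])
```

In this toy setup the first component accounts for most of the variance when the beats share a shape and for noticeably less when they do not, which is one reason a record with mixed morphologies would need more components or extra preprocessing.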

Andrew Tan

Biomedical Engineering, Machine Learning, Finance, M.A.Sc Engineering Physics.