Blood Pressure Inside-Out

Estimation of blood pressure from continuous waveform data


During my time at the Insight Health Data Science program in Boston, MA, I worked on a consulting project for a startup company that has developed an external continuous blood pressure sensor. They asked me not to disclose their name at this stage, so I’ll just refer to them as my client company. They have a huge amount of arterial line data (~70 participants, each of whom has hours of pressure data collected 125 times per second, so hundreds of millions of rows) and an interesting problem: the raw data that come in from their arterial line (ATL) transducers are unscaled, meaning that their relationship to the true blood pressure varies from person to person. If we could predict this relationship from the shape of the waveform, then a similar model could be used to automatically obtain a baseline for the external sensor.

What’s the utility here?

As with any real business problem, it’s worth asking why we’re trying to do what we’re trying to do. In this case, it’s pretty clear — not just for the company, but also for the state of the art in medical practice.

Currently, the state of the art for medical continuous blood pressure monitoring is an arterial line, which is an invasive procedure. This means that only people like cardiac surgery patients receive continuous monitoring (and the last thing they need is another hole in their circulatory system).

An arterial line is a catheter that’s actually inserted into an artery, which is then connected to a transducer to measure blood pressure in real time. This is an invasive process, opening up the patient to infection, and is prone to problems like disconnection, kinks in the tubing, etc., requiring skilled technicians to utilize effectively. The other ways that we’re all familiar with having our blood pressure taken involve cutting off the flow of blood and then slowly allowing it to return — something that obviously won’t work continuously. Thus, if we could monitor blood pressure accurately, continuously, and non-invasively, we would have a lot better information on how to predict and prevent cardiovascular disease (the #1 killer in the US, linked to 1 in 4 deaths according to the CDC).

Monitoring blood pressure with something like a wristwatch may not be far off, thanks to companies like my client.

Focusing in from the global issue to this particular problem, the company expects (reasonably) that the signals produced by their external device will follow a similar pattern to the raw values produced by an arterial line. In an arterial line, these raw values are then transformed into a blood pressure scale (millimeters of mercury, or mm Hg) through a process that involves a human technician checking the values against a blood pressure cuff and adjusting the baseline until it is correct. This same baseline is then used for a number of hours before the technician checks again (for drift). If my client can replace this (expensive, fallible) human intervention with a machine learning model to determine baseline BP, not only will their own product expand its use case far beyond the surgical theater, it also stands to disrupt the gold standard in medical care for continuous BP monitoring.

Is this actually possible?

So wait, we want to predict someone’s blood pressure — a vital statistic where the actual value is highly relevant to medical condition and general health — from just the shape of the heartbeat at an arbitrary scale? Well, it turns out there’s a lot of evidence this is possible. For example, Kachuee et al. (2015) have used the shape of electrocardiogram (ECG) waves to estimate blood pressure with enough accuracy to pass the British Hypertension Society protocol for blood pressure monitors.

Whether we’re talking about external sensor or arterial line data, the shape of a heartbeat changes depending on where it’s measured relative to the heart. This means that the shape actually has some relationship to the blood pressure, and that machine learning techniques can be applied to extract one from the other.

Arterial line data

My client company provided a set of raw Arterial Line (ATL) data from 72 subjects, about 80% of which had been labeled with the true BP values. The company had also made a first pass at identifying the points along the “true” line that represented the systolic and diastolic blood pressure at each heartbeat. The total dataset represented over 600 hours of data recorded at 125Hz: over 420 million rows. Given that normal human heartbeats happen on the order of once every 0.3–2 seconds (that is, in the range of 30 to 180 beats per minute), this represents over 3 million individual heartbeats.

Data cleaning

Given that these data come from recordings of biological processes in humans, it’s unsurprising that they contain a lot of noise.

Segmentation of the heartbeat allows us to only keep “good” beats for feature input into the model. Here we can see the raw values in blue, from which we need to predict the true values in orange. Each vertical line represents a heartbeat segmentation; the green ones precede clean beats, and the red precede noisy beats (in this case, hitting the arbitrary upper threshold of the device).

The raw values that we need to use to predict the targets (pictured here in blue) have an arbitrary scale from 0–255. However, about 5% of the time (depending on the file), the values hit the limit of this scale and are clipped, meaning that we don’t know what the training value ought to be. (As an aside, this provides an opportunity for more accurate modeling in the future, by fitting a polynomial curve to the data we do have and estimating the true heights and depths of these raw values. However, that’s outside the scope of this current project.) Similarly, there are periods when the line got kinked or disconnected, and stretches where the apparent heartbeat frequency was too fast or too slow to be a real human heartbeat (likely due to movement of the recording device or some other artifact).
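As a rough illustration of this kind of filtering (not the code used in the project), the sketch below rejects a segmented beat if it touches either end of the 0–255 scale or if its duration falls outside the 0.3–2 second range mentioned earlier. The function name `is_clean_beat`, the choice to treat both 0 and 255 as clipping, and the example beats are all assumptions made for this example.

```python
import numpy as np

FS_HZ = 125                          # sampling rate stated in the article
RAW_MIN, RAW_MAX = 0, 255            # raw scale stated in the article
MIN_BEAT_S, MAX_BEAT_S = 0.3, 2.0    # plausible beat durations from the article


def is_clean_beat(raw_beat):
    """Return True if a segmented beat is neither clipped nor implausibly long/short."""
    raw_beat = np.asarray(raw_beat)
    duration_s = len(raw_beat) / FS_HZ
    plausible = MIN_BEAT_S <= duration_s <= MAX_BEAT_S
    # Assumption: touching either end of the scale counts as clipping.
    clipped = bool(np.any(raw_beat <= RAW_MIN) or np.any(raw_beat >= RAW_MAX))
    return bool(plausible and not clipped)


# Hypothetical beats: one normal, one saturated at 255, one too short to be real.
good = 100 + 80 * np.sin(np.linspace(0, np.pi, 100))          # 100 samples = 0.8 s
saturated = np.minimum(200 + 80 * np.sin(np.linspace(0, np.pi, 100)), 255)
too_short = 100 + 80 * np.sin(np.linspace(0, np.pi, 20))      # 20 samples = 0.16 s

flags = [is_clean_beat(b) for b in (good, saturated, too_short)]
print(flags)
```

Only the first of the three example beats would be kept as model input.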

Feature extraction

One major problem when attempting to predict blood pressure from heartbeat waveform data is figuring out what features will be relevant to the model. This has been approached in a number of ways. For example, the area under the curve, plus the width and height of a single beat can capture a great deal of information. Domain knowledge might similarly lead us to identify parts of the wave associated with the movement of specific parts of the heart (e.g., time distance from the systolic peak to the dicrotic notch, pressure difference between the systolic peak and the minimum point of the wave, area between the point of maximum slope and the systolic peak, and so forth).
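To make the simplest of these hand-crafted features concrete, here is a minimal sketch that computes the height, width, and area of one beat. It is only an illustration: the function name `summary_features`, the decision to measure area above the beat's own minimum, and the triangular example beat are assumptions, not part of the original analysis.

```python
import numpy as np

FS_HZ = 125  # sampling rate stated in the article


def summary_features(beat, fs_hz=FS_HZ):
    """Height, width, and area of one beat (a hypothetical feature set)."""
    beat = np.asarray(beat, dtype=float)
    dt = 1.0 / fs_hz
    height = beat.max() - beat.min()
    width_s = (len(beat) - 1) * dt
    # Trapezoidal area between the beat and its own minimum (an assumption).
    above_min = beat - beat.min()
    area = np.sum((above_min[1:] + above_min[:-1]) / 2.0) * dt
    return {"height": height, "width_s": width_s, "area": area}


# Hypothetical triangular beat: rises from 50 to 150 and falls back in about a second.
beat = np.concatenate([np.linspace(50, 150, 63), np.linspace(150, 50, 63)[1:]])
features = summary_features(beat)
print(features)
```

Three numbers per beat are cheap to compute, but they discard most of the waveform's shape, which is the motivation for the image-like approach described next.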

However, in this particular instance I wanted to use all the available information about the shape of the wave. As such, I decided to approach this like an image recognition problem. The image of a heartbeat, cast into an identical grid for each beat, should capture all the information about the shape, and (assuming there are systematic differences that relate shape to true pressure) allow us to best predict the diastolic and systolic blood pressure for an unknown wave by associating it with the closest matching waveform. Especially with more than 2 million such waves, the model should be able to learn whatever associations are present in the data.

In order to predict the systolic and diastolic blood pressure for each heartbeat (shown here in orange) from the raw values (in blue), we can segment the normalized, interpolated values such that each value becomes a feature in the model.

Since the true pressure values are unknown in this problem, I was able to do a simple normalization of the Y axis, subtracting the minimum and then dividing by the resulting maximum (that is, by the range), to cast all the raw values into the same 0–1 scale. Similarly, since we need identical numbers of features for each heartbeat, I took the maximum length of a heartbeat (around 200 ticks) and cast any shorter beats into that feature space by interpolating the X values (as though we had resampled the wave at a higher frequency). This put each heartbeat into an equivalent X and Y space for the model.
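A minimal sketch of this preprocessing step, assuming per-beat min-max scaling and linear interpolation with NumPy, might look like the following. The function name `beat_to_features`, the exact grid size of 200, and the use of `np.interp` are assumptions; the original code may have scaled or resampled differently.

```python
import numpy as np

N_FEATURES = 200  # approximate longest beat length, per the article


def beat_to_features(raw_beat, n_features=N_FEATURES):
    """Min-max scale a beat to 0-1 and resample it onto a fixed-length grid."""
    raw_beat = np.asarray(raw_beat, dtype=float)
    span = raw_beat.max() - raw_beat.min()
    if span == 0:
        raise ValueError("a flat beat cannot be normalized")
    scaled = (raw_beat - raw_beat.min()) / span
    # Place old and new samples on a shared 0-1 time axis, then interpolate linearly.
    old_x = np.linspace(0.0, 1.0, len(scaled))
    new_x = np.linspace(0.0, 1.0, n_features)
    return np.interp(new_x, old_x, scaled)


# Hypothetical beat with only 90 samples, stretched to 200 features.
short_beat = 100 + 80 * np.sin(np.linspace(0, np.pi, 90))
features = beat_to_features(short_beat)
print(features.shape)
```

Note that this scaling deliberately throws away the absolute amplitude and duration of each beat, so the model sees only its shape.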

Model selection

Since we’re attempting to predict continuous values from continuous values, a regression approach is the most appropriate. The simplest model would be a linear regression (and more about that later), but these data are not at all independent: they were produced by the same heart beating over and over. Thus, they violate a fundamental assumption of linear regression. Support vector machines make the same assumption, which makes support vector regression (SVR) inappropriate as well. Two models that don’t make the assumption of independence are K-Nearest Neighbors (KNN) and Random Forest (RF) regressions. KNN simply assumes that the predicted value can be approximated by some local function (that is, that the predicted value Y_new of an X_new heartbeat is related to the Y_known values of the nearest X_known heartbeats). RF uses binary trees to minimize some error value of the function (often the Mean Squared Error, but in our case the Mean Absolute Error can be used, since we are actually trying to meet an MAE guideline published by the FDA for blood pressure monitoring devices).
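To illustrate the KNN idea, here is a small brute-force sketch in NumPy that predicts both targets (systolic and diastolic) as the mean of the k nearest known beats and scores the result with the mean absolute error. It is a toy stand-in rather than the model used in the project: the function names, the Euclidean distance, the unweighted mean, and the six-beat dataset are all assumptions, and a brute-force distance matrix would not scale to millions of beats.

```python
import numpy as np


def knn_predict(X_known, Y_known, X_new, k=3):
    """Predict targets for X_new as the mean target of the k nearest known beats."""
    X_known, Y_known, X_new = (
        np.asarray(a, dtype=float) for a in (X_known, Y_known, X_new)
    )
    # Pairwise Euclidean distances, shape (n_new, n_known).
    dists = np.linalg.norm(X_new[:, None, :] - X_known[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Average the (systolic, diastolic) targets of the k neighbors.
    return Y_known[nearest].mean(axis=1)


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Hypothetical toy data: 2 shape features per beat, targets are (systolic, diastolic).
X_known = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                    [1.0, 1.0], [1.1, 1.0], [1.0, 1.1]])
Y_known = np.array([[120, 80], [122, 82], [118, 78],
                    [150, 95], [152, 97], [148, 93]], dtype=float)
X_new = np.array([[0.05, 0.05], [1.05, 1.05]])

Y_pred = knn_predict(X_known, Y_known, X_new, k=3)
mae = mean_absolute_error([[120, 80], [150, 95]], Y_pred)
print(Y_pred, mae)
```

In this sketch, each new beat simply inherits the average pressure of the known beats whose shapes it most resembles.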

Model validation

Like many supervised learning processes, these models can be evaluated using cross validation, and then tested on a set of data that the model has never encountered during training. I used tenfold cross validation (since we theoretically have plenty of training data), and set aside two participants (out of 72) to be used as a test set.
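The split described above can be sketched as follows: whole subjects are held out as the test set, and the remaining beats are shuffled into ten folds. The helper names and the toy subject labels are hypothetical, and the fold construction (shuffling individual beats rather than grouping them by subject) is an assumption about how the cross validation was set up.

```python
import numpy as np


def split_by_subject(subject_ids, test_subjects):
    """Boolean masks that keep every beat of a test subject out of training."""
    subject_ids = np.asarray(subject_ids)
    test_mask = np.isin(subject_ids, list(test_subjects))
    return ~test_mask, test_mask


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle row indices and cut them into n_folds validation folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), n_folds)


# Hypothetical labels: 5 subjects with 20 beats each; subjects 3 and 4 are held out.
subject_ids = np.repeat(np.arange(5), 20)
train_mask, test_mask = split_by_subject(subject_ids, test_subjects={3, 4})
folds = kfold_indices(int(train_mask.sum()), n_folds=10)
print(int(train_mask.sum()), int(test_mask.sum()), len(folds))
```

With folds built this way, every validation fold contains beats from subjects the model has already seen in training, while the held-out subjects are entirely new to it. The two scores therefore measure different things.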

Random forest regression does a reasonably good job predicting during cross validation, but an extremely poor job on the test set. Initial suspicion: overfitting.

As it turns out, the KNN and RF models performed quite similarly (at least, with a large enough number of neighbors for the former and depth of tree for the latter). With, for example, 30 neighbors for KNN and a tree depth of 30 for RF, however, the RF runs significantly faster (a factor when dealing with millions of points of training data). Thus, my final model was a multi-target (for the systolic and diastolic BP) random forest with a max depth of 30. Here are the results of one run of tenfold cross validation, and then on the test set of two unknown subjects’ data.

Note that model performance differs greatly depending on what it’s being tested on (and in particular, performs poorly on the two as-yet-unseen subjects). So what is the model learning here? Are we just overfitting?

An unusual dataset

When taking any given subject’s data on its own, we can almost perfectly predict any new data from that participant with a linear regression using just a single slope and intercept.

Remember how I mentioned that we’d get back to linear regressions? Well it turns out that if you take any particular subject in this dataset and only use data from that subject, the data are perfectly predictable with just a single slope and intercept from a linear regression. But — even more interesting — most of the people we have available share a slope+intercept relationship with at least one other person. In fact, about a third of the people (26 out of 72) have exactly the same simple, linear relationship between the raw data and the true BP. Using only data from these people, the random forest also performs nearly perfectly (as one might expect). But once you add in other subjects with different (again, perfectly linear) relationships between the raw and true BP, the model performs increasingly poorly. The two subjects I happened to choose for the test set have unique slope+intercept pairs, making them particularly difficult for this model to predict.
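The effect is easy to reproduce on synthetic data. In the sketch below, two made-up subjects each have an exact linear relationship between raw and true values, but with different slopes and intercepts. A line fitted per subject is essentially perfect, while a single line fitted to the pooled data is not. The subject names, slopes, intercepts, and value ranges are invented for illustration and are not taken from the client's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical subjects: each has its own exact slope and intercept (raw -> mm Hg).
true_params = {"A": (0.5, 40.0), "B": (0.8, 20.0)}
raw = {name: rng.uniform(20, 230, size=500) for name in true_params}
bp = {name: slope * raw[name] + icpt for name, (slope, icpt) in true_params.items()}

# One line per subject: recovers each relationship almost exactly.
per_subject_mae = {}
for name in true_params:
    slope, icpt = np.polyfit(raw[name], bp[name], deg=1)
    per_subject_mae[name] = float(np.mean(np.abs(slope * raw[name] + icpt - bp[name])))

# One pooled line for both subjects: no single line can match both.
all_raw = np.concatenate([raw["A"], raw["B"]])
all_bp = np.concatenate([bp["A"], bp["B"]])
slope, icpt = np.polyfit(all_raw, all_bp, deg=1)
pooled_mae = float(np.mean(np.abs(slope * all_raw + icpt - all_bp)))

print(per_subject_mae, pooled_mae)
```

Any model trained on a mixture of such subjects faces the same problem: identical raw values map to different true pressures depending on whose calibration was in effect.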

Implications and next steps

So what does this mean? Essentially, the model is learning one set of relationships, and then being tested on a different set. This may have huge implications for the state of the art in continuous blood pressure monitoring. If there is a human component to setting the baseline for this measurement, and that is actually determining the linear relationship between the raw data and the “true” blood pressure, then the current state of the art may not be capturing the truth well at all (or at least: not consistently well). This supports the client company’s idea that the market and use case for this diagnostic monitoring tool may be ripe for disruption. If they could replace this human intervention in the baseline measurement with a consistent and accurate machine learning approach, their external device would certainly become the gold standard for continuous blood pressure monitoring.


The next steps toward this goal are clear. First, since their new device will certainly be tested against the current gold standard during clinical trials, they should be ready to take all the information they can get from a human onsite setting up the device. That is, if a technician uses a blood pressure cuff to set the baseline for the current arterial line, the external sensor should be prepared to have its baseline set this way also. With such a “grounded” model, only two points of “truth” would be needed to set the linear relationship between the raw and scaled data — and two points is exactly how many a cuff reports (diastolic and systolic pressure).
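As a worked example of why two points suffice, the sketch below solves for the slope and intercept of a line through (raw diastolic, cuff diastolic) and (raw systolic, cuff systolic). The function name and the numbers are hypothetical, and the sketch assumes the raw trough and peak of a beat line up with the cuff's diastolic and systolic readings.

```python
def two_point_calibration(raw_dia, raw_sys, cuff_dia, cuff_sys):
    """Slope and intercept mapping raw sensor values to mm Hg from one cuff reading."""
    if raw_sys == raw_dia:
        raise ValueError("raw systolic and diastolic values must differ")
    slope = (cuff_sys - cuff_dia) / (raw_sys - raw_dia)
    intercept = cuff_dia - slope * raw_dia
    return slope, intercept


# Hypothetical numbers: raw trough 60 and raw peak 180 while the cuff reads 120/80.
slope, intercept = two_point_calibration(raw_dia=60, raw_sys=180, cuff_dia=80, cuff_sys=120)
scaled_peak = slope * 180 + intercept
scaled_trough = slope * 60 + intercept
print(slope, intercept, scaled_peak, scaled_trough)
```

With these numbers the slope is one third and the intercept is 60, so the raw peak and trough map back to 120 and 80 mm Hg.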

Finally, if the “ground truth” of continuous BP monitoring is in fact contaminated by human intervention at baseline, the client company may have to collect more data for their attempts to automatically calculate a baseline for their new device. They will need to show that their device can match BP values taken by a cuff or some more reliable measure over a longer period (since the cuff measures by cutting off blood flow, and can be used at most once every few minutes). The good news is that if they’re correct and the raw data from the external sensor have the same relationship to the truth as the arterial line data, then the final model may be a very simple one (e.g., a linear model with very few coefficients).

Further resources

Over the course of my time at Insight, I’ve had to speak about this project many times. If you’re interested, here’s a link to my presentation slides. If you’d like to see the code I used in exploring these datasets and developing these models, it can be found on GitHub. Note that the actual data I used are proprietary, but if you’d like to find some arterial line data for a similar project, Kachuee and colleagues at UC Irvine have made a database of 12,000 patients’ ATL and ECG data publicly available. These have already been transformed from raw values to mm Hg, but they can be used to show the relationship between waveform shape and blood pressure values.