Neural networks and feature-based machine learning:
Nanowear, a New York-based connected care and companion diagnostic platform built on FDA-cleared nanotechnology, is delivering a new standard for remote patient monitoring. Our cloth-based nanosensors capture and transmit 15+ medical-grade biomarkers, enabling our machine-learning algorithms to alert care providers to worsening patient status. As we began developing these algorithms, we faced two challenges. First, trials with hundreds of thousands of patients are expensive and very time-consuming. Second, we needed a solution that could earn FDA clearance, which means it had to be easy to understand and verify. In this article, we’ll show how we met those challenges.
Perhaps all of machine learning (ML) seems like flashy new technology, but it actually represents a spectrum of approaches spanning orders of magnitude in complexity. On the simpler end are approaches like linear regression over a few engineered inputs, which may contain just a handful of learned parameters. On the other end, approaches like neural networks (NNs) learn to construct their own features and can leverage millions of parameters. Both are powerful tools suited to different applications and classes of datasets. In deciding which approach is right for your product, there are several questions to ask yourself: Is your dataset rich enough for a model to learn its own features? Can you forgo explainability of the model’s output? If you answered no to either, then a simpler (and thus explainable) model over engineered features is probably the way to go. It’s also always okay to start with something simpler first, or to mix the two.
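To make the "handful of learned parameters" end of the spectrum concrete, here is a minimal sketch of a linear model fit over two hypothetical engineered features. The feature names and numbers are illustrative placeholders, not real clinical data; the point is that the entire model is three inspectable numbers.

```python
import numpy as np

# Hypothetical engineered features for a few patients:
# [resting heart rate, respiration rate] -> risk label (illustrative only)
X = np.array([
    [62.0, 14.0],
    [75.0, 18.0],
    [90.0, 22.0],
    [105.0, 26.0],
])
y = np.array([0.1, 0.3, 0.6, 0.9])  # made-up target values

# Add an intercept column; the whole model is 3 learned parameters,
# each directly inspectable, unlike the millions inside a deep network.
A = np.hstack([X, np.ones((len(X), 1))])
weights, *_ = np.linalg.lstsq(A, y, rcond=None)

print(weights)      # one coefficient per feature, plus an intercept
print(A @ weights)  # predictions are a transparent weighted sum
```

A clinician can read those three coefficients directly; the equivalent question asked of a million-parameter network has no such simple answer.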
In healthcare, NNs are often used for discovering insights from large pre-existing datasets from hospital systems, such as EMRs or imagery. Simpler algorithms based on pre-engineered features are often used in concert with medical devices, as a way to leverage decades of existing medical research, and in turn require less training data.
Let’s examine several successful applications of NNs in healthcare. A team at Mayo Clinic trained convolutional NNs to predict a patient’s sex and age from ECG data, and to detect an abnormal heart rhythm (atrial fibrillation) in patients. A team at Verily has used deep learning to predict cardiac risk from scans of the retina.
Each of these impressive efforts required an enormous initial data set from which to glean their insights, with patient populations in the hundreds of thousands, and in one case involving patient records that spanned two decades.
- Mayo Clinic — 774,783 patients
Age and sex determination from ECG samples taken over 20 years
- Mayo Clinic — 182,922 patients
Atrial fibrillation detection
- Verily — 297,360 patients
Cardiac event risk prediction
An approach based on engineered features is most effective when the data being captured is known to be directly correlated with the output we intend to measure. In that scenario, it can often arrive at a higher-quality outcome with significantly smaller data sets, and it retains its interpretability.
Teams at CardioMEMS (now part of Abbott) and Boston Scientific were able to demonstrate safety and efficacy and achieve FDA approval for their products using these engineered-feature approaches with a much smaller population of patients.
- CardioMEMS (Abbott) — 500 patients
Implantable pressure monitor
- HeartLogic (Boston Scientific) — 975 patients
Worsening heart failure predictor
Adaptive AI algorithms and FDA clearance
Digitized decision making (such as machine learning) offers the unique ability to update diagnostic rules quickly and easily. Consider the time and money required to retrain professionals every time a new standard is introduced, then compare that to “simply” updating the logic running on a medical device. Of course, it’s not that simple. In the medical device world, another important factor in the decision to take a feature-based or neural network approach to your data is the necessity of FDA clearance. The FDA, requisitely focused on safety and efficacy, has been working on a framework for clearance of adaptive AI algorithms (those that evolve from real-world usage and don’t require manual intervention to incorporate updates), and while the agency has taken a progressive and proactive stance towards adaptive AI, its final guidelines remain a work in progress. Understandably so, as healthcare and medtech do not currently have standards to address a ‘machine’ being wrong in a diagnostic or therapeutic intervention.
The FDA’s traditional methodology applies in a straightforward fashion to feature-based algorithms for diagnostics or therapeutic intervention. The steps to prove efficacy involve individually validating the features used in the algorithm, and then validating the feature-based algorithm as a whole, as a sum of its parts. Since every piece of the system is interpretable through experiments, no part of the system’s operation is left unexplained.
The artificial intelligence technologies that have been cleared by the FDA to this point have been non-evolving or “locked” algorithms that don’t adapt or change during the course of their application. As manufacturers continue development and the locked algorithms are further modified and trained, they are subject to manual validation before being cleared for redeployment.
The FDA is currently exploring a process that allows for modifications to algorithms from real-world learning and adaptation while still ensuring its high standards of safety and efficacy. This approach will likely involve working with manufacturers to measure algorithmic performance, document plans for modifications, and ensure a manufacturer’s ability to manage and control the quality and risks of those modifications. It may also include a review of what the FDA calls a predetermined change control plan: detailed information about the anticipated evolution of the algorithm from re-training and updates, and the methodology that ensures those updates happen in a controlled fashion that manages patient risk.
The FDA is currently in collaborative dialog with medtech OEMs, stakeholders, and experts through trade groups such as the AdvaMed Center for Digital Health, and is encouraging feedback as its approach to incorporating machine learning continues to evolve alongside the needs of industry, patients, and health providers.
Currently, in most hospitals, doctors send heart failure patients home with no good tools for monitoring them, and thus get no data to support their decisions. Nanowear addresses this problem.
Nanowear’s key invention is a proprietary cloth-based nanosensor garment that can capture and digitally transmit multiple physiological signals non-invasively. The garment can acquire highly accurate data, in both measurement and time precision, over long periods while the patient is at home, enabling closed-loop machine learning applications for disease management and/or companion diagnostics. This garment is a unique data firehose, from basic skin contact, that has not previously been available in the management of care for patients with chronic heart failure.
We launched a clinical trial involving several hundred patients diagnosed with heart failure. The goal of this clinical study is to tune and validate an accurate predictive model that uses all the data from the Nanowear device, along with patient data from multiple study centers, to generate a score predicting the likelihood that a patient will be readmitted to the hospital with worsening heart failure.
Our first algorithm is an explainable ML algorithm that leverages the domain knowledge of clinicians and researchers who have quantified the various physiological metrics that are correlated to worsening heart failure.
Worsening heart failure is the result of a combination of several factors that have a variable impact across the patient population. In scientific literature, researchers have generated models for the prediction of impending hospital admissions in patients with congestive heart failure. They have used clinical measures of patient health like blood pressure, Body Mass Index (BMI), Brain Natriuretic Peptide (BNP) blood levels, heart rate and rhythm disturbances, blood glucose levels, current smoking status, Diabetes, Chronic Obstructive Pulmonary disease (COPD), blood creatinine level, and number of heart failure-related admissions in the last year.
The researchers assigned an integer score for each of the measures and developed a model as a weighted sum of these scores resulting in a predictive number that can be used to stratify the likelihood or risk of hospital readmissions. This score can then be used to assist a physician’s decision on whether the patient needs any alteration in therapy.
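A point-based model like the one described above can be sketched in a few lines. The measures, cut-offs, and weights below are illustrative placeholders in the spirit of the published models, not a validated clinical scale.

```python
# Hypothetical point-based readmission risk score. All thresholds and
# point values are made up for illustration.

def risk_points(patient):
    points = 0
    points += 2 if patient["systolic_bp"] < 100 else 0   # low blood pressure
    points += 1 if patient["bmi"] < 18.5 else 0          # low BMI
    points += 3 if patient["bnp_pg_ml"] > 400 else 0     # elevated BNP
    points += 1 if patient["smoker"] else 0
    points += 2 if patient["copd"] else 0
    points += 2 * patient["hf_admissions_last_year"]     # prior admissions
    return points

def risk_band(points):
    # Stratify the summed score into bands a physician can act on.
    if points >= 6:
        return "high"
    if points >= 3:
        return "moderate"
    return "low"

example = {
    "systolic_bp": 95, "bmi": 17.0, "bnp_pg_ml": 650,
    "smoker": False, "copd": True, "hf_admissions_last_year": 1,
}
score = risk_points(example)
print(score, risk_band(score))  # -> 10 high
```

Because each point contribution maps to a single clinical measure, a physician can see exactly why a patient landed in a given risk band.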
These are approaches in which a relatively small number of data points is used to make a prediction. Although the choice of weakly correlated predictors can introduce bias into the model, limiting the predictor’s specificity, the likelihood of overfitting the data is minimal. From an ML perspective, the model must maximize specificity and minimize false negatives to ensure that it enhances the productivity of healthcare personnel.
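The two evaluation goals above pull in different directions: specificity keeps false alarms (wasted clinician time) down, while false negatives are missed deteriorating patients. A small sketch, using toy labels, of how both are computed from a confusion matrix:

```python
# Toy illustration of the sensitivity/specificity trade-off; labels are
# invented, not clinical data. 1 = worsening patient, 0 = stable patient.

def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)  # fraction of worsening patients caught
specificity = tn / (tn + fp)  # fraction of stable patients left alone
print(sensitivity, specificity)
```

Minimizing false negatives raises sensitivity; a model that alerts on everyone achieves that trivially, which is why specificity must be tracked alongside it.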
A case for the scientific method: Feature selection and bias
The notion of machine learning models as unexplainable black boxes is not readily acceptable to clinicians and researchers. Explaining how a medical technology product works, to clinicians as well as the research community, requires a ground-truth understanding of how the device operates.
A conservative and prudent approach is to understand the existing consensus among researchers and physicians on which observations from clinical data serve as statistically significant predictors of a worsening disease state. All fundamental domain knowledge, from hypothesized correlations corroborated by empirical data acquired through a clinical trial or a bench scientific study, must be incorporated into the preliminary design of an ML model.
In our case, elevated heart rate and/or respiration rate, shallowness of breaths, trends in the variation of thoracic impedance, and the presence of abnormal heart sounds alongside the normal sounds are all metrics that have been shown to correlate significantly with worsening heart failure. Thus, the baseline model performance is established as a linear combination of the physiological metrics extracted from the signals measured by our device. This approach prioritizes the explainability of the proposed model as the driving factor for future evolution and enhancement of the model’s performance.
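A baseline of this shape can be sketched as a weighted sum whose per-term contributions are visible. The feature names and weights below are illustrative assumptions, not Nanowear’s actual algorithm.

```python
# Minimal sketch of an explainable linear baseline over device-derived
# metrics. Weights are placeholders; each metric is assumed pre-normalized
# to [0, 1].

WEIGHTS = {
    "heart_rate_elevation": 0.4,
    "respiration_rate_elevation": 0.3,
    "breath_shallowness": 0.2,
    "thoracic_impedance_drop": 0.6,
    "abnormal_heart_sounds": 0.5,
}

def baseline_score(metrics):
    # Keep every term's contribution so the output can be explained.
    contributions = {k: WEIGHTS[k] * metrics[k] for k in WEIGHTS}
    return sum(contributions.values()), contributions

score, parts = baseline_score({
    "heart_rate_elevation": 0.8,
    "respiration_rate_elevation": 0.5,
    "breath_shallowness": 0.1,
    "thoracic_impedance_drop": 0.9,
    "abnormal_heart_sounds": 0.0,
})
print(round(score, 2))
```

The per-term breakdown in `parts` is what makes an alert explainable: a clinician can see that, say, the impedance trend rather than heart rate drove the score.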
“One of the challenges of machine learning is to be able to demonstrate the safety, understandability, and predictability of a learned model. Nanowear has shown how to address this challenge in a critical health care application,” said Peter Norvig, Research Director at Google.
Machine learning techniques are most beneficial when subjected to a methodical curriculum of tests that validate the learning against clearly defined objectives. Once a baseline performance model is in place, there is scope to use other methods to learn new features. The first step Nanowear took towards applying machine learning to our problem was to build a heuristics-based algorithm that utilizes all the domain knowledge of current chronic heart failure disease management. As our data set grows, we will use NNs to improve our algorithm’s performance.
A big thanks to the following leaders in helping to edit and publish this post:
- Peter Norvig, Director of Research @ Google
- Prashanth Kumar, Chief Technology Officer @ Nanowear
- Zachary Taylor, Chief Software Officer @ Nanowear
- Josh Cogan, AI Lead @ Google Launchpad
- Maya Grossman, Strategist @ Google Launchpad
- Jennifer Harvey, Marketing Lead @ Google Launchpad
- Michael Seiler, Content Lead @ Google Launchpad