Modular Machine Learning in Healthcare

It’s hard to imagine developing software that isn’t modular. It would be extremely difficult to debug, troubleshoot, and implement. Modularity is also an important consideration when deploying machine learning (ML) models, though for different reasons. The main benefits of modular ML over end-to-end systems are that the troubleshooting process is more straightforward, the modules are reusable across different use cases, modular ML systems generally require less data, and their interpretability can be a key feature in certain applications.

Byteflies is a startup that generates medical insights by processing the raw data obtained from Sensor Dots, small wearable devices that can record clinical-grade physiologic signals. Traditionally, this type of waveform data is processed with explicit rule-based approaches, but medical wearables are ushering in an era where it is becoming normal to record thousands of data points every second on a single person, day and night. Consequently, there is much more noise in the data compared to short-term recordings under controlled circumstances, and more robust approaches are needed to process it. This is exactly where ML techniques can be incredibly powerful if used correctly. This prompts the question: which data processing architecture should we use?

In this piece, we cover how Byteflies has leveraged modularity in data pipelines and ML models, some of the benefits of using a modular approach, and the process of integrating these modular components.

Let’s start with a practical example. The continuous assessment of lung function is invaluable for chronic respiratory disease patients to ensure they receive the best care possible. If we record cardiorespiratory vital signs, such as heart rate, respiratory rate, and oxygen saturation (a measure of how effectively the lungs manage to oxygenate the blood), along with physical/behavioral activity, we can generate a very accurate measure of respiratory function without the need for expensive medical equipment or a visit to the hospital. But several factors complicate this process.

First, we record a signal that varies as a function of air in the lungs and stretch of the chest (bioimpedance for the experts). From this signal, which will be confounded by motion and speech artifacts, we calculate respiratory rate. Next, we derive heart rate from the electrocardiogram, oxygen saturation using optical techniques, and activity type classification from the inertial measurements. Each of these signals has its own set of idiosyncrasies, and we have not even discussed the integration of these parameters into a single measure of lung function.

Traditionally, these processing pipelines were long, error-prone, and difficult to maintain. Given that, it is tempting to harness the power of deep learning by using the raw signals as input data for an end-to-end model that outputs “stable,” “worsening,” or “improving” lung function. If done well, this approach might well yield accurate predictions. However, we will argue that a modular approach makes much more sense for (wearable) healthcare applications.


Easier Troubleshooting

The first benefit we saw in breaking the data pipeline into modular components was the ease of troubleshooting. Raw physiologic waveform data is incredibly feature-rich, which is both a blessing and a curse. Take, for instance, the diagnosis of cardiac arrhythmias (abnormalities in the electrical signal that makes the heart muscle contract) from electrocardiogram (ECG) data. When training an end-to-end model to classify raw ECG data, a very large dataset, high-quality labels, and careful feature engineering are necessary since arrhythmias are relatively rare events. If done well, the model can encode the relevant features of the ECG waveform and the relationships among them, which would result in a relatively deep neural net. Alternatively, we can train a simple pattern recognizer (convolutional neural nets, CNNs, are the method of choice) to focus on the most marked ECG feature (R-peak for the experts). Knowledge of the location of this feature allows us to segment the data and make educated guesses about the location of other features, as their timing is bounded by physiologic rules. Consequently, the topology of the models that detect those other features can be even simpler.
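To make the segmentation step concrete, here is a minimal sketch in Python. It assumes a hypothetical upstream detector has already returned R-peak sample indices. The function name and the search windows (in milliseconds relative to each R-peak) are illustrative placeholders, not clinically validated bounds and not necessarily the rules Byteflies uses.

```python
from typing import Dict, List, Tuple

# Hypothetical search windows relative to each R-peak, in milliseconds.
# Illustrative placeholders only, not clinically validated bounds.
SEARCH_WINDOWS_MS: Dict[str, Tuple[int, int]] = {
    "p_wave": (-300, -80),
    "t_wave": (100, 450),
}


def feature_windows(
    r_peaks: List[int], fs: int, n_samples: int
) -> List[Dict[str, Tuple[int, int]]]:
    """For each R-peak, return the sample range in which to look for each feature.

    r_peaks   -- R-peak positions as sample indices (output of an upstream detector)
    fs        -- sampling rate in Hz
    n_samples -- length of the recording, used to clip windows at the edges
    """
    windows = []
    for r in r_peaks:
        beat = {}
        for name, (start_ms, end_ms) in SEARCH_WINDOWS_MS.items():
            # Convert the millisecond offsets to samples and clip to the recording.
            start = max(0, r + start_ms * fs // 1000)
            end = min(n_samples, r + end_ms * fs // 1000)
            beat[name] = (start, end)
        windows.append(beat)
    return windows
```

A simpler downstream detector then only has to look inside each returned range, for example `feature_windows([100, 300], fs=250, n_samples=500)`, rather than scan the whole recording.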

Less (Labeled) Data

The hybrid approach introduced above (modular models chained together by a number of explicit rules) significantly reduces the amount of labeled training data needed while preserving many of the benefits of using ML models in the first place. Labeling medical and physiologic data generally requires highly trained experts, which can be costly, especially for a company like Byteflies that is building a platform that needs to accommodate a wide range of applications.

Continuing the ECG example, arrhythmias are simply a combination of changes in timing and morphology of specific features that are very well known in medicine. Therefore, we can uncouple the problem of arrhythmia detection from the precise structure of the training waveform data, and focus on the timing of these specific features. Once we have trained reliable and simple detectors for basic ECG features, which have neither massive data nor complex labeling requirements, we end up with a set of features per cardiac cycle. These feature vectors can be recombined to generate additional “artificial” training data for every possible arrhythmia known to cardiology.
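As a rough illustration of recombining per-cycle feature vectors, the sketch below takes a sequence of normal beats and rewrites one of them to arrive early with a wider QRS complex, lengthening the following interval by the same amount. The `Beat` fields, the function name, and the numbers in the usage note are assumptions made for this example. A real generator would encode the timing and morphology rules for each arrhythmia with input from clinical experts.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Beat:
    """Hypothetical per-cycle feature vector produced by simple detectors."""
    rr_ms: int            # interval since the previous R-peak
    qrs_ms: int           # duration of the QRS complex
    label: str = "normal"


def with_premature_beat(
    beats: List[Beat], index: int, early_by_ms: int, wide_qrs_ms: int
) -> List[Beat]:
    """Return a copy of `beats` in which beat `index` arrives early.

    The following interval is lengthened by the same amount, so the total
    duration of the sequence is unchanged. The input list is not modified.
    """
    if not 0 <= index < len(beats) - 1:
        raise ValueError("index must point at a beat that has a successor")
    if not 0 < early_by_ms < beats[index].rr_ms:
        raise ValueError("early_by_ms must be positive and shorter than the RR interval")
    out = list(beats)
    out[index] = replace(
        beats[index],
        rr_ms=beats[index].rr_ms - early_by_ms,
        qrs_ms=wide_qrs_ms,
        label="premature",
    )
    out[index + 1] = replace(beats[index + 1], rr_ms=beats[index + 1].rr_ms + early_by_ms)
    return out
```

For example, `with_premature_beat([Beat(800, 90)] * 4, index=1, early_by_ms=320, wide_qrs_ms=140)` yields RR intervals of 800, 480, 1120, and 800 ms, with the second beat labeled "premature". Sweeping the parameters produces many labeled sequences from a small set of real recordings.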

Another way to think about this is to imagine again an end-to-end ML system. In that case, enough expert-labeled data is needed for the model to be able to “learn” many of the physiologic dependencies effectively from scratch. Medical science already has a great deal of knowledge about specific pathologies, and a modular structure increases the ability to pass this knowledge to the models.

ECG and respiration data recorded with a Byteflies Sensor Dot displaying Cheyne-Stokes respiration, another example of a specific pathological pattern in physiologic waveform data.


Reusability

A third major advantage is that the separate modules can be chained together in new analytics pipelines. Consider the example where we trained a simple model to detect the most marked ECG feature (R-peak), from which we can measure the heart rate. Heart rate is a vital sign with many different applications; retraining end-to-end models to incorporate a continuous notion of it would become cumbersome very quickly. Remember, although heart rate on its own is an important vital sign, its true value comes from integration with other vital signs and behavioral patterns to generate digital biomarkers (measurable indicators of health or disease, measured via digital means, that respond to a therapeutic intervention).
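The reuse argument can be sketched as follows: one small heart-rate module, fed by R-peak indices from a hypothetical upstream detector, serves two different downstream consumers. All function names here are illustrative, and the threshold is supplied by the caller rather than taken from any clinical guideline.

```python
from typing import List


def heart_rate_bpm(r_peaks: List[int], fs: int) -> List[float]:
    """Instantaneous heart rate (beats per minute) between consecutive R-peaks.

    r_peaks -- R-peak positions as strictly increasing sample indices
    fs      -- sampling rate in Hz
    """
    if len(r_peaks) < 2:
        raise ValueError("need at least two R-peaks to compute a rate")
    return [60.0 * fs / (b - a) for a, b in zip(r_peaks, r_peaks[1:])]


# Two hypothetical downstream pipelines that reuse the same module.

def mean_heart_rate(r_peaks: List[int], fs: int) -> float:
    """Average heart rate over the recording."""
    rates = heart_rate_bpm(r_peaks, fs)
    return sum(rates) / len(rates)


def fast_intervals(r_peaks: List[int], fs: int, threshold_bpm: float) -> List[int]:
    """Indices of beat-to-beat intervals whose rate exceeds the given threshold."""
    return [i for i, hr in enumerate(heart_rate_bpm(r_peaks, fs)) if hr > threshold_bpm]
```

If the R-peak detector improves, both consumers benefit without retraining anything downstream, which is the property an end-to-end model does not offer.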

Secondly, consider a rare cardiac pathology that dramatically changes the morphology of one or more ECG features. The feature engineering balancing act when trying to train or improve an end-to-end model is likely to take up a lot of time and might also increase the likelihood of overfitting. The modular approach allows us to improve the existing simple feature detector model. It will still require careful consideration of the training data but evaluating the model performance will be very straightforward. This relates to the next section: interpretability.


Interpretability

To drive acceptance in the medical and regulatory community for these types of analytics pipelines, clinical validation and an understanding of how the classification result was generated are crucial. If the output of a model that consists of multiple modules chained together does not match the expertise of a medical professional, she can simply trace back down the chain to understand how the model arrived at that conclusion in the first place. Returning to our respiratory function example, a downstream integrator model may trigger a warning for a decline in function because that person was exercising. Although that in itself is valuable medical information, the modular approach allows her to verify the outputs of the individual modules (heart rate, respiratory rate, oxygen saturation, and activity) to note that behavioral context. That level of granularity will not be provided by an end-to-end model.
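One way to picture this traceability is a pipeline runner that returns every module's intermediate output alongside the final result. The sketch below is a toy: the modules are stand-ins that read precomputed values instead of processing waveforms, and the integrator rule and its threshold are placeholders invented for this example, not a clinical criterion.

```python
from typing import Any, Callable, Dict, Tuple

Module = Callable[[Dict[str, Any]], Any]


def run_traced(
    modules: Dict[str, Module],
    integrate: Callable[[Dict[str, Any]], Any],
    raw: Dict[str, Any],
) -> Tuple[Any, Dict[str, Any]]:
    """Run every module on the raw input, then integrate their outputs.

    Returns the final result together with a trace of each module's output,
    so a reviewer can see what the integrator based its decision on.
    """
    trace = {name: module(raw) for name, module in modules.items()}
    return integrate(trace), trace


# Stand-in modules: each would normally wrap a trained model or rule set.
modules: Dict[str, Module] = {
    "heart_rate": lambda raw: raw["hr"],
    "respiratory_rate": lambda raw: raw["resp"],
    "oxygen_saturation": lambda raw: raw["spo2"],
    "activity": lambda raw: raw["activity"],
}


def integrate(trace: Dict[str, Any]) -> str:
    # Placeholder rule for illustration only, not a clinical threshold.
    return "warning" if trace["respiratory_rate"] > 25 else "ok"


result, trace = run_traced(
    modules, integrate, {"hr": 140, "resp": 32, "spo2": 97, "activity": "running"}
)
```

Here `result` is "warning", but `trace["activity"]` shows "running", which gives the reviewer the behavioral context needed to interpret the warning.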

Specifically for wearable health, we have found that a more modular data/ML pipeline has several advantages over an end-to-end approach, at least for the foreseeable future. This debate is far from settled, as discussed in this excellent post. While a modular ML approach has worked well for Byteflies, our goal is to give readers the context to decide what makes the most sense for their specific application. The healthcare sector is going through dramatic changes that will only accelerate over the years to come. Modular solutions will be one of the tools to drive adoption by the medical community and create real, positive patient impact.

Byteflies is a Belgian-American, AI-first medtech company on a mission to make healthcare truly personalized, predictive, preventive and participatory. We built a platform to instantly set up wearable health solutions that consists of our versatile and multimodal wearable device, Sensor Dot, infrastructure to manage and deploy wearable devices to thousands of patients, and analytics to transform raw physiologic data into digital biomarkers and clinical endpoints.

We work with pharmaceutical companies that are embracing clinical trial innovation and the value-based healthcare model, researchers looking for cutting-edge technology to monitor patients, and tech companies wanting to quickly push new wearable technology to the market.

Byteflies participated in the first class of Google Developers Launchpad Studio, a cohort focused on the applications of ML in healthcare and biotech.