Using Machine Learning to Predict Outcomes for Sepsis Patients

Ricardo Balduino
IBM Data Science in Practice
Apr 30, 2018 · 7 min read

Authors: Ricardo Balduino, Brittany Bogle, Anna Hazard, Hosam Farag, Vida Abedi, Donna Wolk, Shravan Kethireddy, Vinay Rao Dandin, and Avijit Chatterjee

Introduction

Sepsis is a life-threatening condition that arises when the body’s response to an infection injures its own tissues and organs. It is a complex syndrome that is difficult to identify early, as its symptoms, such as fever and low blood pressure, overlap with those of other common illnesses. Without timely treatment, it can progress to septic shock, which has a hospital mortality rate greater than 40%.

Understanding which sepsis patients are at the highest risk of death could help clinicians prioritize care. Our team partnered with researchers from Geisinger Healthcare System to build a model that predicts in-hospital or 90-day post-discharge all-cause mortality among hospitalized sepsis patients, using historical electronic health record (EHR) data. This model could guide medical teams toward careful monitoring and preventive measures for patients with a high predicted probability of death.

Data Science Environment

We used IBM Watson Studio for our work on this project. This product is a collaborative environment with the tools needed to ingest, visualize, and build models with heterogeneous data sources. It lets data scientists choose among the most popular languages (Python and R) and between Jupyter notebooks and JupyterLab, while also including IBM value-added functionality and data science community features.

IBM Watson Studio operationalizes models for real-time or batch scoring and consumption by business applications. It also has the capability to integrate a feedback loop for continuous model monitoring and re-training.

Gathering and preparing data

Geisinger provided de-identified data files on over 10,000 patients diagnosed with sepsis between 2006 and 2016. These patients were either admitted to the hospital with sepsis or acquired sepsis during hospitalization. The data included demographics, inpatient and outpatient visits, surgical procedures, medical history, bacteria cultures, medications, transfers between hospital units, social history such as tobacco and alcohol use, vital measures, and lab results.

Per patient, we selected the most recent hospitalization and associated data from the various sources for that hospitalization. This included specific information on events during the hospitalization, such as the type and location of surgery and the culture location and bacteria found from cultures. We also derived summarized information on events preceding the hospitalization, such as the number of surgical procedures 30 days prior to hospitalization. No data after discharge was used. Figure 1 summarizes these time-based decisions.
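
To make this concrete, here is a minimal pandas sketch of the per-patient selection and 30-day look-back described above. The file and column names (admissions.csv, patient_id, admit_date, and so on) are illustrative, not the actual Geisinger schema.

```python
import pandas as pd

# Illustrative file and column names, not the actual Geisinger schema
admissions = pd.read_csv("admissions.csv",
                         parse_dates=["admit_date", "discharge_date"])
surgeries = pd.read_csv("surgeries.csv", parse_dates=["surgery_date"])

# Keep only the most recent hospitalization per patient
latest = admissions.sort_values("admit_date").groupby("patient_id").tail(1)

# Derive a summary feature: surgical procedures in the 30 days before admission
merged = surgeries.merge(latest[["patient_id", "admit_date"]], on="patient_id")
prior_30d = merged[
    (merged["surgery_date"] < merged["admit_date"])
    & (merged["surgery_date"] >= merged["admit_date"] - pd.Timedelta(days=30))
]
counts = (prior_30d.groupby("patient_id").size()
          .rename("n_surgeries_30d").reset_index())
latest = latest.merge(counts, on="patient_id", how="left")
latest["n_surgeries_30d"] = latest["n_surgeries_30d"].fillna(0)
```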

Image showing time-based decisions for data used and for predictions, which includes data prior to discharge from the hospital.
Figure 1 — Time-based decisions for data used and for predictions

After combining the provided data sets, the resulting data set included 10,599 rows, one per patient, and 199 attributes, or features, per patient.

Predictive Model

After cleaning the data and applying feature selection, we defined our objective as a binary classification problem: predicting death during hospitalization through 90 days after discharge among sepsis patients.
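
As an illustration, the outcome label could be constructed along these lines, assuming a combined per-patient data frame with hypothetical death_date and discharge_date columns:

```python
import pandas as pd

df = pd.read_csv("sepsis_combined.csv",
                 parse_dates=["death_date", "discharge_date"])

# Deaths before discharge also satisfy this condition, so the label covers
# both in-hospital and 90-day post-discharge mortality
df["death_90d"] = (
    df["death_date"].notna()
    & (df["death_date"] <= df["discharge_date"] + pd.Timedelta(days=90))
).astype(int)

y = df["death_90d"]
X = df.drop(columns=["death_90d", "death_date", "discharge_date"])
```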

For the algorithm, we selected gradient boosted trees using the XGBoost package, which has been dominating popular machine learning competitions thanks to its execution speed and robust performance. Another motivation for using XGBoost is the ability to fine-tune hyper-parameters to improve the performance of the model. Within the training data, we used 10-fold cross validation and GridSearchCV to select parameter values iteratively, maximizing the area under the ROC curve (AUC). A practical example of this process in IBM Data Science Experience is available online.
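
A minimal sketch of this tuning step follows. The parameter grid is illustrative rather than the grid we actually searched, and the snippet starts from the 60/40 split described in the next paragraph so that it is self-contained (the stratification and seed are our choices here):

```python
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# 60/40 train/test split, as described below
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)

# Illustrative grid; the values we actually searched differed
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.2],
    "n_estimators": [100, 300],
    "subsample": [0.8, 1.0],
}

search = GridSearchCV(
    estimator=XGBClassifier(objective="binary:logistic"),
    param_grid=param_grid,
    scoring="roc_auc",  # select parameters that maximize AUC
    cv=10,              # 10-fold cross-validation within the training data
    n_jobs=-1,
)
search.fit(X_train, y_train)
print("Best cross-validated AUC:", search.best_score_)
print("Best parameters:", search.best_params_)
```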

We split the data into training (60%) and testing (40%) sets. Using the hyper-parameters tuned on the training data, we applied the model to the test data, resulting in the model performance shown in Figure 2.
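
With the tuned model in hand, the test-set evaluation is a few lines, continuing the sketch above:

```python
from sklearn.metrics import roc_auc_score

# GridSearchCV refits the best model on the full training split by default
model = search.best_estimator_
test_probs = model.predict_proba(X_test)[:, 1]
print("Test AUC:", roc_auc_score(y_test, test_probs))  # 0.8561 in our tests
```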

Image showing the performance of our XGBoost model, which performed well in predicting the outcome of sepsis patients.
Figure 2 — Performance of our XGBoost model

We looked at a variety of metrics to measure success for our model, whose goal is to maximize correct prediction of mortality among sepsis patients.

For the AUC (Area Under the ROC Curve) score, the closer this number is to 1, the better the model is at ranking patients who died above patients who survived, that is, at identifying True Positives (TP) while minimizing false positives. With an AUC of 0.8561, our model (during testing) was able to identify the vast majority of sepsis patients who would die, so those patients could be targeted with adequate treatment.

Another way to look at precision and recall together is the area under the Precision-Recall (PR) curve. The closer this number is to 1.0, the better a model balances precision (a.k.a. Positive Predictive Value) and recall (a.k.a. Sensitivity). In our case, the number was 0.80. We favored high recall: the intent was to minimize the number of patients missed by this model who could eventually die of sepsis.
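
Both curve-based metrics are available in scikit-learn; continuing the sketch above:

```python
from sklearn.metrics import auc, precision_recall_curve

precision, recall, _ = precision_recall_curve(y_test, test_probs)
print("Area under the PR curve:", auc(recall, precision))  # 0.80 in our case
```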

Another metric we used was the model’s accuracy. We used bootstrapping to generate 1,000 variations of the training and testing data sets, ran the XGBoost model on each, and recorded the model’s accuracy for each run. The distribution of the bootstrapped accuracy over the 1,000 runs gave us a 95% Confidence Interval on accuracy between 0.77 and 0.79, meaning our model correctly classified over three quarters of patients (both True Positives and True Negatives).
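
One way to approximate this resampling procedure is sketched below; repeatedly re-splitting and re-fitting 1,000 models is slow, so treat it as an illustration rather than our exact pipeline:

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

accuracies = []
for seed in range(1000):
    # A fresh 60/40 variation of the training and testing sets per run
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.4,
                                          random_state=seed)
    clf = XGBClassifier(**search.best_params_).fit(Xtr, ytr)
    accuracies.append(accuracy_score(yte, clf.predict(Xte)))

# 95% confidence interval from the empirical distribution of accuracies
low, high = np.percentile(accuracies, [2.5, 97.5])
print(f"Accuracy 95% CI: [{low:.2f}, {high:.2f}]")
```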

Image showing the accuracy of predictions of death for patients hospitalized with sepsis
Figure 3 — Positive and negative predictions

In addition to the numbers and their interpretation explained above, the Confusion Matrix for this model is seen in Figure 3. For the test data, it shows that our model identified 1,190 patients as True Positives (prediction of death for patients who actually died) and 2,087 True Negatives (prediction of survival for patients who actually survived).
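
The confusion-matrix counts can be pulled from the test predictions like so (the 0.5 threshold is an illustrative cutoff):

```python
from sklearn.metrics import confusion_matrix

# Threshold the predicted probabilities to get hard class labels
test_preds = (test_probs >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_test, test_preds).ravel()
print(f"True Positives: {tp}  True Negatives: {tn}")
print(f"False Positives: {fp}  False Negatives: {fn}")
```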

We also used XGBoost’s ability to determine feature importance using the “cover” metric. This metric does not tell us whether a feature is a strong predictor of death or a strong predictor of survival, but the information generated by XGBoost is still very useful: it shows the expected percentage of patients for which that feature contributes to the prediction of death.

For example, as seen in Figure 4, the Age at hospital admission feature contributes to the prediction of death for 29.5% of patients.
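
Continuing the sketch, the cover-based importances can be read off the trained booster. Note that XGBoost returns raw cover scores here, whereas Figure 4 reports them as percentages:

```python
# "cover" reflects how many observations are affected by splits on a feature
cover = model.get_booster().get_score(importance_type="cover")
top20 = sorted(cover.items(), key=lambda kv: kv[1], reverse=True)[:20]
for feature, score in top20:
    print(f"{feature}: {score:.1f}")
```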

Image showing the feature importance of the top 20 features in the final model
Figure 4 — Feature importance of the top 20 features in the final model

We conducted further exploratory analysis to examine how features are distributed with respect to the outcome variable (death). While these plots are helpful to visualize a high-level relationship with the outcome, it’s important to understand that XGBoost trains multiple decision trees, which are non-linear in nature. Due to this, important features in an XGBoost model may not have an obvious relationship with the outcome variable in these exploratory plots.
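
For instance, one of these exploratory plots could be produced along these lines (column names hypothetical):

```python
import matplotlib.pyplot as plt

# Distribution of age at admission, split by outcome
df.boxplot(column="age_at_admission", by="death_90d")
plt.ylabel("Age at hospital admission")
plt.show()
```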

Image showing patient deaths in relation to some of the most important features, such as hours on vasopressors and age at admission
Figure 5 — Patient deaths in relation to some of the most important features

For example, as seen in Figure 5, a feature such as Age at hospital admission may suggest that older patients have a higher proportion of deaths than younger patients. As another example, the Hours spent on Vasopressors feature may suggest that patients who were on vasopressors longer had higher death rates, but these deaths could also have been due to the severity of their condition (for example, if sepsis progressed to septic shock), which required them to be on vasopressors for a longer duration.

The decision tree rules output by XGBoost can help us further understand how to target patients for treatment. For example, the medical team may pay special attention to older patients due to their higher mortality risk, monitor the duration of vasopressor use, and try to reduce the number of patient transfers between hospital departments in order to minimize the impact on susceptible patients.
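
For readers who want to inspect these rules themselves, XGBoost can dump the learned trees as readable if/else splits:

```python
# Each string is one boosted tree, printed as nested if/else split rules
for i, tree in enumerate(model.get_booster().get_dump()[:3]):  # first 3 trees
    print(f"--- Tree {i} ---")
    print(tree)
```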

Conclusions

Predicting all-cause mortality in sepsis patients can guide health providers to actively monitor and take preventive actions that improve patients’ survival. Many of the features identified as important in our model are known to be associated with death in sepsis patients. This provides reassurance that our machine learning model can recover well-known associations with sepsis mortality even amid the noise of many unrelated variables. However, in this analysis we excluded features from key data sources that had substantial missing data, including lab results and vitals. We expect the model’s performance to improve once those features are added. We will continue our collaborative work with Geisinger to analyze an updated and more comprehensive set of clinical variables and to further improve our model and its clinical utility. With more interventional features, we hope to produce a more actionable model that can assist Geisinger in their care for sepsis patients.

Acknowledgements

Many thanks to the IBM Academy of Technology and our IBM executive sponsor Rob Thomas for approving this initiative, with participation on our weekly calls from Debdipto Misra, Bipin Karunakaran, Rameswara Sashi Challa, and Satish Kalepalli from Geisinger, and Shantan Kethireddy, Aleksandr Petrov, Wanting Wang, Rajiv Joshi, Cheranellore Vasudevan, Alan Newman, and Vidhya Shankar from IBM.
