Journal Club Review: “Predicting Mortality Risk in Patients with COVID-19 Using Artificial Intelligence to Help Medical Decision-Making”

Bobak Mortazavi, PhD
COVID Reviews
Apr 9, 2020

The goal of COVID Reviews is to provide a series of reviews related to recent analytics-focused articles on COVID-19. Our hope is to provide short summaries and critiques in a journal club-oriented format that can be quickly digested to assess the methodologies used, populations/outcomes assessed, impact of work, and strengths/weaknesses of each article.

Article

Predicting Mortality Risk in Patients with COVID-19 Using Artificial Intelligence to Help Medical Decision-Making. medRxiv 2020.03.30.20047308; doi: https://doi.org/10.1101/2020.03.30.20047308

Review by: Bobak Mortazavi, PhD; Guannan Gong, MS; Wade Schulz, MD, PhD*


Authors’ Aim

This article aimed 1) to design and develop a predictive model based on Artificial Intelligence (AI) and machine learning algorithms to determine health risk and predict the mortality risk of patients with COVID-19 based on patients’ physiological conditions, symptoms, and demographic information; and 2) to create a predictive algorithm to help hospitals and medical facilities maximize the number of survivors by providing an accurate and reliable tool to support medical decision-making and to triage COVID-19 patients more effectively and accurately during the pandemic.

Why Article Was Selected

This article was selected because it was one of the first to develop an initial, accurate predictive model for mortality that additionally explores the risk factors. It serves as a foundation for understanding the successes and shortcomings of early predictive models that might assist in clinical decision making/triage of patients as stated in the paper’s introduction. This article was published with data made available to the community.

Dataset: More than 117,000 laboratory-confirmed COVID-19 patients (mainly from China) from 76 countries around the world, including both male and female patients with an average age of 56.6 years.

Xu, B., Gutierrez, B., Mekaru, S., et al. Epidemiological data from the COVID-19 outbreak, real-time case information. Sci Data 7, 106 (2020). https://doi.org/10.1038/s41597-020-0448-0

Methods Employed by Authors

The authors developed a standard machine learning pipeline, carrying out initial feature extraction and then comparing the performance of a number of models. They extracted 112 features for model development: 80 of these features came from each patient’s health status, while the remaining 32 were demographic and physiological data. Because of the severe class imbalance in the dataset, they randomly subsampled the patients who survived to create a balanced dataset. From this set they then applied univariate and multivariate filters to select the best feature subset for model development. The authors selected a final subset of 42 features:

  1. Demographic features such as age, sex, province, country, and travel history,
  2. General medical information such as comorbidities (diabetes, cardiovascular disease, … ),
  3. Patient symptoms such as chest pain, chills, colds, conjunctivitis, cough, diarrhea, discomfort, dizziness, dry cough, dyspnea, emesis, expectoration, eye irritation, fatigue, gasp, headache, lesions on chest radiographs, little sputum, malaise, muscle pain, myalgia, obnubilation, pneumonia, myelofibrosis, respiratory symptoms, rhinorrhea, somnolence, sputum, transient fatigue, weakness, etc.
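The balancing and univariate-filtering steps described above can be sketched in Python. This is a minimal illustration on synthetic data: the feature counts match the paper, but the data, helper names, and correlation-based filter are assumptions, not the authors’ actual pipeline.

```python
import numpy as np

def undersample_majority(X, y, rng):
    """Randomly subsample the majority (survivor) class down to the minority class size."""
    pos = np.flatnonzero(y == 1)                      # deceased (minority class)
    neg = np.flatnonzero(y == 0)                      # survived (majority class)
    keep = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, keep])
    rng.shuffle(idx)
    return X[idx], y[idx]

def univariate_filter(X, y, k):
    """Rank features by absolute Pearson correlation with the outcome; keep the top k."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    corr = np.abs(Xc.T @ yc) / denom
    return np.argsort(corr)[::-1][:k]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 112))                      # 112 extracted features
y = (rng.random(1000) < 0.05).astype(int)             # ~5% mortality: severe imbalance
X_bal, y_bal = undersample_majority(X, y, rng)        # balanced dataset
top42 = univariate_filter(X_bal, y_bal, k=42)         # final 42-feature subset
```

A multivariate filter (e.g., dropping one of each highly correlated feature pair) would follow the same pattern on the selected subset.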

After extracting the features and creating their dataset, the authors employed several common machine learning methods for predicting mortality: Support Vector Machine (SVM), Artificial Neural Networks, Random Forest, Decision Tree, Logistic Regression, and K-Nearest Neighbors (KNN). To evaluate the developed models, they then conducted 10-fold random cross-validation (with no overlap and no replacement between folds) and measured overall accuracy as well as the area under the receiver operating characteristic curve (AUROC), and provided a confusion matrix for threshold-based classification.
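This kind of multi-model comparison under 10-fold cross-validation can be sketched with scikit-learn. The synthetic data is a stand-in for the balanced 42-feature dataset; the model list is a subset of the paper’s, and nothing here is the authors’ code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the balanced, 42-feature dataset
X, y = make_classification(n_samples=600, n_features=42, n_informative=10,
                           random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": KNeighborsClassifier(),
}
# Mean AUROC across the 10 folds for each model
scores = {name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
          for name, model in models.items()}
```

Swapping `scoring="roc_auc"` for `"accuracy"` reproduces the paper’s other headline metric on the same folds.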

Results and Conclusions

Their best-performing model had an overall accuracy of 93% when predicting mortality (using neural networks), though logistic regression provided the best discrimination with an AUROC (c-statistic) of 0.95. This suggests that a different classification threshold may have given logistic regression a better overall accuracy than the neural network.
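The threshold effect can be shown directly: AUROC is threshold-free, while accuracy depends on where the cutoff is placed. A small sketch on simulated risk scores (not the paper’s model) that sweeps the threshold and finds the accuracy-maximizing cutoff:

```python
import numpy as np

def accuracy_at(y_true, scores, t):
    """Accuracy of the rule 'predict death if score >= t'."""
    return float(np.mean((scores >= t).astype(int) == y_true))

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=1000)
# Simulated risk scores: positives shifted upward, mimicking good discrimination
scores = rng.normal(loc=1.5 * y, scale=1.0)

thresholds = np.linspace(scores.min(), scores.max(), 101)
accs = [accuracy_at(y, scores, t) for t in thresholds]
best_t = thresholds[int(np.argmax(accs))]
best_acc = max(accs)   # accuracy at the tuned cutoff, not a default threshold
```

On real data the cutoff should be tuned on a validation split, never on the test folds themselves.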

An analysis of the important features showed that age and a number of comorbidities were the most important factors in determining mortality risk. Additionally, the authors showed that a number of these factors were strongly correlated with each other, so it is not clear whether each factor individually would provide strong performance or whether the combination resulted in added risk.

The developed algorithms were able to accurately predict the mortality risk in patients with COVID-19 based on the patients’ physiological conditions, symptoms, and demographic information. This system may help hospitals, medical facilities, and caregivers prioritize COVID-19 patients, particularly with the extra discrimination power provided by the remaining features. This study could expand to other diseases to help the healthcare system respond more effectively during an outbreak or a pandemic.

Strengths of Article

The article addresses an important problem: determining early on which patients might require extra resources or attention during the current strained pandemic situation. The article used a relatively large dataset and selected many relevant features to train the predictive models, and accuracy and performance were measured for each model. Further, the authors did a good job of providing some explanation of the features of importance to guide reproduction of such a model, as well as to explain the potential performance of more complicated techniques such as neural network-based models.

Limitations of Article

A number of limitations exist in this paper, primarily due to the way features were pre-selected and models trained on balanced datasets.

  • There are limitations in the data source used, though this is no fault of the authors. The availability of a wider array of feature data was limited, and therefore the common features of age and comorbidities were selected as most important, as expected.
  • In building their models, the authors use 10-fold cross-validation on a balanced dataset, which may not be the most effective way of dealing with the data imbalance. In particular, such a technique should be repeated to ensure that the downsampled negative class does not happen to contain patients who are very easy to discriminate from those with high mortality risk.
  • They also carry out feature selection prior to model development, essentially identifying features that have direct correlations with the outcome of interest.
  • This limits the feature-importance claims the paper makes, and it is not clear whether later treatment data confounds these results, in terms of high-risk patients who survived versus those who did not.
  • Their procedure for creating additional features (in consultation with clinical collaborators) is not entirely clear, making it difficult to replicate the work or to understand the definitions used for those variables and the sources of that data.
  • The hyperparameter tuning used in the grid search could use better explanation: what ranges were searched, why were those ranges and step sizes chosen, and do the authors think they explored a reasonable search space?
  • For evaluating the results, accuracy is not the most informative measure for a situation like this. While AUROC also has issues with imbalanced datasets, it is more informative when data balancing has been carried out, as in this work. Focusing on AUROC, together with reporting the specific metric used to optimize the confusion matrix, would give a better understanding of risk and a potential decision-support marker/threshold. Ideally, the authors would also provide some form of calibration plot to show the distribution of risk over the tested population.
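The calibration check suggested in the last point is cheap to compute: bin the predicted risks and compare the mean predicted risk in each bin against the observed event rate. A minimal sketch on synthetic scores that are well calibrated by construction (the data and function are illustrative assumptions):

```python
import numpy as np

def calibration_bins(y_true, p_hat, n_bins=10):
    """Mean predicted risk vs. observed event rate in each probability bin."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_idx = np.clip(np.digitize(p_hat, edges) - 1, 0, n_bins - 1)
    pred, obs = [], []
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            pred.append(p_hat[mask].mean())   # mean predicted risk in the bin
            obs.append(y_true[mask].mean())   # observed mortality rate in the bin
    return np.array(pred), np.array(obs)

rng = np.random.default_rng(2)
p_hat = rng.random(5000)                      # hypothetical predicted risks
y = (rng.random(5000) < p_hat).astype(int)    # outcomes drawn from those risks
pred, obs = calibration_bins(y, p_hat)        # well calibrated: pred ≈ obs per bin
```

Plotting `pred` against `obs` gives the calibration curve; points far from the diagonal indicate over- or under-estimated risk in that range.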

Take-Home Message

This article is an important step in developing a model to predict mortality risk in patients with COVID-19, and further in providing a useful tool for healthcare systems to prioritize COVID-19 patients with limited resources. However, the reliability of the data source limits the accuracy of the analysis, and the procedure for creating the feature set is not entirely clear. Further, balancing the training data by randomly subsampling the negative class can discard informative patients, especially when so little data is available; if such a procedure is carried out, it should be repeated to show robustness to the random nature of the dataset creation. In all, this is a solid foundation to build from, but more advanced models with a wider array of data are needed.
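The repeated-subsampling robustness check called for above can be sketched as follows: redraw the balanced dataset many times and report the spread of the resulting AUROC. The data and the deliberately simple centroid-distance score are synthetic stand-ins, purely for illustration.

```python
import numpy as np

def rank_auc(y_true, score):
    """AUROC via the Mann-Whitney rank formula (threshold-free)."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1, n0 = int((y_true == 1).sum()), int((y_true == 0).sum())
    return (ranks[y_true == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 10))
y = (rng.random(2000) < 0.05).astype(int)           # rare deaths, as in the real cohort

aucs = []
for _ in range(20):                                 # repeat the random undersampling
    keep = rng.choice(np.flatnonzero(y == 0), size=int(y.sum()), replace=False)
    idx = np.concatenate([np.flatnonzero(y == 1), keep])
    Xb, yb = X[idx], y[idx]
    # Deliberately simple score: negative distance to the positive-class centroid
    score = -np.linalg.norm(Xb - Xb[yb == 1].mean(axis=0), axis=1)
    aucs.append(rank_auc(yb, score))
auc_mean, auc_std = float(np.mean(aucs)), float(np.std(aucs))
```

A small `auc_std` across draws supports the claim that results are robust to which survivors were discarded; a large one signals the concern raised in the limitations.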


Dr. Mortazavi is an Assistant Professor of Computer Science & Engineering at Texas A&M University working on Computational Health