Artificial intelligence in leukemia



In this third article, we focus on other tasks and other types of data exploited by deep learning (DL) in hematology. As mentioned earlier, deep learning can be useful in many tasks related to leukemia. The previous review focused on image data analysis. Here we consider other data sources and applications such as therapy selection, differential diagnosis and risk prediction (1).

As an example, medical diagnosis in hematology is based on blood tests, where clinicians focus on values that fall outside a reference range. This can lead them to miss patterns and relationships between parameters. As the number of tested parameters increases, diagnosis relies more and more on the clinician's experience and knowledge.

Machine learning algorithms are well suited to finding patterns and can easily handle a growing number of parameters. Predictive models based on machine learning have several advantages. One is that traditional models are biased by the choice of variables, while machine learning models are often trained on the whole dataset without prior selection. Machine learning algorithms can also take into account interactions and confounding factors, which is much more challenging with traditional approaches. Another application is risk stratification, which is important for balancing the benefits of therapy against its potential side effects.

Machine learning models can be beneficial here because accurate prognostication remains challenging. As an example, in chronic lymphocytic leukemia, risk stratification based on a machine learning model can help in deciding between a watch-and-wait approach and initiating therapy. Better risk stratification of patients is also important in trial design (which in turn has a deep impact on the development of new therapies).

Shouval and colleagues designed a machine learning algorithm to predict mortality after allogeneic hematopoietic stem-cell transplantation (HSCT). HSCT is a therapy considered for different hematologic malignancies. Although recovery has improved in recent years, morbidity and mortality (around 30 %) are still considerable.

Indeed, deciding which patients would benefit from the treatment is an important challenge. Different scores have been developed to aid clinicians in patient selection, but their predictive accuracy is still not optimal. The study included 29685 patients from 404 centers and considered 20 variables (describing donor, recipient and procedural characteristics). The model was an alternating decision tree predicting overall mortality at day 100. In this model, decision nodes are associated with prediction weights; a cumulative score is calculated for each patient and then transformed into an individualized probability of 100-day survival using a logistic regression model.
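The scoring step described above can be sketched in a few lines. This is only a toy illustration, not the model published in (2): the rules, the weights, the variable names and the logistic coefficients are all invented for the example. The function returns a mortality probability; the corresponding survival probability is simply one minus that value.

```python
import math

# Hypothetical decision rules: (condition, weight if true, weight if false).
# Names and numbers are illustrative only, not taken from the published model.
RULES = [
    (lambda p: p["age"] >= 50, 0.30, -0.20),
    (lambda p: p["disease_stage"] == "advanced", 0.45, -0.15),
    (lambda p: p["donor_matched"], -0.25, 0.35),
]
BASE_SCORE = -0.10  # assumed root prediction value of the tree


def cumulative_score(patient):
    """Sum the prediction weights of every decision node the patient reaches."""
    score = BASE_SCORE
    for condition, weight_true, weight_false in RULES:
        score += weight_true if condition(patient) else weight_false
    return score


def mortality_probability(score, intercept=-1.0, slope=2.0):
    """Map the cumulative score to a probability with a logistic function.

    intercept and slope stand in for coefficients that would be fitted on data.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


low_risk = {"age": 35, "disease_stage": "early", "donor_matched": True}
high_risk = {"age": 62, "disease_stage": "advanced", "donor_matched": False}

for name, patient in [("low risk", low_risk), ("high risk", high_risk)]:
    s = cumulative_score(patient)
    p = mortality_probability(s)
    print(f"{name}: score={s:.2f}, predicted 100-day mortality={p:.2f}")
```

The point of the design is that each patient gets an individual probability rather than a coarse risk category, while the additive score keeps the contribution of each rule easy to inspect.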

This machine learning approach outperformed a score designed with a classical approach (2). One of the most critical complications after HSCT is acute graft-versus-host disease (aGVHD). Many models based on Cox’s proportional hazards models and logistic regression have been proposed to predict this adverse event.

With this purpose, Arai and colleagues tested five machine learning algorithms: Naïve Bayes, alternating decision tree, multilayer perceptron, random forest and adaptive boosting. The aim of their work was to predict aGVHD using a retrospective cohort of almost 27000 patients. They evaluated their predictions using the area under the curve (AUC) (3).

Agius and colleagues developed a machine learning model to identify chronic lymphocytic leukemia (CLL) patients at high risk of infection. Infections are a major cause of mortality in CLL patients, yet predictive models are missing. Their approach took into account 7 years of patient history prior to CLL diagnosis (laboratory results, infectious diseases and comorbidities).

Generally, proposed models focus on direct disease parameters, while an approach combining multiple perspectives showed benefits. They also used an ensemble approach, combining different models to build a classifier. The aim is to build an ensemble classifier that is more accurate than any of the individual classifiers in the ensemble. On clinical data, this approach has been shown to be less prone to overfitting than a single algorithm. Agius incorporated 28 independent machine learning algorithms (including linear and non-linear types of classifiers) into a single one (4).
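The idea of an ensemble can be illustrated with the simplest combination rule, averaging the probabilities of the base models (often called soft voting). The sketch below is an assumption-laden toy: the three base "models" are hand-written rules with invented thresholds and field names, and plain averaging is not claimed to be the combination scheme used in (4).

```python
from statistics import mean


# Three toy base classifiers. Each maps a patient record to a probability of
# infection. The rules and thresholds are invented for illustration.
def lab_model(p):
    return 0.8 if p["neutrophils"] < 1.5 else 0.2


def history_model(p):
    return min(1.0, 0.15 * p["prior_infections"])


def comorbidity_model(p):
    return 0.7 if p["comorbidities"] >= 2 else 0.3


BASE_MODELS = [lab_model, history_model, comorbidity_model]


def ensemble_probability(patient):
    """Soft voting: average the probabilities of all base models."""
    return mean(model(patient) for model in BASE_MODELS)


def ensemble_predict(patient, threshold=0.5):
    """Flag the patient as high risk when the averaged probability reaches the threshold."""
    return ensemble_probability(patient) >= threshold


patient = {"neutrophils": 1.1, "prior_infections": 4, "comorbidities": 1}
print(round(ensemble_probability(patient), 3), ensemble_predict(patient))
```

Because each base model looks at a different kind of information, an error made by one of them can be compensated by the others, which is the intuition behind the reduced overfitting mentioned above.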

With the development of genomics (and similar sources, such as transcriptomics and proteomics) and the evolution of these techniques, which has decreased their cost, a huge amount of data is becoming available. Many machine learning algorithms are being applied to these data with the aim of tumor classification, therapeutic data identification, gene classification and outcome prediction.

Gal and colleagues proposed a model to predict complete remission in acute myeloid leukemia (AML), starting from RNA-seq samples from a cohort of 493 patients. They used principal component analysis (PCA) to remove outliers and then applied feature selection (keeping the 100 most statistically significant genes). They compared three different machine learning algorithms to predict 2-year complete remission in young patients (5).
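The feature selection step can be sketched as a univariate filter: score every gene by how strongly it separates the two outcome groups and keep the top k. The text does not say which statistical test was used, so Welch's t statistic is an assumption here, the data are synthetic, and the PCA outlier removal is left out. In a real study the selection should be repeated inside each training fold, otherwise information from the test patients leaks into the model.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    return (mean(a) - mean(b)) / ((variance(a) / len(a) + variance(b) / len(b)) ** 0.5)


def top_k_genes(expression, labels, k):
    """Rank genes by |t| between the two outcome groups and keep the k best.

    expression: dict gene -> list of expression values, one per patient
    labels:     list of 0/1 outcomes (1 = complete remission), same order
    """
    scores = {}
    for gene, values in expression.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = abs(welch_t(pos, neg))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic data: 40 patients, 200 noise genes plus 3 genes shifted in responders.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
expression = {f"noise_{g}": [rng.gauss(0, 1) for _ in labels] for g in range(200)}
for g in range(3):
    expression[f"signal_{g}"] = [rng.gauss(3.0 * y, 1) for y in labels]

selected = top_k_genes(expression, labels, k=10)
print(selected)
```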

Gunčar and colleagues used as model input data from 370000 blood tests performed on more than 8000 patients (including patient characteristics and laboratory parameters). They used a support vector machine (SVM) to predict hematologic disease. They built two models: one based on all available measured parameters and a reduced model that considered only the parameters measured at patient admission (6).

Another interesting approach was proposed by Biccler and colleagues, who started from medical registries containing patients' clinicopathological features, treatment information and outcomes. They used the Danish registry as the training set and the Swedish registry as the test set. Their method was a stacked survival model (derived from the super learner). This model outperformed previously established prognostic indices and, more importantly, the results were reproducible in an independent cohort (7).

Gandelman and colleagues instead proposed a model based on an unsupervised learning approach for risk stratification in GVHD. They used t-distributed stochastic neighbor embedding (t-SNE) and a clustering algorithm to define clusters of patients. Their work identified clusters of patients with clinical differences in the organ systems involved in the disease (8).

Figure 1. Patient clusters obtained through a machine learning algorithm; the organ scores show that disease significance differs between clusters. Figure source: (8).

IBM proposed Watson for diagnosis in oncology. It uses deep learning techniques coupled with natural language processing to combine patient and disease features, oncologists' experience, the available literature and clinical trials, and suggests treatment options in ranked order. Watson can access physician notes and extract information from electronic health records stored as free text. So far, Watson has not matched the high expectations, but the field is still in its early days (9).

Risk stratification can also start from medical images, for instance using PET-CT images to stratify patients. Milgrom and colleagues used PET data to predict relapsed or treatment-refractory disease. Their model also predicted the outcome, that is, who survived the disease and who died from the malignancy (10).

Machine learning can also aid in treatment decisions. Many patients do not respond to first-line treatment, and second-line treatment is considered only after the first line has failed (disease progression). Machine learning can help inform the upfront choice of treatment. As we discussed in the previous section, there is interest in epigenetic agents (for example hypomethylating agents), but these therapies can take months to show a response. Machine learning can help identify patients who would benefit from the therapy, based on early changes in blood counts (11). A dynamic model that follows the patient's response to treatment can be useful to predict response and outcome, and thus avoid potentially useless therapy and side effects.

Precision medicine requires understanding patient-specific biology: what drives the disease and which treatment suits the individual patient. The basis of this personalized medicine is elusive, since many complex factors interact. Next-generation sequencing is driving a revolution, allowing personalized data to be gathered extensively. However, this large amount of data has been described as “data-rich, information-poor”, because it is difficult to exploit and interpret. The falling cost and the evolution of these techniques (genomics and similar sources, such as transcriptomics and proteomics) have worsened the issue: we have a huge amount of data that needs to be exploited.

Many machine learning algorithms are being studied on these data with the aim of tumor classification, therapeutic data identification, gene classification and outcome prediction. One example is the model by Gal and colleagues, described earlier, which predicts complete remission in AML from RNA-seq data (5).

Another approach was a recommendation system for therapy based on mutations and cytogenetic abnormalities (12). From RNA-seq and mutational data, Lee and colleagues derived a model to predict drug sensitivity in AML (13). Flow cytometry has long been used for the diagnosis of hematological malignancies, and in recent years technical advances have made it possible to record many more parameters. Multiparametric flow cytometry has greatly increased the dimensionality of the data obtained, which is leading to new computational approaches, including machine learning (14).

Proteomics data have also been used for diagnosis. As an example, mass spectrometry of peripheral blood plasma was used to diagnose multiple myeloma, with classification achieved using a neural network (15).

Another application of machine learning in oncology is data mining, an approach to discovering knowledge in databases automatically. Many algorithms have been developed to mine medical data, from clinical files to scientific literature.

Since machine learning applications in medicine are growing fast and numerous new models are proposed every year, standardization is becoming a key need. An article reporting a new model has to address many questions, such as transparency, reproducibility and a fair comparison with previous models.

Information about the methodology and the samples is fundamental to evaluating the quality of the work. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) is an initiative aimed at standardizing the reporting of model development and validation. The TRIPOD initiative describes a checklist of 22 items that should be included in a study reporting a multivariable prediction model for diagnosis or prognosis (16). In the same spirit, there is an initiative to launch TRIPOD-ML, guidelines specific to machine learning methodologies.

Authors have raised concerns about the methodology of deep learning applied to medicine, such as a focus on classification over prediction, overfitting, lack of robust assessment of accuracy, weak comparison with simpler models, and poor transparency, which limits reproducibility and independent evaluation. Moreover, these new machine learning approaches are proliferating and replacing other approaches that have clearly defined methodological guidelines.

The aim of TRIPOD-ML is also to harmonize the terminology used in the field while establishing guidelines that encompass deep learning in health care (17). Along the same lines, the CRISP-DM initiative proposed guidelines for data mining studies.

Figure 2: An example of a framework for machine learning in hematology. Figure source: (1).

To be clinically relevant, a machine learning algorithm needs a clear hypothesis. It is important to explore the available data sources in advance, along with the problems that may be encountered. With biological data, quality control (the completeness and accuracy of the data) is critical, and there is a strong need to avoid biases. Machine learning models are heavily influenced by the quality of the source data.

Pre-processing is critical when handling biological data, and it is a challenging task. Omics data require particular expertise, especially when integrating multiple sources. Omics data are high dimensional, missing values are common, and feature selection can be a key step. In this view, it is important to choose the machine learning model that best suits the proposed task.

Traditionally, machine learning models in biology address regression or classification problems. Traditional machine learning techniques fail in survival analysis, which is a special case. Survival analysis has many applications in medicine. Survival prediction requires the model to account for the time to the event (death, relapse of the disease) and for censoring (for instance, when the patient is no longer followed by the clinician or the event has not occurred during the observation period).
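To make the role of censoring concrete, here is a minimal sketch of the Kaplan-Meier estimator, the standard non-parametric estimate of a survival curve. It is not taken from any of the cited studies, and the follow-up times below are invented. The key point is that a censored patient still counts in the risk set until the moment of censoring, instead of being discarded or treated as an event.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up time of each patient
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without counting as events.
        at_risk -= removed
    return curve


# Invented follow-up data (months); 0 marks a censored patient.
times = [5, 8, 8, 12, 15, 20]
events = [1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

A plain classifier trained on "dead or alive at the end of the study" would have to either drop the two censored patients or mislabel them as survivors, which is why survival analysis needs dedicated methods.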

Cox’s regression (which relates various risk factors to survival) is the most widely used method, but omics data have a huge number of features compared with the sample size, and new methods are required to handle them. As an example, penalized Cox models have been developed to handle genomic data and overcome the high-dimensionality problem. Penalized Cox models showed limitations, so machine learning solutions have been proposed. In short, different research groups proposed survival SVMs and survival trees (and derivatives such as random survival forests).
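For reference, the Cox model and a penalized variant can be written in their standard textbook form. The notation is chosen here for illustration and does not come from the cited papers: h is the hazard, h_0 the baseline hazard, x_i the feature vector of patient i, beta the coefficients, delta_i the event indicator (1 if the event was observed), R(t_i) the set of patients still at risk at time t_i, and lambda the penalty weight. Tied event times are ignored, and the lasso (L1) penalty shown is only one common choice.

```latex
% Proportional hazards assumption
h(t \mid x_i) = h_0(t)\,\exp\!\left(\beta^\top x_i\right)

% Log partial likelihood (no tied event times)
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ \beta^\top x_i - \log \sum_{j \in R(t_i)} \exp\!\left(\beta^\top x_j\right) \right]

% Penalized estimate (lasso penalty as an example)
\hat{\beta} = \arg\max_{\beta} \; \ell(\beta) - \lambda \lVert \beta \rVert_1
```

With many more features than patients, the unpenalized maximum is not well defined; the penalty shrinks coefficients and, in the lasso case, sets many of them exactly to zero.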

Another approach was to exploit neural networks, as in Cox-nnet. It is also important to decide on the most suitable method and framework to check generalization. This matters in biology because overfitting or bias can lead to unexpected associations (for example, a risk factor associated with poor survival) that may not hold in a new dataset. Ideally, the test set in biology should be an independent dataset (for example an independent cohort from an external source).

For evaluation, one of the most widely used metrics is the concordance index (or c-index). The c-index is the probability that, for a randomly selected pair of individuals (one with the defined outcome and the other without), the model assigns a higher risk to the individual with the outcome. The c-index is equal to the AUC if the model has a binary endpoint. Sensitivity, specificity and other metrics are also considered; choosing the right evaluation metric is a key point in evaluating biological models (1).
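The pairwise definition can be turned directly into code. The sketch below implements Harrell's version of the c-index for right-censored data, under the usual convention that a pair is comparable only when the patient with the shorter follow-up had an observed event. The data are invented, and the quadratic loop is written for readability, not speed.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when the patient with the shorter follow-up
    had an observed event. It is concordant when that patient also has the
    higher predicted risk. Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Invented example: follow-up times, event indicators and predicted risks.
times = [5, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.5, 0.1]
print(concordance_index(times, events, risk))  # 6 concordant pairs out of 8 -> 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ordering of patients by risk.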

Figure 3: Steps of a machine learning model in medicine. Figure source: (18).

Data availability is one of the limitations that can be encountered. Machine learning algorithms require large amounts of data, and in biology (especially with omics) a dataset can have many more features than observations. Particularly complex neural networks (with many hidden layers) can easily overfit in this context.

Omics data are expensive to obtain (even though costs are falling, large datasets remain expensive) and patient recruitment can be complex, which limits the data available. Some diseases are rare, so patient recruitment is even more complex. These limitations can lead to a limited sample size, which translates into poor model performance and overfitting. To address these issues, regularization, data augmentation and transfer learning have been proposed. Moreover, medical records were handwritten and have been recorded electronically only for a few years.

Bias has often been encountered in medical machine learning models; for example, well-documented racial and economic biases in the data are detrimental to many patients. Interpretability can help avoid bias, or uncover its source, which is important for a model proposed for clinical use. Data sources can be varied, of uneven quality, or contain missing data. Some data sources are poorly recorded or inaccurate.

Another limitation is that data can be acquired in different ways by different centers (different standards), or feature classification can change over time. This is especially the case when recording a feature depends on the clinician's eye (differences in experience can change the recorded output). Interpretability is still a problem for machine learning models, especially neural networks, which makes it hard to follow the association between predictors and outcome (1).

Moreover, when authors propose a model, they should not only report traditional metrics but also explain under which conditions the model is less reliable. This is important because in biology authors can encounter class imbalance (one patient outcome is prevalent; a disease is rare). In conclusion, both researchers and clinicians should read a study carefully and ask themselves a number of questions when evaluating it (19).
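A small synthetic example shows why class imbalance makes a single headline metric misleading. The cohort below is invented: 5 patients out of 100 have the rare outcome, and the "model" simply predicts that nobody has it.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary labels and predictions."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Synthetic cohort: 5 patients with a rare outcome among 100.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100

print(binary_metrics(y_true, always_negative))
# accuracy is 0.95 although sensitivity is 0.0: every affected patient is missed
```

Reporting sensitivity and specificity alongside accuracy, ideally per subgroup, is one simple way for authors to show where a model is less reliable.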

Looking ahead, machine learning models have great potential in the field, starting with aiding the design of new trials. Patient enrolment is a crucial factor in trial success; thus, machine learning can improve patient selection (defining the appropriate subset of patients at risk, querying medical records for suitable candidates, identifying the population that might best suit the treatment, and enabling more appropriate monitoring).

This would translate into reduced trial size (with a concomitant reduction in cost and time) and avoid enrolling patients who cannot benefit from the treatment. Moreover, as machine learning models become more accurate and sophisticated, they open the possibility of abandoning patient risk categories and considering prognostic prediction at the individual patient level. For this application we need large genomic databases matched with clinical data.

  • Previous article on Acute Myeloid Leukemia in general: here.
  • Previous article on medical image analysis with machine learning: here.

Selected bibliography:

1. Shouval R, Fein JA, Savani B, Mohty M, Nagler A. Machine learning and artificial intelligence in haematology. British Journal of Haematology [Internet]. [cited 2020 Oct 5];n/a. Available from:

2. Shouval R, Labopin M, Bondi O, Mishan-Shamay H, Shimoni A, Ciceri F, et al. Prediction of Allogeneic Hematopoietic Stem-Cell Transplantation Mortality 100 Days After Transplantation Using a Machine Learning Algorithm: A European Group for Blood and Marrow Transplantation Acute Leukemia Working Party Retrospective Data Mining Study. JCO. American Society of Clinical Oncology; 2015;33:3144–51.

3. Arai Y, Kondo T, Fuse K, Shibasaki Y, Masuko M, Sugita J, et al. Using a machine learning algorithm to predict acute graft-versus-host disease following allogeneic transplantation. Blood Adv. 2019;3:3626–34.

4. Agius R, Brieghel C, Andersen MA, Pearson AT, Ledergerber B, Cozzi-Lepri A, et al. Machine learning can identify newly diagnosed patients with CLL at high risk of infection. Nat Commun [Internet]. 2020 [cited 2020 Oct 5];11. Available from:

5. Gal O, Auslander N, Fan Y, Meerzaman D. Predicting Complete Remission of Acute Myeloid Leukemia: Machine Learning Applied to Gene Expression. Cancer Inform [Internet]. 2019 [cited 2020 Oct 5];18. Available from:

6. Gunčar G, Kukar M, Notar M, Brvar M, Černelč P, Notar M, et al. An application of machine learning to haematological diagnosis. Sci Rep [Internet]. 2018 [cited 2020 Oct 5];8. Available from:

7. Biccler JL, Eloranta S, de Nully Brown P, Frederiksen H, Jerkeman M, Jørgensen J, et al. Optimizing Outcome Prediction in Diffuse Large B-Cell Lymphoma by Use of Machine Learning and Nationwide Lymphoma Registries: A Nordic Lymphoma Group Study. JCO clinical cancer informatics. 2018;2:1–13.

8. Gandelman JS, Byrne MT, Mistry AM, Polikowsky HG, Diggins KE, Chen H, et al. Machine learning reveals chronic graft-versus-host disease phenotypes and stratifies survival after stem cell transplant for hematologic malignancies. Haematologica. 2019;104:189–96.

9. Malin JL. Envisioning Watson As a Rapid-Learning System for Oncology. J Oncol Pract. 2013;9:155–7.

10. Milgrom SA, Elhalawani H, Lee J, Wang Q, Mohamed ASR, Dabaja BS, et al. A PET Radiomics Model to Predict Refractory Mediastinal Hodgkin Lymphoma. Sci Rep [Internet]. 2019 [cited 2020 Oct 7];9. Available from:

11. Radakovich N, Sekeres MA, Hilton CB, Mukherjee S, Shreve J, Rouphail Y, et al. Predicting Response to Hypomethylating Agents in Patients with Myelodysplastic Syndromes (MDS) Using Artificial Intelligence (AI). Blood. American Society of Hematology; 2019;134:2089–2089.

12. Madanat YF, Sekeres MA, Mukherjee S, Hirsch CM, Guan Y, Nagata Y, et al. Genomic Biomarkers Predict Response/Resistance to Lenalidomide in Non-Del(5q) Myelodysplastic Syndromes. Blood. American Society of Hematology; 2018;132:1797–1797.

13. Lee S-I, Celik S, Logsdon BA, Lundberg SM, Martins TJ, Oehler VG, et al. A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nat Commun [Internet]. 2018 [cited 2020 Oct 7];9. Available from:

14. Duetz C, Bachas C, Westers TM, van de Loosdrecht AA. Computational analysis of flow cytometry data in hematological malignancies: future clinical practice? Current Opinion in Oncology. 2020;32:162–9.

15. Deulofeu M, Kolářová L, Salvadó V, María Peña-Méndez E, Almáši M, Štork M, et al. Rapid discrimination of multiple myeloma patients by artificial neural networks coupled with mass spectrometry of peripheral blood plasma. Sci Rep [Internet]. 2019 [cited 2020 Oct 7];9. Available from:

16. Moons KGM, Altman DG, Reitsma JB, Ioannidis JPA, Macaskill P, Steyerberg EW, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD): Explanation and Elaboration. Annals of Internal Medicine. American College of Physicians; 2015;162:W1–73.

17. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet (London, England). 2019;393:1577–9.

18. Radakovich N, Nagy M, Nazha A. Machine learning in haematological malignancies. The Lancet Haematology. 2020;7:e541–50.

19. Faes L, Liu X, Wagner SK, Fu DJ, Balaskas K, Sim DA, et al. A Clinician’s Guide to Artificial Intelligence: How to Critically Appraise Machine Learning Studies. Transl Vis Sci Technol [Internet]. [cited 2020 Oct 7];9. Available from:



Salvatore Raieli
Peter Moss Leukaemia MedTech Research CIC
