A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer

Abish Pius
Computational Biology Papers
May 13, 2023

Baptista, Delora, Pedro G. Ferreira, and Miguel Rocha. “A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer.” PLOS Computational Biology 19.3 (2023): e1010200.

Full Article: A systematic evaluation of deep learning methods for the prediction of drug synergy in cancer | PLOS Computational Biology

Overview

Combination therapies are a common strategy for overcoming drug resistance in cancer treatment. Machine learning, particularly deep learning, can be used to discover effective anti-cancer drug combinations. This study examined how different methodological choices affect the performance of deep learning-based drug synergy prediction methods, using the NCI ALMANAC dataset. Feature selection based on biological knowledge improved performance, and drug features were found to be more predictive of drug response than cell line features. Molecular fingerprint-based drug representations performed slightly better than learned representations. Fully connected feature-encoding subnetworks were the most effective model architectures. Deep learning outperformed other machine learning methods, and an ensemble of the top deep learning and machine learning models further improved performance. Interpretability methods demonstrated that deep learning models can learn biologically meaningful associations between drug and cell line features and drug response. These findings contribute to the development of computational methods for designing effective cancer drug combinations.

Background

The development of drug resistance is one of the main challenges in cancer treatment. Intratumoral heterogeneity, where subpopulations of cells with distinct characteristics emerge due to genomic instability, can lead to the selection of subpopulations that favor drug resistance under treatment. Combining multiple treatments can help to reduce drug resistance and may result in drug synergy, where the combined effect of drugs is greater than expected. High-throughput cell viability assays are used to discover novel effective anti-cancer drug combinations, but screening all conceivable drug combinations is infeasible. Therefore, computational methods such as biological network analysis-based approaches and machine learning (ML) methods can be used to reduce the search space. ML, in particular, has been used to model the response of cells to drug combinations and can predict drug synergy based on drug combination screening experiments and other relevant data. Deep learning (DL) approaches, which can learn complex, non-linear functions and do not require extensive feature selection, have attracted interest from researchers in this field. Several publicly available high-throughput drug combination screening datasets and large-scale cancer cell line genomics and transcriptomics datasets can be used to develop drug synergy prediction models.
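To make "greater than expected" concrete, here is a minimal sketch using Bliss independence, one widely used reference model for the expected effect of two drugs. It is shown only as an illustration and is not necessarily the exact scoring used in the study. The function names and the inhibition values are hypothetical.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional inhibition of a two-drug combination
    if the drugs act independently (Bliss independence)."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed: float, effect_a: float, effect_b: float) -> float:
    """Observed minus expected inhibition.
    Positive values suggest synergy, negative values antagonism."""
    return observed - bliss_expected(effect_a, effect_b)


# Hypothetical inhibition fractions (0 = no effect, 1 = complete inhibition).
excess = bliss_excess(observed=0.80, effect_a=0.40, effect_b=0.50)
print(round(excess, 2))
```

In this made-up case the single agents would be expected to inhibit 70% of growth together, so an observed 80% leaves a positive excess, which would be read as synergy.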

Results

The study used the NCI ALMANAC dataset to develop and compare multiple DL models and conventional machine learning (ML) models.

Baseline models were created as references for subsequent models. All models performed better than a naive baseline that always predicts the average ComboScore value of the training set. To assess the importance of different input data types, models were trained with different combinations of one-hot encoded cell line and drug identifiers, as well as omics and chemical features. The results showed that both drug and gene expression features contributed to the predictive capacity of the models, with drug features appearing to be more predictive of drug synergy than omics features.
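A mean-prediction baseline like the one described above can be sketched in a few lines. The ComboScore values below are made up for illustration, and the helper names are hypothetical. RMSE is used here only as an example error metric, not as the study's evaluation protocol.

```python
from math import sqrt
from statistics import mean


def mean_baseline(train_scores, n_test):
    """Predict the training-set mean score for every test sample."""
    avg = mean(train_scores)
    return [avg] * n_test


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted scores."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Hypothetical ComboScore values, not taken from NCI ALMANAC.
train_scores = [10.0, -5.0, 25.0, 2.0]
test_scores = [12.0, 4.0]

baseline_preds = mean_baseline(train_scores, len(test_scores))
baseline_rmse = rmse(test_scores, baseline_preds)
print(baseline_preds, baseline_rmse)
```

Any trained model should achieve a lower error than this reference on the same test split, otherwise it has learned nothing beyond the average score.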

Different gene expression subnetworks were evaluated, and models with fully connected gene expression subnetworks trained on log-transformed and min-max scaled Fragments Per Kilobase of transcript per Million mapped reads (FPKM) values performed the best. Feature selection methods using smaller gene lists produced models with similar or higher performance scores compared to using the full set of protein coding genes.
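The preprocessing described above (log transformation followed by min-max scaling) might look roughly like the sketch below. The pseudocount of 1 and the base-2 logarithm are assumptions, as are the function names and the toy FPKM matrix. The scaling parameters are fitted on the training samples only, which is a common convention rather than something stated in the summary.

```python
import math


def log_transform(matrix):
    """Apply log2(x + 1) to every FPKM value (the pseudocount is an assumption)."""
    return [[math.log2(v + 1.0) for v in row] for row in matrix]


def fit_min_max(matrix):
    """Return the per-gene (column) minimum and maximum."""
    columns = list(zip(*matrix))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(matrix, mins, maxs):
    """Scale each gene to [0, 1]; constant genes are mapped to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, mins, maxs)]
        for row in matrix
    ]


# Toy matrix: rows are cell lines, columns are genes (hypothetical values).
train_fpkm = [[0.0, 15.0], [3.0, 63.0], [7.0, 255.0]]

logged = log_transform(train_fpkm)
mins, maxs = fit_min_max(logged)
scaled = apply_min_max(logged, mins, maxs)
print(scaled)
```

The same `mins` and `maxs` would then be reused to scale validation and test samples so that no information leaks from held-out data.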

For drug encoding networks, the model trained on extended connectivity fingerprint (ECFP4) fingerprints outperformed other subnetworks. Other fingerprint-based drug encoding schemes and graph convolutional network (GCN) subnetworks also achieved good performance.

Including mutation and copy number variation (CNV) data in addition to gene expression and drug features slightly decreased model performance. Pathway-level mutation features performed slightly better than gene-level mutation data, possibly because aggregating mutations across more genes at the pathway level preserves relevant genetic information.

Comparisons with other ML models showed that the DL model outperformed all ML models tested. The best non-DL models were light gradient boosting machine (LGBM) and random forest (RF). The performance of tree-based models (RF, extreme gradient boosting, and LGBM) was comparable to some of the lower-ranking DL models.

A heterogeneous ensemble that combined different DL architectures and ML models achieved better results than individual DL models, indicating improved generalizability of drug synergy prediction models.
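One simple way to combine heterogeneous regressors is to average their predictions per sample. The sketch below assumes plain unweighted averaging, which may differ from the ensembling scheme used in the study. The model names and predicted scores are hypothetical.

```python
def average_ensemble(predictions_by_model):
    """Average per-sample predictions across models.

    predictions_by_model maps a model name to its list of predictions,
    with all lists covering the same samples in the same order.
    """
    n_models = len(predictions_by_model)
    per_sample = zip(*predictions_by_model.values())
    return [sum(sample_preds) / n_models for sample_preds in per_sample]


# Hypothetical predicted synergy scores for three drug combinations.
predictions = {
    "dl_fully_connected": [10.0, -4.0, 2.0],
    "lgbm": [14.0, -2.0, 0.0],
    "random_forest": [12.0, 0.0, 4.0],
}

ensemble_preds = average_ensemble(predictions)
print(ensemble_preds)
```

Averaging tends to help when the member models make partly uncorrelated errors, which is one plausible reason mixing DL and tree-based models could outperform any single model.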

The SHapley Additive exPlanations (SHAP) interpretability framework was used to determine the importance of features for the DL model. The results showed that drug features were the most important features for predicting drug response in the ALMANAC dataset. SHAP values were also analyzed for specific examples, revealing the contribution of each feature to the model’s prediction.

Figure: Top 20 most important features, ranked by mean absolute SHAP values.
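Ranking features by mean absolute SHAP value amounts to averaging the magnitude of each feature's contribution over all explained samples. The sketch below assumes the SHAP values have already been computed and stored as one row per sample. The feature names and numbers are hypothetical.

```python
def rank_by_mean_abs(shap_rows, feature_names, top_k=20):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important."""
    n_samples = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical SHAP values: rows are samples, columns are features.
feature_names = ["drugA_fp_bit_12", "gene_X_expr", "drugB_fp_bit_7"]
shap_rows = [
    [0.5, -0.1, 0.2],
    [-0.7, 0.1, -0.2],
    [0.6, -0.1, 0.5],
]

ranking = rank_by_mean_abs(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Taking the absolute value first matters: a feature that pushes predictions strongly up for some samples and strongly down for others would otherwise average out to near zero.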

Overall, the study found that including both drug and gene expression features improved the predictive capacity of the models. Drug features were identified as the most important for predicting drug response, while different DL architectures and ML models can be combined to enhance model performance.

Discussion

The study suggests that drug features are more predictive of drug combination effects than cell line features, which may mean that the models primarily use gene expression data to distinguish between cell lines rather than to identify specific synergy biomarkers. Different compound representation methods showed similar performance, and combining different types of drug representations could improve model performance. Using prior biological knowledge for feature selection proved beneficial, and integrating pathway propagation methods or directly including biological knowledge in the neural network could further enhance predictive capacity. Training models on larger and more diverse drug combination datasets could improve generalization. Limitations include the limited clinical applicability of cell line screens, the need to assess drug combination sensitivity alongside synergy, and the challenge of interpreting the underlying mechanisms of drug synergy. Creating heterogeneous ensembles of DL and ML models improved performance, and alternative interpretability methods should be explored. Different validation schemes and cross-study evaluations are suggested for future research.
