Radiomics: Predicting Survival Outcomes Based on CT Images of Lung Cancer Tumours Using Machine Learning Algorithms
Lung cancer is one of the leading causes of cancer-related deaths in Canada because of its high incidence and low survival rate. Based on 2012–2014 data, the 5-year survival rate for lung cancer is 19%, and it is also one of the four most commonly diagnosed cancers in Canada. The most common type of lung cancer is non-small-cell lung carcinoma (NSCLC).
Survival outcomes depend strongly on the stage of the disease at diagnosis, but one can wonder whether other factors are at play. Is it possible to take the diagnostic images that are routinely obtained as part of the treatment workflow and derive additional prognostic value from them? I tried various machine learning models to find out.
(1) Dataset
This may be the hard part, since patient data is usually confidential. Fortunately, machine learning techniques are finding their way into diagnostic radiology, and radiologists and medical physicists are embracing the “open data” culture. I used the NSCLC data set from the Cancer Imaging Archive:
Along with the image sets, there is also a separate text file containing information on age, sex, staging, histology, and, most importantly, survival time and survival outcome. All the CT images have also been contoured by oncologists, which ensures the contours are consistently accurate. (I have never been an oncologist, so you can imagine how problematic contours drawn by me would be.)
(2) Data wrangling
Among the 422 CT image sets, two were found to have contours that did not match the tumour, so I removed them from my analysis.
Then I used the Pyradiomics library — more specifically, the Pyradiomics plugin of the open-source radiotherapy analysis software SlicerRT — to extract the features.
https://pyradiomics.readthedocs.io/en/latest/
To remove some of the multicollinearity in the features, I used pair plots to inspect the features visually. Yes, there are a lot of them:
The criterion for leaving out features: if two or more features have a pairwise correlation coefficient larger than 0.95, only one of them is kept.
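The correlation filter can be sketched in a few lines of pandas. This is a minimal illustration with toy features — `a`, `b`, and `c` are made-up stand-ins, not actual radiomic feature names:

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.95):
    """Keep only one feature from each group whose pairwise |r| exceeds the threshold."""
    corr = df.corr().abs()
    # Look only at the upper triangle so each pair is considered once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Toy example: "b" is almost a copy of "a", so one of the pair is dropped
rng = np.random.default_rng(0)
a = rng.normal(size=100)
features = pd.DataFrame({"a": a,
                         "b": a + rng.normal(scale=0.01, size=100),
                         "c": rng.normal(size=100)})
reduced = drop_correlated(features)
print(list(reduced.columns))  # "b" is dropped, "a" and "c" remain
```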
The Seaborn pairplot function is nice to use because it also shows the probability distribution of each feature. This gives me a way to skim for highly skewed distributions. If a distribution looked highly skewed, I applied either a logarithmic transform or, failing that, a Box-Cox transform to make the data less skewed, and I used a Q-Q plot to check for normality.
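The skew-reduction step can be sketched like this. The lognormal sample is just a stand-in for a skewed radiomic feature; `scipy.stats.probplot` would produce the Q-Q plot for the normality check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# A strongly right-skewed stand-in feature (lognormal)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)
print(f"skew before: {stats.skew(x):.2f}")

# Log transform first; Box-Cox as the fallback (it requires strictly positive data)
x_log = np.log(x)
x_boxcox, lam = stats.boxcox(x)
print(f"skew after log: {stats.skew(x_log):.2f}")
print(f"skew after Box-Cox (lambda={lam:.2f}): {stats.skew(x_boxcox):.2f}")

# stats.probplot(x_boxcox, dist="norm") would give the Q-Q plot points
```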
(3) Machine Learning — Methodology
I have divided my project into two parts:
(3.1) Supervised machine learning: I tried various classification techniques to predict whether a patient survives past the 1-year, 3-year and 5-year marks. Originally I considered doing a linear regression to predict survival time, but many of the patients were still alive at their last follow-up, which means their true survival time is not known (their data are right-censored). Framing it as a classification problem instead allowed me to keep the patients who had survived past a specific point in time regardless of their final survival status (i.e. dead or alive).
I used a grid search with 5-fold cross-validation to optimize the models. The models were:
- logistic regression with regularization
- support vector machine
- random forest
- quadratic discriminant
- decision trees
- XGBoost decision trees
- neural network
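As a minimal sketch of the grid search setup, here is what such a pipeline can look like for one of the models, logistic regression with regularization. The feature matrix here is a synthetic stand-in, and the hyperparameter grid is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the radiomic feature matrix and 1-year survival labels
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune the regularization strength C with 5-fold cross-validation
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, f"{grid.best_score_:.2f}")
```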
There was also a class imbalance problem, especially for the 3-year and 5-year classification problems. To rectify this, I used various flavours of oversampling methods (SMOTE and SVMSMOTE). The oversampling method was also incorporated in the grid search optimization pipeline.
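For the actual analysis I used off-the-shelf SMOTE and SVMSMOTE implementations, but the core SMOTE idea — synthesizing new minority samples by interpolating between a minority sample and one of its nearest minority neighbours — can be sketched in a few lines:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    """SMOTE-style synthesis (minimal sketch, not a library implementation):
    each new point lies on the segment between a minority sample and one of
    its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                 # idx[:, 0] is the point itself
    new = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))              # a random minority sample
        j = idx[i, rng.integers(1, k + 1)]        # one of its true neighbours
        t = rng.random()                          # interpolation factor in [0, 1)
        new.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.array(new)

rng = np.random.default_rng(0)
minority = rng.normal(size=(20, 4))               # e.g. the rare 5-year survivors
synthetic = smote_like(minority, n_new=60)
print(synthetic.shape)  # (60, 4)
```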
To eliminate some of the noise, I also used principal component analysis to compare the effect that the number of principal components has on the performance of the model.
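The PCA comparison can be sketched like this, with synthetic stand-in data and an illustrative set of component counts:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the radiomic feature matrix
X, y = make_classification(n_samples=200, n_features=30,
                           n_informative=5, random_state=0)

# Compare cross-validated accuracy as the number of principal components varies
scores = {}
for n in [2, 5, 10, 20]:
    model = make_pipeline(StandardScaler(), PCA(n_components=n),
                          LogisticRegression(max_iter=1000))
    scores[n] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{n:2d} components: {scores[n]:.2f}")
```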
(3.2) Unsupervised machine learning: I constructed tumour types by clustering the tumours based on their CT image features. Then I used the Kaplan-Meier estimator to compute the median survival time for each cluster. This had the advantage of keeping all patients in the analysis, since the Kaplan-Meier estimator can handle censored events. After clustering the data, I looked at the features whose distributions differed most across the clusters by ranking their p-values from smallest to largest.
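To show how censored patients stay in the analysis, here is a minimal numpy sketch of the Kaplan-Meier median (toy follow-up times, assuming no tied event times; a library implementation would also handle ties and confidence intervals):

```python
import numpy as np

def km_median(times, events):
    """Kaplan-Meier median survival time.
    times: follow-up time per patient; events: 1 = death observed, 0 = censored."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n = len(times)
    surv = 1.0
    for i, (t, e) in enumerate(zip(times, events)):
        at_risk = n - i                # censored patients still count while at risk
        if e:                          # the curve only drops at observed deaths
            surv *= 1 - 1 / at_risk
        if surv <= 0.5:                # first time S(t) falls to 0.5 or below
            return t
    return np.inf                      # median not reached

# Toy cluster: months of follow-up; 0 means still alive at last contact (censored)
times  = [3, 5, 6, 8, 12, 14, 20, 24]
events = [1, 1, 0, 1, 1,  0,  1,  1]
print(km_median(times, events))  # 12.0
```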
I used k-means clustering and OPTICS clustering (a density-based clustering algorithm). How did I determine the number of clusters for k-means? I looked at the inertia curve for the “elbow point”, and then adjusted from there, since with some choices two clusters had Kaplan-Meier curves that almost overlapped each other.
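The elbow inspection can be sketched like this, with synthetic blob data standing in for the radiomic features:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in for the radiomic feature matrix; three underlying groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# Inertia (within-cluster sum of squares) for each candidate number of clusters;
# the "elbow" is where adding another cluster stops paying off
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in range(1, 7)}
for k, v in inertias.items():
    print(f"k={k}: inertia={v:.0f}")
```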
(4) Results
Supervised learning model:
I hate to sell myself short, but the findings here are not terribly exciting:
Although some models reached an accuracy of more than 80%, bear in mind that this is mainly due to class imbalance. For the 5-year classification problem, for instance, the class imbalance was about 80:20 because most of the patients, unfortunately, did not survive that long. Neither SMOTE nor optimizing on the F1-score helped.
Unsupervised learning model:
OPTICS yielded one main cluster and more than 400 outliers, so it was not particularly useful. K-means yielded three clusters at the end:
If we look closely, the blue curve lies significantly above the orange curve (i.e. their confidence bands do not overlap for part of the curve). That is promising. Indeed, the 1-year and 3-year survival probabilities for the blue and orange clusters are significantly different (p < 0.005). I then ran a t-test for each feature across the two clusters and looked at the p-values. It turns out the feature that differs most between the blue and orange clusters has to do with gray-level non-uniformity, which describes how non-uniform the tumour appears in the CT image. Namely, the cluster associated with lower survivability tends to be composed of tumours that appear more non-uniform in the CT image.
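The per-feature t-test ranking can be sketched as follows. The values are toy data, and `gl_nonuniformity` and `volume` are illustrative names, not the actual Pyradiomics feature names:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Toy stand-ins for two clusters; "gl_nonuniformity" genuinely differs, "volume" barely does
cluster_a = {"gl_nonuniformity": rng.normal(10, 2, 60), "volume": rng.normal(50, 10, 60)}
cluster_b = {"gl_nonuniformity": rng.normal(14, 2, 60), "volume": rng.normal(51, 10, 60)}

# Two-sample t-test per feature, then rank by p-value (smallest = most different)
pvals = {f: stats.ttest_ind(cluster_a[f], cluster_b[f]).pvalue for f in cluster_a}
ranked = sorted(pvals, key=pvals.get)
print(ranked)  # the genuinely different feature comes first
```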
At first, I thought the size would be the main factor, but in reality, size did not come in among the top 5 distinguishing features.
(5) Applications
Before using them in a clinical setting, there are certain points that need to be addressed:
(5.1) Are the features clinically stable? The features are extracted from patients at one point in time. But what if a patient were rescanned, say, 15 minutes later because the first set of CT images had some problems? Would the features change so much that the model's prediction changes?
(5.2) Does additional information change the prognosis? In this study, the histology of the tumour, the staging of the tumour, sex and age of the patient, the location of the tumour and the prescribed treatment were not considered. Surely that would have some bearing on the patient’s survival. For example, a lung tumour nodule that is located closer to the heart is more likely to result in higher toxicity from radiation therapy because of the inadvertent spill-over of the radiation dose.
The location of the tumour could be taken into account by a convolutional neural network that takes in the whole CT image rather than just the masked portion (i.e. the tumour). The problem, however, is that the number of slices in a CT image set differs from patient to patient because of differing scan lengths.
Any questions or comments? Please let me know.