This article is part of the “Deep Learning in Practice” series.
Read part 2: “(2/2) Fastai, the new radiology tool”.
MURA is a dataset of bone X-rays for building models that detect abnormalities. With fastai v1, we can create a world-class model of this kind for the MURA competition, which evaluates the performance of a study classifier using the kappa score.
- Added an nbviewer link to the notebook (the Jupyter notebook is too big to display well on GitHub)
- Added a paragraph explaining what Cohen’s kappa statistic is (fastai v1 already implements it as a metric)
- Added a link to part 2 of this post, which gives more information about how the model performed.
What is MURA?
(source) MURA (MUsculoskeletal RAdiographs) is a large dataset of bone X-rays for building models that determine whether an X-ray study is normal or abnormal (the dataset could also be used to classify bones into the categories shoulder, humerus, elbow, forearm, wrist, hand, and finger). MURA is one of the largest public radiographic image datasets.
Musculoskeletal conditions affect more than 1.7 billion people worldwide, and are the most common cause of severe, long-term pain and disability, with 30 million emergency department visits annually and increasing. The Stanford ML Group hopes that their dataset can lead to significant advances in medical imaging technologies which can diagnose at the level of experts, towards improving healthcare access in parts of the world where access to skilled radiologists is limited.
This dataset is available to the community, and the Stanford ML Group is holding a competition to determine whether the resulting models can perform as well as radiologists on the task (note: read the MURA Submission Tutorial to learn how to submit your results for official evaluation).
The objective of the MURA competition is to classify every study into normal or abnormal (binary predictions), not every image.
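Because a study can contain several images, image-level predictions have to be combined into one prediction per study. The sketch below shows one simple way to do this: average the abnormality probabilities of all images in a study and threshold the mean at 0.5. The function name, the study identifiers, and the probabilities are hypothetical, and this article does not state which aggregation rule the notebook used, so treat this as an illustration only.

```python
from collections import defaultdict


def classify_studies(image_probs, threshold=0.5):
    """Turn per-image abnormality probabilities into one label per study.

    image_probs: iterable of (study_id, probability_of_abnormal) pairs.
    Returns a dict mapping study_id to 1 (abnormal) or 0 (normal).
    Assumption: a study is abnormal when the mean probability of its
    images is above `threshold`.
    """
    per_study = defaultdict(list)
    for study_id, prob in image_probs:
        per_study[study_id].append(prob)
    return {
        study_id: int(sum(probs) / len(probs) > threshold)
        for study_id, probs in per_study.items()
    }


# Hypothetical predictions: two views for one study, three for another.
example = [
    ("patient1/study1", 0.9),
    ("patient1/study1", 0.3),
    ("patient2/study1", 0.2),
    ("patient2/study1", 0.4),
    ("patient2/study1", 0.6),
]
study_labels = classify_studies(example)
print(study_labels)
```

Averaging keeps a single confident view from being ignored while still letting several normal-looking views outvote one borderline image. Other rules (for example, taking the maximum probability) are equally possible.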
The best radiologist performance reported by Stanford University is a kappa score of 0.778.
Cohen’s kappa statistic
The metric used by the MURA competition is not the classical accuracy but the kappa score or Cohen’s kappa statistic. This is a more robust measure than simple accuracy, as it takes into account the possibility of the agreement occurring by chance by subtracting it from the observed agreement.
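To make the definition concrete: with observed agreement p_o and chance agreement p_e, kappa is (p_o - p_e) / (1 - p_e). The following is a minimal pure-Python sketch of that formula, not the fastai implementation. The function name and the toy labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa between two equally long label sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(y_true)

    # Observed agreement: fraction of items on which both labelings agree.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: probability that both pick the same class if each
    # labels at random according to its own class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[c] * pred_counts[c] for c in classes) / (n * n)

    if p_e == 1.0:
        # Both labelings use one and the same class: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Toy data: 4 abnormal (1) and 6 normal (0) studies.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_model = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]   # 8 of 10 correct
y_majority = [0] * 10                      # always predicts "normal"

kappa_model = cohen_kappa(y_true, y_model)
kappa_majority = cohen_kappa(y_true, y_majority)
print(round(kappa_model, 3), kappa_majority)
```

In this toy example, always predicting "normal" is 60% accurate but has a kappa of 0, because that level of agreement is exactly what chance would give. The model with 80% accuracy scores a kappa of about 0.583.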
More explanation: read Cohen’s kappa in plain English and watch the following video.
Standard Fastai v1 way on the MURA dataset
(source: paper, May 2018) The MURA dataset contains 40,561 images from 14,863 studies. Each study contains one or more views (images) and is manually labeled by radiologists as either normal or abnormal.
These images are divided into 36,808 training images and 3,197 validation images, each set grouped into studies.
We used two pretrained models: a simple one (resnet34) and a much deeper one (densenet169, the one used by the authors of the paper), to show what a deeper pretrained network can bring to classifying bone X-rays in the health domain.
For each model, we used the standard fastai v1 way of training a classification Deep Learning model:
- use of a pretrained model,
- creation of an ImageDataBunch with the from_folder() function,
- progressive resizing: training first at half the image size (112), then at the full size (224),
- training of the newly added last layers first, then training of the whole model after unfreezing,
- use of the function lr_find() to choose a good learning rate,
- use of the function fit_one_cycle(), which applies the 1cycle policy and varies the learning rate over the course of training,
- analysis of the results (predictions on the validation set) with the functions ClassificationInterpretation.from_learner(), interp.top_losses(), interp.plot_confusion_matrix(), interp.most_confused() and interp.plot_top_losses()
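The steps above can be sketched in code roughly as follows. This is not the notebook's actual code. It assumes fastai v1 is installed and that the images have been arranged into a hypothetical `train/<label>/` and `valid/<label>/` folder layout. The batch size, epoch counts, and learning rates are placeholders. In early fastai v1 releases, `cnn_learner` was called `create_cnn`.

```python
SIZES = (112, 224)  # progressive resizing: half size first, then full size


def train_mura_sketch(path, bs=64):
    """Sketch of the listed steps; needs fastai v1 and a labelled image folder."""
    # Imported here so this file can be loaded without fastai installed.
    from fastai.vision import (ClassificationInterpretation, ImageDataBunch,
                               cnn_learner, get_transforms, imagenet_stats,
                               models)
    from fastai.metrics import KappaScore, accuracy

    def make_data(size):
        # Assumed layout: path/train/<label>/... and path/valid/<label>/...
        data = ImageDataBunch.from_folder(
            path, train="train", valid="valid",
            ds_tfms=get_transforms(), size=size, bs=bs)
        return data.normalize(imagenet_stats)

    def train_stage(learn):
        # 1) Train only the newly added head (the pretrained body is frozen).
        learn.freeze()
        learn.lr_find()
        learn.recorder.plot()  # inspect the plot to pick max_lr
        learn.fit_one_cycle(4, max_lr=1e-3)          # placeholder values
        # 2) Unfreeze and fine-tune the whole network with smaller rates.
        learn.unfreeze()
        learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))  # placeholder values

    # Pretrained densenet169, tracking accuracy and Cohen's kappa.
    learn = cnn_learner(make_data(SIZES[0]), models.densenet169,
                        metrics=[accuracy, KappaScore()])
    train_stage(learn)

    # Swap in the larger images and repeat both training steps.
    learn.data = make_data(SIZES[1])
    train_stage(learn)

    # Inspect the predictions on the validation set.
    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix()
    interp.plot_top_losses(9)
    print(interp.most_confused())
    return learn, interp
```

Calling `train_mura_sketch("path/to/images")` would run the whole pipeline. In a notebook, each step is usually run in its own cell so that the `lr_find()` plot can guide the choice of learning rate before each `fit_one_cycle()` call.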
As for the metric, fastai v1 already implements the kappa score (KappaScore).
Our densenet169 model reaches an overall accuracy of 0.829 with a kappa of 0.642, which would put it in 56th place in the MURA competition (see the screenshot of the MURA competition leaderboard below).
This ranking (10 models have a lower kappa) shows that fastai v1 (and the Jupyter notebooks of its course) makes it possible to quickly create world-class Deep Learning models for image classification in health care, and in radiology in particular. Indeed, we only used the standard fastai v1 way of training a classification Deep Learning model.
This is excellent news because it means that non-radiologists who are fastai specialists can help radiologists diagnose diseases visible in X-rays better and faster.