(1/2) Fastai, the new radiology tool

Musculoskeletal Disorder in the Workplace (image credit)

This article is part of the “Deep Learning in Practice” series.

Read part 2: “(2/2) Fastai, the new radiology tool”.

Abstract

MURA is a dataset of bone X-rays that can be used to build models that detect abnormalities. Fastai v1 makes it possible to build such a world-class model for the MURA competition, which evaluates the performance of a study classifier using the kappa score.

[+] Code in jupyter notebook [+] nbviewer of the notebook

Edit (20/03/2019)

The following changes were made (thanks, Jeremy Howard, for your feedback tweet :-)

What is MURA?

(source) MURA (MUsculoskeletal RAdiographs) is a large dataset of bone X-rays that can be used to build models that determine whether an X-ray study is normal or abnormal (the same dataset could also be used to classify bones into the categories shoulder, humerus, elbow, forearm, wrist, hand, and finger). MURA is one of the largest public radiographic image datasets.

Source: MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs (May 2018)

Musculoskeletal conditions

Musculoskeletal conditions affect more than 1.7 billion people worldwide, and are the most common cause of severe, long-term pain and disability, with 30 million emergency department visits annually and increasing. The Stanford ML Group hopes that their dataset can lead to significant advances in medical imaging technologies which can diagnose at the level of experts, towards improving healthcare access in parts of the world where access to skilled radiologists is limited.

MURA competition

This dataset is available to the community, and the Stanford ML Group is holding a competition to determine whether the resulting models can perform as well as radiologists on the task (note: read the MURA Submission Tutorial to learn how to submit your results for official evaluation).

The objective of the MURA competition is to classify every study into normal or abnormal (binary predictions), not every image.
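Since a classifier trained on single X-rays outputs one prediction per image, those outputs have to be combined into one prediction per study. Here is a minimal sketch of one way to do it, assuming we average the abnormality probabilities of all views in a study and apply a 0.5 threshold. The averaging rule, the threshold, and the names `study_predictions` and `image_probs` are assumptions made for illustration; this article does not specify the aggregation step.

```python
from collections import defaultdict


def study_predictions(image_probs, threshold=0.5):
    """Turn per-image abnormality probabilities into per-study labels.

    image_probs: iterable of (study_id, probability_of_abnormal) pairs.
    Returns a dict mapping study_id to 1 (abnormal) or 0 (normal).
    Assumption: a study is abnormal when the mean probability of its
    images is above `threshold`.
    """
    by_study = defaultdict(list)
    for study_id, prob in image_probs:
        by_study[study_id].append(prob)
    return {
        study_id: int(sum(probs) / len(probs) > threshold)
        for study_id, probs in by_study.items()
    }


# Hypothetical example: two studies with 2 and 3 views respectively.
example = [
    ("study_a", 0.9), ("study_a", 0.3),
    ("study_b", 0.2), ("study_b", 0.4), ("study_b", 0.1),
]
result = study_predictions(example)
```

In this made-up example, `study_a` has a mean probability of 0.6 and is labeled abnormal, while `study_b` has a mean of about 0.23 and is labeled normal.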

The best radiologist performance reported by Stanford University is a kappa score of 0.778.

Cohen’s kappa statistic

The metric used by the MURA competition is not the classical accuracy but the kappa score or Cohen’s kappa statistic. This is a more robust measure than simple accuracy, as it takes into account the possibility of the agreement occurring by chance by subtracting it from the observed agreement.
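To make the definition concrete, here is a small sketch of the unweighted Cohen's kappa, computed as (observed agreement − chance agreement) / (1 − chance agreement). The function name `cohen_kappa` and the toy labels are illustrative only; this is not the code used by the competition.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability that both raters pick the same class
    # if each labeled at random with its own class frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical class: kappa is undefined.
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Toy example: 10 studies, 1 = abnormal, 0 = normal.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(truth, preds)
```

In this toy example the accuracy is 0.80, the chance agreement is 0.52, and the kappa is (0.80 − 0.52) / (1 − 0.52) ≈ 0.58, which shows how kappa discounts the agreement that would be expected by chance.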

More explanation: read Cohen’s kappa in plain English and watch the following video.

Video about Kappa Coefficient

Standard Fastai v1 way on the MURA dataset

(source: paper, May 2018) The MURA dataset contains 40,561 images from 14,863 studies. Each study contains one or more views (images) and is manually labeled by radiologists as either normal or abnormal.

These images are divided into 36,808 training images and 3,197 validation images, each grouped into studies. The remaining 556 images are held out as a test set.

We used fastai v1 on the MURA dataset in the context of the MURA competition (see our jupyter notebook).

We used 2 pretrained models: a simple one (resnet34) and a much deeper one (densenet169, the one used by the authors of the paper), in order to show what a deeper pretrained network can bring to the classification of bone X-rays in the health domain.

For each model, we used the standard fastai v1 way of training a classification Deep Learning model:

  • use of a pretrained model,
  • creation of an ImageDataBunch with the from_folder() function,
  • progressive resizing: training first on images at half size (112), then on images at full size (224),
  • training of the newly added last layers first, then training of the whole model after unfreezing,
  • use of the lr_find() function to choose a suitable learning rate,
  • use of the fit_one_cycle() function, which follows the 1cycle policy: the learning rate is varied over the course of training (raised, then lowered) instead of being kept constant,
  • analysis of the results (predictions on the validation set) with ClassificationInterpretation.from_learner(), interp.top_losses(), interp.plot_confusion_matrix(), interp.most_confused() and interp.plot_top_losses().
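The steps above can be sketched roughly as follows. This is not the code of our notebook: it assumes a fastai v1 release that provides `cnn_learner` and `KappaScore`, a `path` folder already organized into `train/` and `valid/` subfolders with one folder per class, and placeholder values for the batch size, the number of epochs and the learning rates (in practice these would be read from the `lr_find()` plot). The function name `train_mura_sketch` is hypothetical.

```python
# Image sizes for the two training phases (half size, then full size).
SIZES = (112, 224)


def train_mura_sketch(path, arch_name="densenet169", bs=64, epochs=5):
    """Hedged sketch of the standard fastai v1 training recipe."""
    # fastai v1 imports are kept inside the function so that this file
    # can be read or imported without fastai installed.
    from fastai.vision import (ImageDataBunch, ClassificationInterpretation,
                               cnn_learner, get_transforms, imagenet_stats,
                               models)
    from fastai.metrics import accuracy, KappaScore

    def make_data(size):
        # Assumption: `path` contains train/ and valid/ folders,
        # each with one subfolder per class.
        return ImageDataBunch.from_folder(
            path, train="train", valid="valid",
            ds_tfms=get_transforms(), size=size, bs=bs,
        ).normalize(imagenet_stats)

    arch = getattr(models, arch_name)  # e.g. models.resnet34
    learn = cnn_learner(make_data(SIZES[0]), arch,
                        metrics=[accuracy, KappaScore()])

    for i, size in enumerate(SIZES):
        if i > 0:
            # Switch to larger images and train the head again first.
            learn.data = make_data(size)
            learn.freeze()
        learn.lr_find()
        learn.recorder.plot()                    # inspect to pick max_lr
        learn.fit_one_cycle(epochs, max_lr=1e-3)  # placeholder learning rate
        learn.unfreeze()                          # then train the whole model
        learn.fit_one_cycle(epochs, max_lr=slice(1e-6, 1e-4))  # placeholder

    # Analysis of the predictions on the validation set.
    interp = ClassificationInterpretation.from_learner(learn)
    interp.plot_confusion_matrix()
    interp.plot_top_losses(9)
    return learn, interp.most_confused()
```

Calling `train_mura_sketch(path, arch_name="resnet34")` would run the same recipe with the simpler network.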

As for the metric, fastai v1 already implements the kappa score.

Results

The overall accuracy of our model (densenet169) is 0.829 with a kappa of 0.642, which would place us 56th in the MURA competition (see screenshot below of the MURA Competition leaderboard).

This ranking (10 models have a lower kappa) shows that fastai v1 (and the Jupyter notebooks of its course) makes it possible to quickly build world-class Deep Learning models for image classification in the field of health, and in radiology in particular. Indeed, we only used the standard fastai v1 way of training a classification Deep Learning model.

This is excellent news because it means that non-radiologists who are fastai specialists can help radiologists diagnose the diseases visible in X-rays better and faster.