(2/2) Fastai, the new radiology tool

Pierre Guillou
Mar 25 · 5 min read
Musculoskeletal Disorder (image credit)

This article is part of the “Deep Learning in Practice” series.

Read part 1: “(1/2) Fastai, the new radiology tool”.


MURA is a dataset of bone X-rays that can be used to train models that detect abnormalities. Fastai v1 makes it possible to build such a world-class model for the MURA competition, which evaluates the performance of a study classifier using the kappa score.

In part 1, we applied the standard fastai workflow for training a classification Deep Learning model to two pre-trained models (resnet34 and densenet169), which allowed us to obtain a kappa score of 0.642. In this part 2, we sought to optimize the training of the densenet169 model and managed to achieve a better kappa score of 0.674.

Based on our work and our results, we highlight 3 points regarding the use of Deep Learning with medical images: the need for a fast GPU, the use of ensemble models, and the medical expertise required to create a professional medical application.

[+] Code in jupyter notebook [+] nbviewer of the notebook

Network architecture and Training

Training methodology of our pre-trained model densenet169 and main results (accuracy and kappa score)

In this part 2, we kept the densenet169 model used in the MURA paper but trained it with the following parameters and process (when a specific fastai technique was used, it is noted “(fastai)”):

Excerpts from the MURA paper about the network architecture and training:

We used a 169-layer convolutional neural network to predict the probability of abnormality for each image in a study. The network uses a Dense Convolutional Network architecture — detailed in Huang et al. (2016) — which connects each layer to every other layer in a feed-forward fashion to make the optimization of deep networks tractable. We replaced the final fully connected layer with one that has a single output, after which we applied a sigmoid nonlinearity.

Before feeding images into the network, we normalized each image to have the same mean and standard deviation of images in the ImageNet training set. We then scaled the variable-sized images to 320×320. We augmented the data during training by applying random lateral inversions and rotations of up to 30 degrees. The weights of the network were initialized with weights from a model pretrained on ImageNet (Deng et al., 2009). The network was trained end-to-end using Adam with default parameters β1=0.9 and β2=0.999 (Kingma & Ba, 2014). We trained the model using minibatches of size 8. We used an initial learning rate of 0.0001 that is decayed by a factor of 10 each time the validation loss plateaus after an epoch. We ensembled the 5 models with the lowest validation losses.
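The optimizer and learning-rate schedule quoted above can be sketched in plain PyTorch (the paper's exact training loop is not public; the tiny stand-in model and the simulated validation losses below are purely illustrative):

```python
import torch
from torch import nn, optim

# Tiny stand-in model; the paper uses a DenseNet-169 whose final fully
# connected layer is replaced by a single output followed by a sigmoid.
model = nn.Sequential(nn.Flatten(), nn.Linear(320 * 320, 1))

# Adam with the paper's settings: lr=1e-4, beta1=0.9, beta2=0.999.
opt = optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

# "decayed by a factor of 10 each time the validation loss plateaus"
sched = optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.1, patience=0)

# Simulated per-epoch validation losses: the plateau at the third epoch
# triggers one decay, taking the learning rate from 1e-4 to 1e-5.
for val_loss in [0.60, 0.55, 0.56]:
    sched.step(val_loss)

print(opt.param_groups[0]["lr"])
```

Fastai v1 handles this differently (one-cycle scheduling via `fit_one_cycle` rather than decay-on-plateau), which is one of the training choices explored in the notebook.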

Musculoskeletal radiographic studies

Table of the musculoskeletal radiographic studies in the MURA dataset (source)

Neither the training set nor the validation set is balanced, but given the ratios (about 60/40 for the training set, 80/20 for the validation set) and the use of Data Augmentation techniques during training (rotations up to 30 degrees, horizontal flips, zoom up to 1.1, brightness/contrast changes), this should not be very troublesome.

Note: we did test a weighted loss function, but it did not bring any improvement.
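One common way to weight the loss for the roughly 60/40 class imbalance is to up-weight the minority (abnormal) class in a binary cross-entropy; a minimal sketch, with illustrative counts rather than the exact MURA figures:

```python
import torch
from torch import nn

# Illustrative class counts reflecting a ~60/40 normal/abnormal split.
n_normal, n_abnormal = 600, 400

# BCEWithLogitsLoss lets us up-weight the positive (abnormal) class.
loss_fn = nn.BCEWithLogitsLoss(
    pos_weight=torch.tensor([n_normal / n_abnormal])
)

logits = torch.tensor([[0.3], [-1.2]])   # raw model outputs
targets = torch.tensor([[1.0], [0.0]])   # 1 = abnormal, 0 = normal
loss = loss_fn(logits, targets)
print(loss.item())
```

In our experiments this kind of weighting did not improve the kappa score, so the final model was trained with the unweighted loss.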


In the following table and chart, we can see that our model with a 320x320 image size gets a better kappa score than the paper's model in 3 categories (elbow, finger and humerus), and beats the average radiologist kappa score in one category (finger).

We performed reasonably well (47th out of 67 participants), but we did not reach the overall kappa score of the paper's model (0.705) despite many trials from the same pre-trained model (densenet169). It would be interesting if the authors of the paper published their code, or if other fastai users built on our notebook to improve it.

Comparative classification performance of normal and abnormal studies by body part and by kappa score between radiologists, paper model and our 2 models (in green, kappa scores higher than the corresponding ones of the paper model)
Chart of the comparative classification performance of normal and abnormal studies by body part and kappa score between the radiologists kappa average, paper model and our model with image size 320x320
The overall kappa score of our model (densenet169) is 0.674, which corresponds to 47th place out of 67 participants in the MURA competition.
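For reference, the kappa score used by the MURA competition is Cohen's kappa: agreement between the model's study-level predictions and the gold-standard labels, corrected for chance. A minimal sketch with made-up labels:

```python
from sklearn.metrics import cohen_kappa_score

# Illustrative study-level labels: 1 = abnormal, 0 = normal.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # gold standard
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]   # model predictions (2 disagreements)

# Observed agreement is 6/8 = 0.75, chance agreement is 0.5,
# so kappa = (0.75 - 0.5) / (1 - 0.5) = 0.5.
k = cohen_kappa_score(y_true, y_pred)
print(k)
```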

More about the results by model

Step 1 | Input image size of 112x112

(image size: 112x112) Confusion matrix on MURA validation set
(image size: 112x112) The 9 worst predictions of our model (i.e., those with the highest losses). All images show an abnormality, but our model is sure they are normal with a confidence level of almost 100%.

Step 2 | Input image size of 320x320

(image size: 320x320) Confusion matrix on MURA validation set
(image size: 320x320) The 9 worst predictions of our model (i.e., those with the highest losses). All images show an abnormality, but our model is sure they are normal with a confidence level of almost 100%.
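The confusion matrices shown above are computed on the MURA validation set; the same kind of matrix can be produced with scikit-learn (the labels below are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

# Illustrative study-level labels: 1 = abnormal, 0 = normal.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predictions:
# [[true negatives, false positives],
#  [false negatives, true positives]]
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

In fastai v1, `ClassificationInterpretation.plot_confusion_matrix()` and `plot_top_losses()` produce the matrix and the worst-prediction grids directly from a trained `Learner`.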


Our experience with the MURA dataset leads us to highlight the following points:
