WEEK 2 — Detecting Musculoskeletal Conditions

Gokce Sengun

Published in bbm406f19, Dec 10, 2019

Team Members: Utku İpek, Hüseyincan Kaynak, Gokce Sengun

Our Previous Blog: Week-1 Detecting Musculoskeletal Conditions

Our Next Blog: Week-3 Detecting Musculoskeletal Conditions

Hello again! In the second week of our Machine Learning term project, Detecting Musculoskeletal Conditions, we take a closer look at the dataset.

MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. Algorithms are tasked with determining whether an X-ray study is normal or abnormal. We will use MURA-v1.1 in our model.

How was the MURA dataset collected?

MURA is a dataset of musculoskeletal radiographs consisting of 14,863 studies from 12,173 patients, with a total of 40,561 multi-view radiographic images. Each study belongs to one of seven standard upper extremity radiographic study types: elbow, finger, forearm, hand, humerus, shoulder, and wrist. Each study was manually labeled as normal or abnormal by board-certified radiologists from Stanford Hospital at the time of clinical radiographic interpretation in the diagnostic radiology environment, between 2001 and 2012.

Test Set Collection

To evaluate models and get a robust estimate of radiologist performance, the Stanford ML Group collected additional labels from six board-certified Stanford radiologists on the test set, which consists of 207 musculoskeletal studies. Working individually and retrospectively, the radiologists reviewed each study in the test set as a DICOM file and labeled it as normal or abnormal in the clinical reading room environment using the PACS system. The radiologists have 8.83 years of experience on average, ranging from 2 to 25 years. The group randomly chose three of these radiologists to create a gold standard, defined as the majority vote of their labels. For details, see Figure 1.
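The majority-vote rule behind the gold standard can be sketched in a few lines of Python. This is only an illustration: the function name `majority_vote`, the study names, and the label values are made up, and it assumes binary labels (0 = normal, 1 = abnormal) from an odd number of radiologists.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by most radiologists (0 = normal, 1 = abnormal)."""
    if len(labels) % 2 == 0:
        raise ValueError("use an odd number of labels to avoid ties")
    # most_common(1) returns [(label, count)] for the most frequent label.
    return Counter(labels).most_common(1)[0][0]

# Hypothetical labels from three radiologists for two studies.
votes_per_study = {
    "study_a": [1, 0, 1],
    "study_b": [0, 0, 1],
}

gold_standard = {study: majority_vote(votes) for study, votes in votes_per_study.items()}
print(gold_standard)
```

With three voters and two possible labels, a tie cannot occur, which is one reason an odd panel size is convenient.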

Figure 1: Predictions of the Radiologists

Looking at the dataset, train_image_paths.csv lists the path of every training image, so any image can be loaded directly. There is also a train_labeled_studies.csv file that lists each study path together with its label. Every image takes the label of its study: 1 (abnormal) if the study is positive, 0 (normal) if it is negative.
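Here is a minimal sketch of reading such a label file with Python's built-in csv module. It assumes a headerless, two-column layout (a path followed by its label). The sample rows, including the patient ids and the path prefix, are invented for illustration, and with the real dataset the file would be opened with open(...) instead of io.StringIO.

```python
import csv
import io

# Illustrative rows only (assumed layout: path,label with no header row).
sample = io.StringIO(
    "MURA-v1.1/train/XR_ELBOW/patient00011/study1_negative/,0\n"
    "MURA-v1.1/train/XR_HAND/patient00012/study1_positive/,1\n"
)

def read_labels(handle):
    """Map each path to its integer label (1 = abnormal, 0 = normal)."""
    return {path: int(label) for path, label in csv.reader(handle)}

labels = read_labels(sample)

for path, label in labels.items():
    # The folder suffix and the numeric label should tell the same story.
    assert ("positive" in path) == (label == 1)
    print(label, path)
```

The consistency check inside the loop is a cheap way to catch a swapped label convention early.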

Components of `train` and `valid` set

The train set consists of seven study types, namely:
XR_ELBOW XR_FINGER XR_FOREARM XR_HAND XR_HUMERUS XR_SHOULDER XR_WRIST
Each study type contains several folders named like:
patient12104 patient12110 patient12116 patient12122 patient12128 ...
These folders are named after patient ids. Each of them contains one or more studies, named like:
study1_negative study2_negative study3_positive ...
Each of these studies contains one or more radiographs (views or images), named like:
image1.png image2.png ...
Each view (image) is RGB with pixel range [0, 255] and varies in dimensions.
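Because the folder names carry the study type, patient id, and label, one image path can be unpacked into all of these fields. The sketch below assumes the layout described above (study type / patient / study / image). The function name `parse_image_path` and the leading `train/` prefix in the example are our own choices, not part of the dataset.

```python
from pathlib import PurePosixPath

def parse_image_path(path):
    """Split an image path into study type, patient, study, image name, and label."""
    # The last four path components follow the layout described above.
    study_type, patient, study, image = PurePosixPath(path).parts[-4:]
    if study.endswith("_positive"):
        label = 1  # abnormal
    elif study.endswith("_negative"):
        label = 0  # normal
    else:
        raise ValueError(f"unexpected study folder name: {study}")
    return {
        "study_type": study_type,
        "patient": patient,
        "study": study,
        "image": image,
        "label": label,
    }

info = parse_image_path("train/XR_WRIST/patient12104/study1_negative/image1.png")
print(info)
```

Grouping images by the (patient, study) pair is useful later, since the normal or abnormal decision is made per study rather than per image.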

We look forward to seeing you next week!
