WEEK 2 — Detecting Musculoskeletal Conditions

Gokce Sengun
Published in bbm406f19
Dec 10, 2019

Team Members: Utku İpek, Hüseyincan Kaynak, Gokce Sengun

Our Previous Blog: Week-1 Detecting Musculoskeletal Conditions

Our Next Blog: Week-3 Detecting Musculoskeletal Conditions

Hello again! In the second week of our Machine Learning term project, Detecting Musculoskeletal Conditions, we’re diving deep into the dataset.

MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. Algorithms are tasked with determining whether an X-ray study is normal or abnormal. We will use MURA-v1.1 in our model.


How was the MURA dataset collected?

MURA is a dataset of musculoskeletal radiographs consisting of 14,863 studies from 12,173 patients, with a total of 40,561 multi-view radiographic images. Each study belongs to one of seven standard upper extremity radiographic study types: elbow, finger, forearm, hand, humerus, shoulder, and wrist. Each study was manually labeled as normal or abnormal by board-certified radiologists from Stanford Hospital at the time of clinical radiographic interpretation in the diagnostic radiology environment between 2001 and 2012.

from Stanford ML Group

Test Set Collection

To evaluate models and get a robust estimate of radiologist performance, the Stanford ML Group collected additional labels from six board-certified Stanford radiologists on the test set, which consists of 207 musculoskeletal studies. Each radiologist individually and retrospectively reviewed every study in the test set as a DICOM file and labeled it as normal or abnormal, working in the clinical reading room environment with the PACS system. The radiologists have 8.83 years of experience on average, ranging from 2 to 25 years. The group randomly chose three of these radiologists to create a gold standard, defined as the majority vote of their labels. For details, see Figure 1.
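To make the gold-standard idea concrete, here is a minimal sketch of a majority vote over three readers. The labels below are made up for illustration and are not taken from MURA, and the function name `majority_vote` is our own.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most readers (expects an odd number of 0/1 labels)."""
    if len(labels) % 2 == 0:
        raise ValueError("need an odd number of labels to avoid ties")
    return Counter(labels).most_common(1)[0][0]


# Hypothetical labels from three radiologists for four studies
# (1 = abnormal, 0 = normal). These values are invented for the example.
reader_labels = [
    (1, 1, 0),
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 0),
]

gold_standard = [majority_vote(study) for study in reader_labels]
print(gold_standard)  # [1, 0, 1, 0]
```

With three readers and binary labels a tie is impossible, which is one practical reason to use an odd number of readers.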

Figure 1: Predictions of the Radiologists

Browsing the dataset, we can easily load an image using train_image_paths.csv, which lists all the image paths. There is also a train_labeled_studies.csv file that lists each study path together with its label. Each image is labeled as 1 (abnormal) or 0 (normal) based on whether its corresponding study is positive or negative, respectively.
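Because the label is also encoded in the study folder name, an image's label can be recovered from its path alone. The snippet below is a rough sketch of that mapping; the function name `label_from_image_path` and the `MURA-v1.1/train` path prefix are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def label_from_image_path(image_path):
    """Map an image path to 1 (abnormal) or 0 (normal) using its study folder name."""
    study_folder = PurePosixPath(image_path).parent.name  # e.g. "study1_negative"
    if study_folder.endswith("_positive"):
        return 1
    if study_folder.endswith("_negative"):
        return 0
    raise ValueError(f"unexpected study folder name: {study_folder!r}")


# Hypothetical paths; the "MURA-v1.1/train" prefix is an assumption.
sample_paths = [
    "MURA-v1.1/train/XR_ELBOW/patient12104/study1_negative/image1.png",
    "MURA-v1.1/train/XR_ELBOW/patient12104/study2_positive/image1.png",
]

labels = [label_from_image_path(p) for p in sample_paths]
print(labels)  # [0, 1]
```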

Components of train and valid set

  • The train set consists of seven study types, namely:
  • XR_ELBOW XR_FINGER XR_FOREARM XR_HAND XR_HUMERUS XR_SHOULDER XR_WRIST
  • Each study type contains several folders named like:
  • patient12104 patient12110 patient12116 patient12122 patient12128 ...
  • These folders are named after patient IDs, and each of them contains one or more studies, named like:
  • study1_negative study2_negative study3_positive ...
  • Each of these studies contains one or more radiographs (views or images), named like:
  • image1.png image2.png ...
  • Each view (image) is RGB with pixel range [0, 255] and varies in dimensions.
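The folder layout above can be turned into structured records with a few lines of path parsing. This is a sketch under the assumption that every image path ends with study type, patient folder, study folder, and image file, in that order. The `ViewInfo` class, the `parse_view_path` helper, and the sample paths are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ViewInfo:
    study_type: str  # e.g. "XR_WRIST"
    patient_id: str  # e.g. "patient12110"
    study: str       # e.g. "study1_negative"
    image: str       # e.g. "image1.png"


def parse_view_path(path):
    """Split '.../XR_TYPE/patientN/studyK_label/imageM.png' into its four parts."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"path too short to follow the expected layout: {path!r}")
    study_type, patient_id, study, image = parts[-4:]
    return ViewInfo(study_type, patient_id, study, image)


# Hypothetical paths that follow the layout described above.
paths = [
    "MURA-v1.1/train/XR_WRIST/patient12110/study1_negative/image1.png",
    "MURA-v1.1/train/XR_WRIST/patient12110/study1_negative/image2.png",
    "MURA-v1.1/train/XR_WRIST/patient12110/study2_positive/image1.png",
]

# Group the views by study, since labels are assigned per study.
views_per_study = defaultdict(list)
for p in paths:
    info = parse_view_path(p)
    views_per_study[(info.study_type, info.patient_id, info.study)].append(info.image)

for key, images in views_per_study.items():
    print(key, len(images))
```

Grouping by study is useful because the normal/abnormal label applies to a whole study, so a model's per-image outputs usually need to be combined per study before evaluation.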

We look forward to seeing you next week!
