WEEK 2 — Detecting Musculoskeletal Conditions
Team Members: Utku İpek, Hüseyincan Kaynak, Gokce Sengun
Our Previous Blog: Week-1 Detecting Musculoskeletal Conditions
Our Next Blog: Week-3 Detecting Musculoskeletal Conditions
Hello again! In the second week of our Machine Learning term project, Detecting Musculoskeletal Conditions, we’re diving deep into the dataset.
MURA (musculoskeletal radiographs) is a large dataset of bone X-rays. Algorithms are tasked with determining whether an X-ray study is normal or abnormal. We will use MURA-v1.1 in our model.
How was the MURA dataset collected?
MURA is a dataset of musculoskeletal radiographs consisting of 14,863 studies from 12,173 patients, with a total of 40,561 multi-view radiographic images. Each study belongs to one of seven standard upper extremity radiographic study types: elbow, finger, forearm, hand, humerus, shoulder, and wrist. Each study was manually labeled as normal or abnormal by board-certified radiologists from Stanford Hospital at the time of clinical radiographic interpretation in the diagnostic radiology environment between 2001 and 2012.
Test Set Collection
To evaluate models and get a robust estimate of radiologist performance, the Stanford ML Group collected additional labels from six board-certified Stanford radiologists on the test set, which consists of 207 musculoskeletal studies. Each radiologist individually and retrospectively reviewed every study in the test set as a DICOM file and labeled it as normal or abnormal, working in the clinical reading room environment with the PACS system. The radiologists have 8.83 years of experience on average, ranging from 2 to 25 years. The group randomly chose three of these radiologists to create a gold standard, defined as the majority vote of their labels. For details look at Figure 1.
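The gold standard described above is a plain majority vote over three labels. Here is a minimal sketch in Python, assuming labels are encoded as 0 (normal) and 1 (abnormal). The function name `majority_vote` and the sample readings are our own illustration and are not part of the MURA release.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most raters (0 = normal, 1 = abnormal)."""
    # With an odd number of raters, a binary vote can never end in a tie.
    if len(labels) % 2 == 0:
        raise ValueError("use an odd number of raters so the vote cannot tie")
    label, _count = Counter(labels).most_common(1)[0]
    return label


# Hypothetical readings: three radiologists, three studies (made-up values).
readings = {
    "study_a": [1, 0, 1],
    "study_b": [0, 0, 0],
    "study_c": [0, 1, 0],
}

gold_standard = {study: majority_vote(votes) for study, votes in readings.items()}
print(gold_standard)
```

Using an odd number of raters is what makes this rule well defined for a normal/abnormal decision.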
When we look at the dataset, we can easily get an image from train_image_paths.csv, which contains all the image paths. There is also a train_labeled_studies.csv file that lists each study path together with its label. Each image is labeled as 1 (abnormal) or 0 (normal) based on whether its corresponding study is positive or negative, respectively.
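Because the study folder name ends in positive or negative, an image's label can be read straight from its path. The sketch below shows one way to do that in Python. It assumes train_image_paths.csv has one path per row and no header, and the `MURA-v1.1/train/` prefix and the sample rows are illustrative stand-ins rather than real entries. The helper `label_from_image_path` is our own name.

```python
import csv
import io


def label_from_image_path(image_path):
    """Return 1 for an image in a positive (abnormal) study, 0 for a negative one."""
    parts = image_path.split("/")
    if len(parts) < 2:
        raise ValueError(f"path has no study folder: {image_path!r}")
    # The study folder is the directory that directly contains the image.
    study_folder = parts[-2]
    if study_folder.endswith("_positive"):
        return 1
    if study_folder.endswith("_negative"):
        return 0
    raise ValueError(f"cannot infer a label from {image_path!r}")


# Stand-in for train_image_paths.csv (assumed layout: one path per row).
sample_csv = io.StringIO(
    "MURA-v1.1/train/XR_ELBOW/patient12104/study1_negative/image1.png\n"
    "MURA-v1.1/train/XR_ELBOW/patient12104/study1_negative/image2.png\n"
    "MURA-v1.1/train/XR_WRIST/patient12110/study3_positive/image1.png\n"
)

labeled_images = [
    (row[0], label_from_image_path(row[0]))
    for row in csv.reader(sample_csv)
    if row  # skip blank lines
]

for path, label in labeled_images:
    print(label, path)
```

To run this on the real file, replace the `io.StringIO` object with `open("train_image_paths.csv", newline="")`, adjusting the location to wherever the dataset was unpacked.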
Components of the train and valid sets
The train set consists of seven study types, namely:
XR_ELBOW
XR_FINGER
XR_FOREARM
XR_HAND
XR_HUMERUS
XR_SHOULDER
XR_WRIST
- Each study type contains several folders named like:
patient12104
patient12110
patient12116
patient12122
patient12128
...
- These folders are named after patient IDs, and each of them contains one or more studies, named like:
study1_negative
study2_negative
study3_positive
...
- Each of these studies contains one or more radiographs (views or images), named like:
image1.png
image2.png
...
- Each view (image) is RGB with pixel range [0, 255] and varies in dimensions.
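Since labels belong to studies and a study can hold several views, it is often handy to group image paths by study. The following Python sketch splits a path into the four levels listed above (study type, patient, study, image) and groups the views. The names `View`, `parse_view`, and `group_by_study` are our own, and the example paths are illustrative combinations of the folder names shown above, not actual dataset entries.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import NamedTuple


class View(NamedTuple):
    study_type: str  # e.g. "XR_ELBOW"
    patient: str     # e.g. "patient12104"
    study: str       # e.g. "study1_negative"
    image: str       # e.g. "image1.png"


def parse_view(path):
    """Split an image path into the last four levels of the folder hierarchy."""
    parts = PurePosixPath(path).parts
    if len(parts) < 4:
        raise ValueError(f"expected study_type/patient/study/image in {path!r}")
    return View(*parts[-4:])


def group_by_study(paths):
    """Map (study_type, patient, study) to the list of image names in that study."""
    studies = defaultdict(list)
    for path in paths:
        view = parse_view(path)
        studies[(view.study_type, view.patient, view.study)].append(view.image)
    return dict(studies)


# Illustrative paths only; the "train/" prefix is an assumption.
example_paths = [
    "train/XR_ELBOW/patient12104/study1_negative/image1.png",
    "train/XR_ELBOW/patient12104/study1_negative/image2.png",
    "train/XR_HAND/patient12110/study3_positive/image1.png",
]

for key, images in group_by_study(example_paths).items():
    print(key, images)
```

Grouping this way keeps all views of a study together, which is useful because the normal/abnormal label applies to the study as a whole.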