Week 3: Detecting Musculoskeletal Conditions

Hüseyincan Kaynak
Published in bbm406f19 · Dec 16, 2019 · 4 min read

Team Members: Utku İpek, Gokce Sengun, Hüseyincan Kaynak

Haven’t you read our previous blog yet? Week-2: https://medium.com/@gokceesengun/week-2-detecting-musculoskeletal-conditions-c7f732a909e8

Our next blog: Week-4: https://medium.com/bbm406f19/week-4-detecting-musculoskeletal-conditions-3077cc678c7e

Hello everyone again! Did you miss us? We missed you :) This week, we will talk about related work. What is their solution? How good is it? No more waiting, let's get started.

MURA Dataset: Towards Radiologist-Level Abnormality Detection in Musculoskeletal Radiographs is one of the most important related works on this problem, and it is the one we dig into most deeply. The following people made great contributions to this work: Pranav Rajpurkar, Jeremy Irvin, Aarti Bagul, Daisy Ding, Tony Duan, Hershel Mehta, Brandon Yang, Kaylie Zhu, Dillon Laird, Robyn L. Ball, Curtis Langlotz, Katie Shpanskaya, Matthew P. Lungren, Andrew Ng.

Let's have a look at their approach to the problem. But first, let's talk a little bit about Convolutional Neural Networks and Densely Connected Convolutional Networks.

Convolutional Neural Network

The convolutional neural network, or CNN for short, is a specialized type of neural network model designed for working with two-dimensional image data, although it can also be used with one-dimensional and three-dimensional data. [1]
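As a small illustration (a minimal PyTorch sketch of our own, not taken from the paper), the convolution layer comes in 1D, 2D, and 3D flavors; the 2D variant is the one used on radiograph images:

```python
import torch
import torch.nn as nn

# A 2D convolution, the basic building block of CNNs for images:
# 1 input channel (grayscale radiograph), 16 output feature maps, 3x3 kernel.
conv2d = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(8, 1, 224, 224)  # a batch of 8 grayscale 224x224 images
print(conv2d(x).shape)           # torch.Size([8, 16, 224, 224])

# The same idea extends to 1D (e.g. signals) and 3D (e.g. volumes):
conv1d = nn.Conv1d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
conv3d = nn.Conv3d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
```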

Densely Connected Convolutional Network

Recent work has shown that convolutional networks can be substantially deeper, more accurate, and more efficient to train if they contain shorter connections between layers close to the input and those close to the output. The Dense Convolutional Network (DenseNet) connects each layer to every other layer in a feed-forward fashion. Whereas a traditional convolutional network with L layers has L connections, one between each layer and its subsequent layer, a DenseNet has L(L+1)/2 direct connections. For each layer, the feature maps of all preceding layers are used as inputs, and its own feature maps are used as inputs to all subsequent layers. [2]
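To make the "each layer sees all preceding feature maps" idea concrete, here is a minimal PyTorch sketch of a dense block. This is our own illustration of the connectivity pattern, not the exact DenseNet implementation from Huang et al. (2016):

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    """Minimal dense block: every layer receives the concatenation
    of the block input and all preceding layers' feature maps."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # input: all preceding feature maps
            features.append(out)                     # output: fed to all later layers
        return torch.cat(features, dim=1)

block = TinyDenseBlock(in_channels=64, growth_rate=32, num_layers=4)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)  # torch.Size([1, 192, 56, 56]) -> 64 + 4*32 channels
```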

Related Work’s Model

The model takes as input one or more views for a study of an upper extremity. On each view, their 169-layer convolutional neural network predicts the probability of abnormality. They compute the overall probability of abnormality for the study by taking the arithmetic mean of the abnormality probabilities output by the network for each image. The model makes the binary prediction of abnormal if the probability of abnormality for the study is greater than 0.5. Figure 1 illustrates the model's prediction pipeline.
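A rough sketch of that prediction rule is below. The helper name `predict_study` is hypothetical, and we assume the model returns one raw logit per view, to which a sigmoid is applied:

```python
import torch

def predict_study(model, views):
    """views: tensor of shape (num_views, 3, 224, 224) with all images of one study.
    Returns the study-level probability and the binary abnormal/normal decision."""
    model.eval()
    with torch.no_grad():
        per_view_probs = torch.sigmoid(model(views)).squeeze(1)  # one probability per view
    study_prob = per_view_probs.mean()     # arithmetic mean over the views
    is_abnormal = study_prob.item() > 0.5  # binary decision at the 0.5 threshold
    return study_prob.item(), is_abnormal
```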

Figure 1: The model's prediction pipeline.

They used a 169-layer convolutional neural network to predict the probability of abnormality for each image in a study. The network uses a Dense Convolutional Network architecture, detailed in Huang et al. (2016), which connects each layer to every other layer in a feed-forward fashion to make the optimization of deep networks tractable. They replaced the final fully-connected layer with one that has a single output, after which they applied a sigmoid nonlinearity. For each image X of study type T in the training set, they optimized the weighted binary cross-entropy loss

L(X, y) = −wT,1 · y · log p(Y = 1|X) − wT,0 · (1 − y) · log p(Y = 0|X),

where y is the label of the study, p(Y = i|X) is the probability that the network assigns to the label i, wT,1 = |NT| / (|AT| + |NT|), and wT,0 = |AT| / (|AT| + |NT|), where |AT| and |NT| are the number of abnormal and normal images of study type T in the training set, respectively. Before feeding images into the network, they normalized each image to have the same mean and standard deviation as the images in the ImageNet training set. They then scaled the variable-sized images to 224×224. They augmented the data during training by applying random lateral inversions and rotations. The weights of the network were initialized with weights from a model pre-trained on ImageNet (Deng et al., 2009). The network was trained end-to-end using Adam with default parameters β1 = 0.9 and β2 = 0.999 (Kingma & Ba, 2014). They trained the model using mini-batches of size 8. They used an initial learning rate of 0.0001 that is decayed by a factor of 10 each time the validation loss plateaus after an epoch, and they chose the model with the lowest validation loss.
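Below is a hedged PyTorch sketch of this training recipe as we read it from the paper: torchvision's DenseNet-169 stands in for their 169-layer network, the weighted binary cross-entropy is written out by hand, and details the paper does not specify (rotation range, scheduler patience) are our assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# DenseNet-169 pre-trained on ImageNet, final fully-connected layer
# replaced with a single-output layer.
model = models.densenet169(pretrained=True)
model.classifier = nn.Linear(model.classifier.in_features, 1)

# Weighted binary cross-entropy from the paper:
# w_{T,1} = |N_T| / (|A_T| + |N_T|),  w_{T,0} = |A_T| / (|A_T| + |N_T|)
def weighted_bce(logits, labels, w1, w0):
    probs = torch.sigmoid(logits).clamp(1e-7, 1 - 1e-7)
    loss = -(w1 * labels * torch.log(probs) + w0 * (1 - labels) * torch.log(1 - probs))
    return loss.mean()

# ImageNet normalization, resize to 224x224, random lateral inversions
# and rotations (the rotation range is our guess, the paper does not give one).
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(30),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Adam with the defaults beta1=0.9, beta2=0.999, initial learning rate 1e-4,
# decayed by a factor of 10 when the validation loss plateaus.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=1)

# Inside the training loop (mini-batches of size 8):
#   logits = model(images)
#   loss = weighted_bce(logits.squeeze(1), labels.float(), w1, w0)
#   loss.backward(); optimizer.step(); optimizer.zero_grad()
# After each epoch: scheduler.step(val_loss); keep the checkpoint with the lowest val_loss.
```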

This week we talked about related work and its solution. Next week we have a surprise: we will announce our own solution and many more details. Keep in touch :)
