Medical Imaging with Deep Learning (MIDL 2018) Conference: Exploring Rejected Extended Abstracts

Ilknur Icke
Aug 21, 2018

Nowadays, everybody and their grandmothers (and grandfathers) are utilizing deep learning for medical imaging problems. This has become a hot industry. Even though the major computer vision (CVPR, etc.) and medical imaging conferences (SPIE, ISBI, MICCAI) are currently dominated by deep learning methods, a conference dedicated specifically to deep learning applied to medical imaging only emerged in 2018.

We are lucky to have free access to videos and slides from the latest research here. As is becoming common practice, the reviews for accepted as well as rejected papers are available on openreview.net.

I quickly looked into what was rejected. There are certainly interesting things to learn from research that did not quite make it in this round. This post covers the extended abstracts track. There is also a full conference track with lots of rejected papers to look into… but that is for another rainy day.

Judging by the openly shared reviews (which were apparently based on 2–3 page abstracts), the selection process was competitive. According to the openreview.net site(*), 35 papers were accepted and 44 were rejected, for an acceptance rate of around 44%:
https://openreview.net/group?id=MIDL.amsterdam/2018/Abstract

A couple of things stood out for me:

There were some controversial papers that received mixed reviews (some accept, some reject) but were eventually rejected:

  • Deep learning & atlas-based segmentation hybrid: This is one area of interest for me. As we all know, deep learning models are dumb and have no clue what they are looking at; you can train the same model on cat images or cardiac images. It is a cool idea to try to introduce anatomical knowledge into these models. One reviewer clearly accepts the paper even though the validation was not done quantitatively, while the second reviewer is not impressed because “UNet and multi-atlas registration based segmentation is widely used”.
  • Application of geometric deep learning to brain imaging. Spectral Analysis Towards Geometric Auto-Encoding of Subcortical Structures is about statistical shape analysis. As one reviewer puts it, this is a novel technique for applying deep learning to meshes instead of images directly; however, the details were not found clear enough to judge. It is based on a multi-scale, mesh-based shape representation. The algorithm seems to learn a layered representation of these shapes in terms of a hierarchy of shape signatures (based on spectral wavelet transforms). Learning is done layer-wise using unsupervised methods such as k-means or variational Bayes EM. In the past, I was obsessed with content-based 3D shape retrieval, where the notion of shape signatures was also used in conjunction with similarity measures. This takes me back to those times.
  • Predicting follow-up images to track disease progression. The goal is to develop an unsupervised model that learns the static anatomical structures and the dynamic changes in morphology due to aging or disease progression. The paper Unsupervised Representation Learning of Dynamic Retinal Image Changes by Predicting the Follow-up Image uses almost “4000 OCT images of about 200 patients with macular degeneration who were scanned over 24 months”. One reviewer found the experiments confusing.
  • Cycle-Consistent Generative Adversarial Networks for Image Segmentation. This seems to be an interesting idea for performing epithelial tissue image segmentation using GANs. According to the authors: “The model consists of two generators, one that maps from the image to the segmentation domain and a second that maps from the segmentation to the image domain, and two associated adversarial discriminators. A so-called cycle-consistency loss regularizes the mapping and enforces a relationship between an image in the segmentation and the image domain”. They find the performance of the Cycle-GAN comparable to a U-Net that had to be trained on a paired image-segmentation dataset (a minimal sketch of the cycle-consistency objective follows this list). As the reviewers point out, however, they only ran experiments with the fully annotated training set, and therefore did not sufficiently show that the approach works when annotations are lacking. A second paper using the same general idea targets liver lesion segmentation. It proposes an improved U-Net architecture, called the polyphase U-Net, as the generator in the Cycle-GAN. However, the reviewers pointed out that the performance did not match the state-of-the-art in the competition the dataset came from.
  • Deep Learning-Based 3D Freehand Ultrasound Reconstruction. Even though ultrasound imaging is cheap and safe, it is challenging to construct 3D volumes without external probes or tracking hardware. According to one reviewer, the authors had previously presented a deep learning based method for building 3D volumes from a series of 2D images at MICCAI. In their MIDL submission they added an inertial measurement unit (IMU) to model more realistic operator hand motions via its gyroscope. It must indeed be challenging to do this solely based on images. However, it seems this was no longer novel enough for the reviewers.
  • Reconstruction of sparsely sampled Magnetic Resonance Imaging measurements with a convolutional neural network. This one is about Compressed Sensing accelerated Magnetic Resonance Imaging (MRI) and how a neural network can be used to decode accelerated, undersampled MR acquisitions, replacing conventional iterative reconstruction algorithms (a sketch of this setup also follows the list). One reviewer likes it, while the other bashes it, questioning the methodological novelty and asking for comparisons with other architectures.
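
To make the cycle-consistency idea above more concrete, here is a minimal sketch of how the two generators, the two discriminators, and the cycle loss fit together. The tiny architectures, the loss weight, and all names are my own illustrative stand-ins, not the authors’ actual design:

```python
# Minimal sketch of the Cycle-GAN objective for image <-> segmentation
# translation. Architectures and the lambda weight are illustrative guesses.
import torch
import torch.nn as nn

def tiny_net(in_ch, out_ch):
    # Stand-in for a real generator/discriminator architecture.
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_ch, 3, padding=1),
    )

G = tiny_net(1, 1)      # generator: image domain -> segmentation domain
F = tiny_net(1, 1)      # generator: segmentation domain -> image domain
D_seg = tiny_net(1, 1)  # discriminator on the segmentation domain
D_img = tiny_net(1, 1)  # discriminator on the image domain
l1, mse = nn.L1Loss(), nn.MSELoss()

def generator_loss(image, seg, lam=10.0):
    fake_seg, fake_img = G(image), F(seg)
    # Adversarial terms: each generator tries to fool its discriminator
    # into outputting "real" (1) on the translated result.
    pred_seg, pred_img = D_seg(fake_seg), D_img(fake_img)
    adv = mse(pred_seg, torch.ones_like(pred_seg)) + \
          mse(pred_img, torch.ones_like(pred_img))
    # Cycle-consistency: translating to the other domain and back should
    # reproduce the input, which couples the two unpaired mappings.
    cycle = l1(F(fake_seg), image) + l1(G(fake_img), seg)
    return adv + lam * cycle

image = torch.randn(4, 1, 64, 64)  # toy batch of tissue images
seg = torch.rand(4, 1, 64, 64)     # toy batch of segmentation maps
generator_loss(image, seg).backward()
```

The cycle term is what (in principle) removes the need for paired image-segmentation examples, since each domain only has to be reconstructable from the other.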
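
Similarly, the basic setup behind the compressed sensing MRI abstract, as I understand it: retrospectively undersample k-space with a mask, take the zero-filled inverse FFT as the aliased input, and train a CNN to map it back to the fully sampled image. A minimal sketch under those assumptions (the mask pattern, acceleration factor, and network are placeholders):

```python
# Sketch of the CS-MRI learning setup: a network maps a zero-filled
# reconstruction of undersampled k-space back to the full image.
# Mask pattern, acceleration factor, and the tiny CNN are placeholders.
import torch
import torch.nn as nn

def undersample(image, acceleration=4):
    # Retrospectively undersample k-space, keeping 1-in-4 phase-encode lines.
    # (Real masks usually also keep the k-space center fully sampled.)
    kspace = torch.fft.fft2(image)
    mask = torch.zeros_like(kspace.real)
    mask[..., ::acceleration, :] = 1.0
    # Zero-filled inverse FFT gives the aliased input to the network.
    return torch.fft.ifft2(kspace * mask).abs()

net = nn.Sequential(                  # stand-in reconstruction CNN
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)

full = torch.rand(8, 1, 64, 64)       # toy "fully sampled" images
aliased = undersample(full)
loss = nn.functional.mse_loss(net(aliased), full)
loss.backward()
```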

There were some interesting ideas (or innovations, one might say) borrowed from previous literature that did not quite make it in the end:

  • Improved Deep Learning Model for Bone Age Assessment using Triplet Ranking Loss. Strangely, this paper received ratings marginally above the threshold from both reviewers but ended up being rejected. It uses the idea of regularizing the feature embeddings so that similar cases cluster together in the feature space, by introducing a ranking term into the loss function (sketched after this list). I believe this triplet ranking loss concept was proposed by Google in 2015 for face recognition problems. To my understanding, the reviewers felt that the experiments on the 3-stage architecture were not clearly explained or had some issues, so they were not convinced.
  • Curriculum learning from patch to entire image for screening pulmonary abnormal patterns in chest-PA X-ray. This paper borrows the notion of curriculum learning from Bengio et al.’s 2009 ICML paper, which proposes training gradually from simple to more complex concepts. Here the authors first train a ResNet-50 on the ImageNet dataset, then train it on extracted lesion patches, and finally fine-tune it on the entire images (see the staged fine-tuning sketch after this list). Their idea is that it would be difficult to train the network directly on the more complex whole images, where other organs and structures are present. Interesting idea. However, the reviewers had issues with the clarity of the writing and the lack of some details.
  • Detection of Gastric Cancer from Histopathological Image using Deep Learning with Weak Label. Weak supervision is a work-around for the lack of high-quality annotated datasets from domain experts: large-scale, albeit noisy and lower-quality, annotations are gathered using cheap annotators or programmatically. The idea is apparently not new in machine learning but has been gaining popularity in deep learning recently. The goal here is to predict slide-level labels; however, since the images are large, a patch-based approach is used in which the algorithm generates a probability map by stitching together patch-level predictions. A random forest then predicts the label for the whole slide from the probability map (a toy sketch of this slide-level stage follows the list).
    This paper uses slide-level weak labels when no region-level labels are available for the patches. It was not really found novel by the reviewers, I am guessing.
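
For reference, the triplet ranking loss mentioned in the bone age paper pulls an anchor sample towards a “positive” (a similar case) and pushes it away from a “negative” (a dissimilar case) by at least a margin. A minimal sketch, with arbitrary margin and embedding size:

```python
# Minimal sketch of a FaceNet-style triplet ranking loss: embeddings of
# similar cases (anchor, positive) are pulled together, dissimilar ones
# (anchor, negative) pushed at least `margin` further apart. Values are toy.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = (anchor - positive).pow(2).sum(dim=1)  # distance to similar case
    d_neg = (anchor - negative).pow(2).sum(dim=1)  # distance to dissimilar case
    return F.relu(d_pos - d_neg + margin).mean()   # zero once the gap exceeds the margin

emb = torch.randn(3, 16, requires_grad=True)       # toy 16-d embeddings
loss = triplet_loss(emb[0:1], emb[1:2], emb[2:3])
loss.backward()
```

PyTorch ships an equivalent as nn.TripletMarginLoss; presumably the paper adds a term like this alongside the bone age prediction loss.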
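
The curriculum idea from the chest X-ray paper boils down to fine-tuning the same backbone on progressively harder inputs. A rough sketch of that three-stage schedule, assuming a standard torchvision ResNet-50; the two loaders are hypothetical stand-ins for real patch and whole-image datasets:

```python
# Rough sketch of the patch-to-whole-image curriculum: start from an
# ImageNet-pretrained ResNet-50, fine-tune on lesion patches (easy),
# then on whole chest X-rays (hard). The loaders are toy stand-ins.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V1")  # stage 1: ImageNet pretraining
model.fc = nn.Linear(model.fc.in_features, 2)     # normal vs. abnormal
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Hypothetical loaders; replace with real lesion-patch / whole-image datasets.
patch_loader = [(torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,)))]
image_loader = [(torch.randn(4, 3, 224, 224), torch.randint(0, 2, (4,)))]

def fine_tune(loader, epochs):
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()

fine_tune(patch_loader, epochs=1)  # stage 2: lesion-centered patches
fine_tune(image_loader, epochs=1)  # stage 3: entire chest X-rays
```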
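
And for the weak-label histopathology pipeline (patch classifier → stitched probability map → random forest at the slide level), here is a toy sketch of the second stage. The summary features of the probability map are my own illustrative guesses, not the authors’:

```python
# Toy sketch of the slide-level stage: patch probabilities stitched into a
# probability map, whose summary features feed a random forest that predicts
# the slide label. Feature choices here are illustrative guesses.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def map_features(prob_map):
    # Simple summaries of the stitched patch-probability map.
    return [prob_map.mean(), prob_map.max(),
            (prob_map > 0.5).mean()]  # fraction of "suspicious" patches

rng = np.random.default_rng(0)
prob_maps = rng.random((100, 32, 32))  # toy maps: 100 slides of 32x32 patches
slide_labels = (prob_maps.mean(axis=(1, 2)) > 0.5).astype(int)  # toy labels

X = np.array([map_features(m) for m in prob_maps])
clf = RandomForestClassifier(n_estimators=100).fit(X, slide_labels)
print(clf.predict(X[:5]))
```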

This conclusion is not surprising at all, but here it goes: based on common reviewer comments, having a large dataset is preferable, and that is becoming less of an issue with lots of free datasets available via competitions and the like. One rule of thumb, however: if one is using a competition/challenge dataset, make sure the method is at least as good as the state-of-the-art on it. Even if the dataset or application domain is novel, some innovation on the architecture, along with comparisons to established architectures, is expected. Of course, in all cases, it is essential to clearly convey all relevant information within the limited page count (2 or 3 pages).

*Note: authors are allowed to remove their papers from the site, therefore these numbers might not be accurate.

Disclaimer: the opinions stated in this blog post are my own. They do not represent my employer’s views or opinions on related topics.
