My original work with the DDSM and CBIS-DDSM datasets yielded good accuracy and recall on the test and validation data, but the model performed poorly when applied to the MIAS images, which come from a completely different dataset. Additional analysis indicated that the negative images (from the DDSM) and the positive images (from the CBIS-DDSM) differed in some subtle but important ways:
1. The negative images had lower contrast.
2. The negative images had a lower mean, lower maximum, and higher minimum pixel value.
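Differences like these can be quantified by computing simple per-image statistics and comparing their distributions across the two datasets. A minimal NumPy sketch (`image_stats` is a hypothetical helper, and standard deviation is only a rough proxy for contrast, not necessarily the measure we used):

```python
import numpy as np

def image_stats(img):
    """Summary statistics of the kind that differed between the datasets."""
    img = np.asarray(img, dtype=np.float64)
    return {
        "mean": img.mean(),
        "min": img.min(),
        "max": img.max(),
        # simple contrast proxy: standard deviation of pixel values
        "contrast": img.std(),
    }
```

Aggregating these statistics separately over the negative and positive images makes systematic pre-processing differences visible even when they are hard to see by eye.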
We became concerned about point 1 when we discovered that increasing the contrast of any image made it more likely to be predicted as positive, regardless of its actual label, and we discovered point 2 while investigating this further. When applying our fully convolutional model, trained on the combined data, to complete scans rather than the 299x299 images we had trained on, we noticed that most sections of a positive image were predicted as positive, even if the ROI was in fact only present in one section. This indicates that the model was relying on some feature of the images other than the ROI in its predictions.
When starting this project, we had initially planned to segment the CBIS-DDSM images and use the sections which did not contain an ROI as negative images, but we could not rule out differences between the tissue in positive and negative scans that might prevent this approach from generalizing to completely negative scans. When we realized that the scans had been pre-processed differently, we attempted to adjust the negative images to make them more similar to the positive ones, but were unable to do so without knowing how they had been processed.
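One standard family of adjustments is histogram matching, which remaps one image's pixel values so their distribution matches a reference image's. A minimal NumPy sketch for equal-sized images (`match_histogram` is a hypothetical helper for illustration, not the method we used; without knowing the original processing there is no guarantee the matched result is faithful, which is exactly the problem we ran into):

```python
import numpy as np

def match_histogram(source, reference):
    """Remap source pixels so their distribution matches the reference's.

    Assumes source and reference have the same number of pixels, so each
    source rank can be assigned the reference value of the same rank.
    """
    src = np.asarray(source, dtype=np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    src_rank_order = np.argsort(src.ravel())
    ref_sorted = np.sort(ref.ravel())
    out = np.empty_like(src.ravel())
    out[src_rank_order] = ref_sorted  # i-th darkest source pixel gets i-th darkest reference value
    return out.reshape(src.shape)
```

A transformation like this can equalize means, extrema, and contrast, but it cannot recover information already destroyed by the unknown pre-processing.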
Our solution to both of these issues was to train the model to segment the scans rather than simply classify them, using the masks as the labels. This approach had several advantages:
- Using the mask as the label tells the model where it needs to look, so we can ensure that it actually uses the ROIs rather than other features of the images, such as the contrast or maximum pixel value.
- This allows us to exclude the DDSM images and use images from only one dataset, as the ROI of most scans encompasses only a small portion of the image.
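With the mask as the label, the training objective becomes a per-pixel classification loss rather than a per-image one. A minimal NumPy sketch of per-pixel binary cross-entropy (illustrative only; the exact loss and any class weighting we used may differ):

```python
import numpy as np

def pixel_bce(pred_probs, mask, eps=1e-7):
    """Mean binary cross-entropy over all pixels, using the ROI mask as label."""
    p = np.clip(np.asarray(pred_probs, dtype=np.float64), eps, 1 - eps)
    m = np.asarray(mask, dtype=np.float64)
    return float(-np.mean(m * np.log(p) + (1 - m) * np.log(1 - p)))
```

Because the loss is computed at every pixel, the model is penalized for responding to image-wide properties such as contrast, and rewarded only for localizing the ROI itself.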
We recreated the model to do semantic segmentation by removing the last “fully connected” layer (which was implemented as a 1x1 convolution) and the logits layer, and upsampling the result with transpose convolutions. For the upsampling to work properly, the image size needed to be divisible by 2 at every downsampling step so that the dimension reduction could be properly undone, so we used images of size 320x320.
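The size constraint comes down to simple arithmetic: each stride-2 layer halves the spatial dimensions, and each stride-2 transpose convolution doubles them, so every intermediate size must be even for the reduction to be exactly reversible. A sketch (the number of stride-2 reductions is an assumption about the architecture, not a documented value):

```python
def fcn_output_size(size, n_downsamples):
    """Spatial size after n stride-2 reductions followed by n stride-2 transpose convs."""
    for _ in range(n_downsamples):
        # an odd size here would be rounded down and could not be exactly undone
        assert size % 2 == 0, "size must be even at every reduction step"
        size //= 2
    for _ in range(n_downsamples):
        size *= 2
    return size
```

This is why 320 (which is divisible by 2 six times) works where 299 does not: 299 is odd at the very first reduction.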
We were able to get fairly good results training on this data, with pixel-level accuracy of about 90% and pixel-level recall of about 70%. Using the presence of any positive pixels as a predictor of the class of the entire image gave us accuracy and recall of 70% and 87%, respectively. However, we noticed certain patterns of incorrect predictions.
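The image-level prediction is a simple reduction of the pixel map: any positive pixel makes the whole scan positive. A minimal NumPy sketch of both levels of evaluation (the 0.5 threshold is an assumption for illustration):

```python
import numpy as np

def pixel_metrics(pred, truth):
    """Pixel-level accuracy and recall for binary segmentation maps."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    accuracy = float(np.mean(pred == truth))
    recall = float(pred[truth].mean()) if truth.any() else 1.0
    return accuracy, recall

def image_label(pred_mask, threshold=0.5):
    """Whole-scan label: positive if any pixel exceeds the threshold."""
    return bool((np.asarray(pred_mask) > threshold).any())
```

Note the asymmetry this reduction creates: a single false-positive pixel flips the entire image to positive, which is one reason image-level accuracy can lag pixel-level accuracy.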
Images which contained patches much brighter than the rest of the image tended to have those bright patches predicted as positive regardless of the actual label. This pattern was mostly observed when the bright patch was on the edge of the image, as seen in figure 2.
We know that the context of an ROI is important in detecting and diagnosing it, and we suspected that in the absence of context the model was predicting any patch substantially brighter than its surroundings to be positive. While for cancer detection a false positive is better than a false negative, we thought that this pattern might become problematic when applying the model to images larger than those it was trained on. To address this issue we decided to create a dataset of larger images and continue training our model on those.
We created a dataset of 640x640 images and adjusted our existing model to take those as input. An example image and label are shown in figure 3. As the model is fully convolutional, we can reuse the weights from the model trained on 320x320 images without any problems. Since training a model on such large images is very slow, starting from weights pre-trained on smaller images greatly speeds up the training process.
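Weight reuse works because convolution kernels are defined independently of the input size: the same kernel slides over a 320x320 or a 640x640 input, and only the output dimensions change. A minimal NumPy sketch of a "valid" 2-D convolution makes this concrete (a real network would use a framework's convolution ops; this loop is for illustration only):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D correlation; the kernel never depends on image size."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

The same `kernel` array (the learned weights) applies unchanged to both input sizes, which is exactly why a fully convolutional model trained at 320x320 can be fine-tuned at 640x640.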
Training on the 640x640 images is currently in progress. If the results are promising, we may increase the size of the training images further and continue fine-tuning the model iteratively until we have a model that performs well on full-sized scans.