Transforming Medical Image Analysis with Full Self-Supervision

Alice Dombos
Published in KTH AI Society · Feb 22, 2024

Machine learning models have demonstrated remarkable success in interpreting medical images, frequently matching or even exceeding the performance of human experts [1–4]. This advancement holds immense potential for improving patient outcomes through more accurate and efficient classification of various pathologies. However, existing models rely on large labeled datasets for training, which poses significant challenges, including high costs and time consumption. In a recent paper [5], Ekin Tiu et al. demonstrate the possibility of removing this labeling bottleneck by showing that a self-supervised machine learning model, trained on unannotated chest X-ray images, can carry out pathology classification tasks with an accuracy comparable to expert radiologists and fully supervised models.

Image Credit: Getty Images

Current machine learning models for medical image interpretation generally depend on large labeled datasets. While some models adopt a fully supervised approach, others are self-supervised and only require labeled data for fine-tuning. Nevertheless, the necessary large-scale labeling efforts require extensive domain knowledge and technical expertise, which results in high costs and inefficiencies in machine learning workflows. Even though automatic labelers are occasionally used to extract explicit labels from unstructured text reports, the labeling process remains burdensome due to the expertise required and the high development time.

CheXzero, the method that Tiu et al. present, represents a transformative shift in medical image interpretation, embracing a fully self-supervised approach that eliminates the need for explicit manual annotation. CheXzero leverages contrastive learning, a type of self-supervised learning, on image-text pairs to learn a representation that enables zero-shot multi-label classification of chest X-ray images. It uses unstructured data in the form of radiology reports, which serve as a natural source of supervision, to learn features of chest X-ray images. More precisely, the impression section of each clinical report was used in training, since it consists of a brief summary of the entire report.
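To make the contrastive idea concrete, here is a minimal PyTorch sketch of a CLIP-style symmetric contrastive loss over a batch of matched image-report embeddings. The embedding size, batch size, temperature and the random features standing in for encoder outputs are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of CLIP-style contrastive training on image-report pairs.
# Hyperparameters and the stand-in random features are illustrative only.
import torch
import torch.nn.functional as F

def contrastive_loss(image_features, text_features, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss over matched image-text pairs."""
    # Normalize embeddings so the dot product becomes a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with report j.
    logits = image_features @ text_features.t() / temperature

    # The matching report for each image sits on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)      # image -> report direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # report -> image direction
    return (loss_i2t + loss_t2i) / 2

# Random tensors stand in for the X-ray encoder and impression-text encoder outputs.
imgs = torch.randn(8, 512)
txts = torch.randn(8, 512)
print(contrastive_loss(imgs, txts))
```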

The figure, reproduced from [5], illustrates (a) the training pipeline and (b) the process of predicting pathologies in chest X-ray images.

The self-supervised model was initialized with a Vision Transformer (ViT-B/32) image encoder and a Transformer text encoder, using pre-trained weights from OpenAI's CLIP model, which maps text and images into a shared vector space. The MIMIC-CXR training set, a publicly available dataset of chest radiographs paired with radiology text reports, was used for training, which closely followed the implementation of CLIP. At evaluation time, a positive prompt ('<label>') and a negative prompt ('no <label>') were generated for each label in the test set. During the softmax evaluation procedure, the image's similarities to the two prompts were compared to produce a probability score for the presence of each pathology.
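A hedged sketch of this positive/negative prompt evaluation is shown below, using OpenAI's publicly released CLIP package. Note the assumptions: the stock ViT-B/32 CLIP weights stand in for CheXzero's fine-tuned model, and the image path and label are placeholders chosen for illustration.

```python
# Sketch of zero-shot pathology scoring with positive/negative prompts.
# Assumption: stock CLIP ViT-B/32 weights are used here instead of the
# fine-tuned CheXzero checkpoint; "chest_xray.png" is a placeholder path.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

label = "pleural effusion"
# Positive prompt ('<label>') and negative prompt ('no <label>').
prompts = clip.tokenize([label, f"no {label}"]).to(device)
image = preprocess(Image.open("chest_xray.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(prompts)
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    # Softmax over the positive vs. negative prompt similarities gives
    # a probability that the pathology is present in the image.
    logits = (image_feat @ text_feat.t()).squeeze(0)
    prob_present = logits.softmax(dim=-1)[0].item()

print(f"P({label} present) ~ {prob_present:.2f}")
```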

Regarding accuracy, the self-supervised model performed remarkably well in comparison to expert radiologists and label-dependent models. The model was evaluated on the CheXpert and PadChest datasets, both collected from different hospitals than the training dataset. There is no statistically significant difference between the model's and the radiology experts' average MCC and F1 scores over the five CheXpert competition pathologies. Furthermore, the self-supervised model's mean area under the curve (AUC) is only 0.042 points below that of the highest-scoring fully supervised model on the CheXpert competition. The results also show that the model outperforms previous label-efficient methods (MoCo-CXR, MedAug and ConVIRT) on the CheXpert dataset. For instance, the AUC of the self-supervised model is 0.889, while that of MoCo-CXR trained on 10% of the labeled data is 0.850.

Beyond these capabilities, the model obtains high accuracy across datasets with different distributions, overcoming a challenge faced by prior models. Its ability to generalize is speculated to stem from the unstructured text data, which offers a broader range of radiographic information that can be applied to diverse datasets. Another contrast to models trained on labeled data is its capability to identify pathologies that were not explicitly annotated during training. However, the model cannot detect pathologies that are not mentioned in the reports. To address this, a large number of samples encompassing diverse pathology descriptions were used in training. The training set also included a wide range of writing styles, with the aim of reducing the limitations that occur when similar pathologies are described differently.

In conclusion, the CheXzero method demonstrates the potential of deep learning models trained on large amounts of unlabeled data to learn a broad range of medical image interpretation tasks. The prospect of training models on extensive unlabeled datasets opens avenues for increased efficiency and effectiveness, marking a promising step forward in enhancing the capabilities of machine learning applications in healthcare.

Author

Alice Dombos is a member of the KTH AI Society, and a student in Computer Science and Engineering at the KTH Royal Institute of Technology. You can reach her on LinkedIn or by email at alice@kthais.com.

References

[1] Rajpurkar, P., et al. (2017). CheXNet: radiologist-level pneumonia detection on chest X-Rays with deep learning. arXiv. https://doi.org/10.48550/arXiv.1711.05225

[2] Litjens, G., et al. (2017). A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88. https://doi.org/10.1016/j.media.2017.07.005

[3] Esteva, A., et al. (2021). Deep learning-enabled medical computer vision. npj Digit. Med. 4, 5. https://doi.org/10.1038/s41746-020-00376-2

[4] Shen, D., et al. (2017). Deep learning in medical image analysis. Annu. Rev. Biomed. Eng. 19, 221–248. https://doi.org/10.1146/annurev-bioeng-071516-044442

[5] Tiu, E., et al. (2022). Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng. 6, 1399–1406. https://doi.org/10.1038/s41551-022-00936-9
