Pathology meets Large Scale Self-supervised Learning

Mingu Kang
Published in Lunit Team Blog
Jun 30, 2023

Introduction

At Lunit Oncology, we focus on leveraging deep learning and computer vision technologies to provide advanced diagnostic tools and insights to healthcare professionals, with the ultimate goal of improving patient outcomes in cancer care. We have developed AI algorithms that assist pathologists in detecting cells and segmenting cancerous tissues more accurately and efficiently. High-quality data is essential for building better AI models, but annotated data is scarce in pathology. In this post, we introduce how we overcome this limitation. This work was recently published at CVPR 2023, a top conference in AI and computer vision.

Data is the lifeblood of AI

The power of high-quality annotated data in enhancing model performance cannot be overstated. We have consistently observed that as the quality of data improves, so does model performance. To be sure, adding more high-quality data cannot hurt. However, endlessly adding data is neither feasible nor sustainable. Specifically, we have encountered two major hurdles in constructing high-quality annotated datasets.

  1. Pathology images must be annotated by board-certified pathologists, which makes annotation costly.
  2. So-called “inter-observer variability” makes it challenging to acquire consistent annotations.

Expert-driven labor is essential for acquiring pathology image annotations. Even when we manage to secure such costly resources, inter-observer variability among experts may produce noisy annotations. At Lunit, we strive to overcome these issues by (a) selecting only the potentially informative regions for annotation, and (b) integrating AI-assisted tools into the annotation process to improve annotation quality and reduce cost.

Even with the efficient annotation process established through these efforts, we still fall short of fully utilizing the untapped capacity of Whole Slide Images (WSIs). With the usual annotation process, we only explore the tip of the iceberg. We firmly believe that the unexplored regions also hold valuable information for advancing AI models. As a result, we embarked on a quest to uncover ways to utilize the potential of these unexplored regions and maximize their contribution to our models.

Self-supervised Learning in CPATH

In the medical domain, annotated data is often scarce because of the aforementioned hurdles. To make up for the lack of supervision, we often resort to ImageNet pre-trained weights as a workaround, even though there are significant differences between natural images and pathology images. This learning paradigm has shifted with the introduction of Self-Supervised Learning (SSL).

SSL provides a powerful framework for extracting meaningful representations and understanding complex patterns from large-scale unlabeled data without the need for expert-driven supervision. This breakthrough opens up new possibilities in pathology, as we can now fully utilize the context of WSIs without relying solely on pathologists.

The benefits of SSL in Computational Pathology (CPATH) are twofold. First, it helps overcome the scarcity of annotated data by utilizing large amounts of unlabeled data, which is more abundant and easier to obtain. Second, the pre-trained features capture domain-relevant characteristics that can aid in downstream tasks, such as tissue segmentation or cell detection. We will showcase these benefits through extensive large-scale experiments, providing compelling evidence of the advantages SSL brings to the field of pathology.

Opening Pandora’s Box of SSL in Pathology

Recent works in Computer Vision (CV) show that SSL pre-training on large amounts of unlabeled data can improve performance on several CV downstream tasks. Given this observation, a question naturally arises:

How well does self-supervised learning help in improving the performance of pathology tasks?

To answer this question, we attempt to demystify and confirm the effectiveness of SSL in pathology via large-scale experiments.

Datasets

In order to demonstrate the potential of SSL, we first construct a large-scale unlabeled dataset. For that, we collect 20,994 WSIs from The Cancer Genome Atlas (TCGA), a public data source, and add 15,672 WSIs from TULIP, our internal data source. All WSIs are H&E stained. We extract at most 500 patches of size 512 × 512 from each slide, resulting in a total of 32.6M patches. The pre-training covers two different FoVs: 20× (0.5 μm/px) and 40× (0.25 μm/px) objective magnification.

Specification of unlabeled data for pre-training. Note that TCGA is a public data source, while TULIP is an internal data source.
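For illustration, here is a minimal sketch of how such 512 × 512 patches might be sampled from a WSI with OpenSlide. The patch size and the 500-patch cap come from the description above; the helper name, the random sampling strategy, and the crude saturation-based tissue filter are assumptions for this example, not the exact procedure used in the paper.

```python
import random
import numpy as np
import openslide  # assumed dependency for reading whole slide images

PATCH_SIZE = 512
MAX_PATCHES_PER_SLIDE = 500

def extract_patches(wsi_path, level=0, max_patches=MAX_PATCHES_PER_SLIDE):
    """Sample up to `max_patches` tissue-containing 512x512 patches from one WSI.

    `level=0` is the slide's highest available magnification (e.g. 40x);
    the tissue check below is a simple saturation heuristic for illustration.
    """
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.level_dimensions[level]
    patches, attempts = [], 0
    while len(patches) < max_patches and attempts < 10 * max_patches:
        attempts += 1
        x = random.randrange(0, width - PATCH_SIZE)
        y = random.randrange(0, height - PATCH_SIZE)
        patch = slide.read_region((x, y), level, (PATCH_SIZE, PATCH_SIZE)).convert("RGB")
        rgb = np.asarray(patch)
        # keep patches that contain mostly tissue (low-saturation pixels are background)
        saturation = rgb.max(axis=-1) - rgb.min(axis=-1)
        if (saturation > 20).mean() > 0.5:
            patches.append(patch)
    return patches
```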

For downstream task evaluations, we select 5 different public datasets that are widely adopted in the field of CPATH and validate the pre-trained models on them. Details of each downstream dataset can be found in Section 4.2 of our paper.

Specification of downstream datasets. 'Cls' represents image classification and 'Seg' represents nuclei instance segmentation.

Techniques to Adapt SSL for Pathology

Typically, SSL generates learning signals from two augmented views of a given image. The literature has reported that the choice of augmentations plays a crucial role in building meaningful representations. Given that SSL methods have primarily been studied with natural images, applying the proposed augmentations without adaptation may not be optimal for pathology images.

The standard sets of augmentations for SSL have been studied on natural images, such as ImageNet.

While the existing methods perform reasonably well, we aim to push the performance boundaries by integrating domain-specific knowledge. We carefully consider the unique characteristics of pathology images and design pathology-specific augmentations accordingly. For this purpose, we identify 3 major differences between natural and pathology images and propose techniques that can effectively utilize such differences.

Pathology images vs. natural images. Pathology images differ from natural images in 3 major ways (a minimal augmentation sketch follows this list):
  • No canonical orientation: Unlike natural images, where objects or scenes have a plausible orientation, pathology images can be oriented in any direction while remaining valid. Taking advantage of this characteristic, we incorporate vertical flip as an augmentation technique.
  • Low color variation: While natural images contain a large range of colors due to the diversity of represented objects, pathology images tend to display similar color distributions (e.g. purple and pink staining). By considering this aspect, we apply a pathology-specific stain augmentation and weaken the color jittering augmentation typically used in SSL methods.
  • Different FoVs: To correctly analyze pathology images, different Fields of View (FoVs) must be considered. A larger FoV allows pathologists and algorithms to better understand the broader context of tissue regions and cell classes and make high-level predictions. Therefore, we construct our large-scale unlabeled dataset using image patches from multiple magnifications (e.g., 20× and 40×), thus encompassing various FoVs.
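Below is a minimal sketch of how these ideas could be expressed as a torchvision transform pipeline for generating one SSL view. The vertical flip and the weakened color jitter follow the description above; the `HEDJitter` class (a stain perturbation in the H&E color space via scikit-image's rgb2hed/hed2rgb) stands in for our stain augmentation, and all parameter values are illustrative rather than the exact configuration from the paper.

```python
import numpy as np
from PIL import Image
from skimage.color import rgb2hed, hed2rgb
from torchvision import transforms

class HEDJitter:
    """Illustrative stain augmentation: slightly perturb the H&E (HED) channels."""
    def __init__(self, sigma=0.03):
        self.sigma = sigma

    def __call__(self, img):
        hed = rgb2hed(np.asarray(img) / 255.0)
        scale = 1.0 + np.random.uniform(-self.sigma, self.sigma, size=3)
        shift = np.random.uniform(-self.sigma, self.sigma, size=3)
        rgb = hed2rgb(hed * scale + shift)
        return Image.fromarray((np.clip(rgb, 0, 1) * 255).astype(np.uint8))

# One augmented view for SSL: both flips are valid because pathology images
# have no canonical orientation, and the color jitter is weakened compared
# to typical ImageNet recipes.
ssl_view = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),   # pathology-specific addition
    HEDJitter(sigma=0.03),                  # pathology-specific stain jitter
    transforms.RandomApply([transforms.ColorJitter(0.2, 0.2, 0.1, 0.05)], p=0.5),
    transforms.ToTensor(),
])
```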

With these techniques, we consistently observe improvements in performance across all datasets. This confirms the significance of incorporating domain-specific knowledge in SSL and its ability to contribute to enhanced representation. These findings highlight the potential of leveraging pathology-specific insights in the pursuit of more effective SSL.

Benefit of our augmentation techniques. The first row shows linear evaluation performance on 5 downstream tasks when using the default set of augmentations proposed in the Barlow Twins (BT) paper. When we add the pathology-specific augmentations denoted `our aug. tech`, the performance improves across all datasets.

Benchmark SSL on Diverse Pathology Datasets

We conduct extensive experiments to assess the potential of large-scale SSL in the field of pathology. In contrast to previous studies that focus on a single SSL method, such as SimCLR, we comprehensively cover four different SSL paradigms: contrastive, non-contrastive, clustering-based, and Vision Transformer-based. From each paradigm, we select a representative method for evaluation. Moreover, we showcase the performance across multiple architectures, including ResNet and ViT, as well as multiple downstream tasks, such as image classification and nuclei instance segmentation. We carry out various experiments using both linear and fine-tune evaluation protocols.
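As a concrete reference, here is a minimal sketch of the linear evaluation protocol: the pre-trained backbone is frozen and only a linear classifier trained on top of its features. The ResNet-50 backbone, the 9-class head, and the optimizer settings are placeholders for illustration, and loading of the released pre-trained weights is omitted.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Backbone that would carry the SSL pre-trained weights (weight loading omitted here).
backbone = resnet50()
backbone.fc = nn.Identity()          # expose the 2048-d feature vector
for p in backbone.parameters():
    p.requires_grad = False          # linear evaluation: the backbone stays frozen
backbone.eval()

linear_head = nn.Linear(2048, 9)     # e.g. a 9-class tissue classification task
optimizer = torch.optim.SGD(linear_head.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    with torch.no_grad():
        feats = backbone(images)     # frozen features from the pre-trained backbone
    logits = linear_head(feats)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# dummy batch to illustrate a single optimization step
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, 9, (4,))
print(train_step(images, labels))
```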

The results of our experiments are presented in the following graphs. As evident, regardless of the specific SSL methods employed, SSL consistently improves performance across all datasets compared to using ImageNet supervised weights. This indicates the clear benefits of domain-aligned pre-training for pathology-specific downstream tasks.

Domain-aligned pre-training improves downstream performance. The 𝑦-axes show absolute differences in downstream task performance over ImageNet pre-trained weights (𝑦=0).
Downstream evaluation of the image classification and nuclei instance segmentation tasks. We report Top-1 accuracy for image classification and mPQ for nuclei instance segmentation, under both linear and fine-tuning protocols. We compare against the ImageNet-supervised (denoted Supervised) pre-training baseline of the corresponding backbone type as well as a random initialization (denoted Random) baseline.

To further emphasize the potential of SSL, we perform a label-efficiency study. For this study, we constrain the annotated data to 10% and 30% of the available dataset and measure performance in these low-data regimes. This experiment holds more practical value.
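A minimal sketch of how such a low-data regime can be simulated by subsampling the labeled training set is shown below; the stratified split via scikit-learn and the dummy label array are our illustration, not necessarily the exact sampling procedure used in the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for a downstream training set: sample indices and their class labels.
train_indices = np.arange(1000)
train_labels = np.random.randint(0, 4, size=1000)   # e.g. 4 illustrative classes

def subsample_labels(indices, labels, fraction, seed=0):
    """Keep `fraction` of the annotated training set, stratified by class."""
    kept, _ = train_test_split(indices, train_size=fraction,
                               stratify=labels, random_state=seed)
    return kept

subset_10 = subsample_labels(train_indices, train_labels, fraction=0.10)  # 10% label budget
subset_30 = subsample_labels(train_indices, train_labels, fraction=0.30)  # 30% label budget
```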

As shown in the figure below, the results clearly demonstrate that utilizing domain-aligned pre-trained weights is highly effective for label efficiency. Notably, we observe a significant performance gap, particularly with the ViT architecture, emphasizing the advantages of SSL in such scenarios.

Fine-tune evaluation results in the label-efficiency study. We use the CoNSeP dataset introduced in HoVer-Net.

Final Remark

We conduct the largest and most comprehensive study of SSL in the pathology domain, unleashing the potential of SSL from various perspectives. Our thorough investigation provides valuable insights for the field. Additionally, we are releasing the pre-trained weights from many of our experiments to advance the field and foster collaboration. We firmly believe SSL is a viable solution not only to overcome data scarcity but also to push performance beyond ImageNet pre-trained weights, helping us take one step further toward conquering cancer through AI. Please check out the details in our paper!

References

[1] Benchmarking Self-Supervised Learning on Diverse Pathology Datasets: https://openaccess.thecvf.com/content/CVPR2023/papers/Kang_Benchmarking_Self-Supervised_Learning_on_Diverse_Pathology_Datasets_CVPR_2023_paper.pdf

[2] Benchmarking Self-Supervised Learning on Diverse Pathology Datasets (Supplementary Materials): https://openaccess.thecvf.com/content/CVPR2023/supplemental/Kang_Benchmarking_Self-Supervised_Learning_CVPR_2023_supplemental.pdf
