Recent Results on Investigating Instance Relationships with the Shuffle Instance Strategy

Bahjat
8 min read · Dec 10, 2023

In the previous blog post, we introduced our Shuffle Instance (SI) strategy; in this post, we discuss its results.

I am currently a postdoctoral researcher at ETH Zürich, Switzerland, co-supervised by Prof. Radu Timofte. My research focuses on medical imaging, especially pathological image analysis.

If you are interested in this work, feel free to contact us about a research collaboration.

Results

Our findings highlight the critical importance of instances in pathological images, which serve as the fundamental basis for pathological diagnosis. Inspired by this, we propose the SI strategy, which enables the neural network to effectively learn and capture this essential pathological information. To validate the effectiveness of our strategy, verify our hypothesis about instances, and assess how the SI strategy enhances model performance, we conducted experiments from three perspectives: (1) model performance comparison; (2) instance-masked experiment; (3) exploration of the model and the SI strategy.

Model performance comparison

For pathological diagnosis, samples are often collected from different hospitals and institutions and may be analyzed at various scales depending on the disease condition. To verify our hypothesis about instances in pathological images and to evaluate the performance of our model comprehensively, we selected 11 datasets from different institutions at different scales as the experimental samples (HER2 shown in Extended Data Fig. 1). Specifically, the 11 datasets include 2 cell-level datasets, WBC and SIPaKMeD; 2 cell-cluster-level datasets, ROSE and Colonoscopy (Mid); 4 local-tissue-level datasets, pRCC, Warwick-QU (Warwick), MARS, and SEED-Gastric Cancer (GS); and 3 tissue-level datasets, CAM16, PAIP2019 (PAIP), and HER2 tumor ROIs (HER2). Dataset details are presented in the Method section.
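As a rough illustration (not part of our released code), the instance pyramid behind this dataset selection can be written down as a simple mapping; the keys and shorthand dataset names below are labels chosen purely for this sketch.

```python
# Hypothetical grouping of the 11 benchmark datasets by instance level,
# mirroring the instance pyramid described above. Names are shorthand
# labels used only for illustration.
INSTANCE_PYRAMID = {
    "cell":         ["WBC", "SIPaKMeD"],
    "cell_cluster": ["ROSE", "Colonoscopy-Mid"],
    "local_tissue": ["pRCC", "Warwick-QU", "MARS", "GS"],
    "tissue":       ["CAM16", "PAIP2019", "HER2"],
}

# Sanity check: four levels, eleven datasets in total.
assert sum(len(v) for v in INSTANCE_PYRAMID.values()) == 11
```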

Meanwhile, to fully demonstrate the effectiveness of the SI strategy, we compared SILM (RS) (based on the random shuffle strategy) and SILM (IS) (a variant of our model with the in-place shuffle strategy and an adaptive training scheduler) with other models under the same experimental conditions, including: 2 state-of-the-art (SOTA) patch-based models, the basic ViT and Swin Transformer (Swin-T); 7 widely applied convolutional neural networks (CNNs), namely VGG-16, VGG-19, ResNet50, Inception-V3, Xception, MobileNetV3, and EfficientNet-B3; and 3 recent hybrid methods, Conformer, CrossFormer, and ResNet-ViT. By fully utilizing the instance pyramid to extract the essential information of pathological images, the SI strategy can also be regarded as a data augmentation tool that is especially suitable for the pathology field. Therefore, we further compared the SI strategy with 6 SOTA mixing-based data augmentation methods.
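To make the shuffle idea concrete, here is a minimal, illustrative PyTorch sketch of a random-shuffle (RS) style patch shuffle. The grid size and implementation details are simplifying assumptions; the in-place (IS) variant and the adaptive training scheduler are not reproduced here.

```python
import torch

def random_patch_shuffle(img: torch.Tensor, grid: int = 4) -> torch.Tensor:
    """Split an image (C, H, W) into a grid x grid set of patches and
    randomly permute them. Illustrative sketch only, not the exact
    augmentation used in our experiments."""
    c, h, w = img.shape
    ph, pw = h // grid, w // grid
    # crop so the image divides evenly into grid x grid patches
    img = img[:, : ph * grid, : pw * grid]
    patches = (
        img.unfold(1, ph, ph)          # (C, grid, W', ph)
           .unfold(2, pw, pw)          # (C, grid, grid, ph, pw)
           .permute(1, 2, 0, 3, 4)     # (grid, grid, C, ph, pw)
           .reshape(grid * grid, c, ph, pw)
    )
    patches = patches[torch.randperm(grid * grid)]   # shuffle patch order
    rows = [torch.cat(list(patches[r * grid:(r + 1) * grid]), dim=2)
            for r in range(grid)]                    # stitch each row back
    return torch.cat(rows, dim=1)                    # stack rows into an image

# Example: shuffle one 3 x 224 x 224 training image into a 4 x 4 patch grid.
shuffled = random_patch_shuffle(torch.rand(3, 224, 224), grid=4)
```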

To visually represent our findings, we plotted the model benchmark scores on each dataset in a grouped box plot (Fig. 2). Note that the SILMs (RS and IS) not only achieve the highest average value on 10 datasets but also exhibit the most concentrated score distribution (smallest box range and variance). These results, obtained across a diverse set of datasets, provide compelling evidence that the observed performance improvements derive from genuine methodological advances.

Fig. 2 Grouped box plot of the classification performance of all models on 10 datasets.
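For readers who would like to build a similar visualization, the snippet below sketches a grouped box plot with matplotlib. The scores are random placeholders rather than our benchmark results, and the dataset and model subsets are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder grouped box plot in the spirit of Fig. 2; scores are random,
# NOT our benchmark results.
rng = np.random.default_rng(0)
datasets = ["WBC", "ROSE", "pRCC", "CAM16"]          # illustrative subset
models = ["SILM (RS)", "SILM (IS)", "ViT", "ResNet50"]

fig, ax = plt.subplots(figsize=(8, 3))
width = 0.18
for i in range(len(models)):
    # boxes within each dataset group follow the order of `models`
    scores = [rng.normal(0.9, 0.03, size=5) for _ in datasets]
    positions = [d + (i - (len(models) - 1) / 2) * width
                 for d in range(len(datasets))]
    ax.boxplot(scores, positions=positions, widths=width * 0.9)
ax.set_xticks(range(len(datasets)))
ax.set_xticklabels(datasets)
ax.set_ylabel("Accuracy")
plt.tight_layout()
plt.show()
```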

We attribute this improvement to the model's better understanding of pathological images and its extraction of effective characteristics. To verify this conjecture, we used gradient-weighted class activation mapping (Grad-CAM) to visualize the attention maps generated by 6 different models, including 2 traditional CNNs, 2 Transformer models, and our proposed SILMs. The pyramid-group datasets, which contain 4 levels of pathological images, were employed for comparison. For each dataset, we fed two images from different categories into each model, and the attention map was obtained from the activation of the last layer. As shown in Fig. 3, the original images are displayed in the first column, and the attention maps of the six models are arranged behind the source image. The last column presents these eight images with masks labeled by an experienced pathologist. The masked images represent the ground truth, indicating the areas that deserve attention from pathologists during diagnosis. Compared to the other methods, the SILMs achieve the most accurate attention regions in the majority of cases, displaying high consistency with the ground truth. Additionally, the SILMs can effectively locate key areas that provide image classification information, such as cells, cell clusters, or tissue structures, even in the presence of background differences and image interference. Grad-CAM also reveals that the SILMs can accurately mark separated parts, such as lumens with distinct forms in pRCC samples or cancerization at different locations in CAM16 (Sample 2). This finding demonstrates the model's remarkable precision and global attention capability in capturing all critical features, even among distant or morphologically different instances. Information in key areas or discrete edges all contributes to the model's diagnostic decision-making.

Fig. 3 Attention map analysis of SILMs against SOTA classification models on pyramid-group datasets.
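As a reference for the visualization procedure, the following is a minimal, self-contained Grad-CAM sketch in PyTorch. It uses a torchvision ResNet-50 as a stand-in backbone (not the SILM architecture) and hooks the last convolutional block, mirroring the "last layer" choice described above.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Stand-in backbone for illustration; the SILM architecture is not reproduced here.
model = resnet50(weights=None).eval()
target_layer = model.layer4[-1]

feats, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

def grad_cam(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Return an (H, W) heatmap in [0, 1] for one image tensor of shape (1, 3, H, W)."""
    logits = model(x)
    model.zero_grad()
    logits[0, class_idx].backward()
    a, g = feats["a"], grads["a"]                   # activations / gradients (1, C, h, w)
    weights = g.mean(dim=(2, 3), keepdim=True)      # channel importance weights
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear",
                        align_corners=False)[0, 0]
    cam -= cam.min()
    return cam / (cam.max() + 1e-8)

# Example: attention map for a random input and class index 1.
heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=1)
```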

With the label information of the instance regions, SILM (RS) can better locate these areas and delineate the most accurate boundaries. Equally important, the prior information in these instance regions guides the model to show similar attention to these regions in the CAM. In comparison, SILM (IS) uses only the original samples without mask labels and learns dynamically during training. The accuracy of the model's attention area may be slightly affected, but as a result the model can autonomously locate these core instance regions and assign different levels of attention. The SI strategy can therefore be extended to various pathological data, enabling a broader application space. Note that both SILMs adopt an approach similar to that of pathologists, which may explain their outstanding results. Combined with accurate classification performance, the method presents solid interpretability and a significant improvement in instance identification and relationship modeling.

During training, the SI strategy was designed to introduce more instance relationships in the shuffled images to the model. In turn, how the SILMs handle and understand the resulting images is crucial and worth exploring. Here we display the shuffled image, Grad-CAM, and ground truth in two sections based on the different SI strategies (Fig. 4). It can be observed that while independent information is preserved within a single patch, the relationships between instances in the real pathological image may be disrupted, and some new relationships may be introduced. To minimize the classification loss, the models are compelled to identify the class associated with each instance. The Grad-CAM results in the two middle columns of each section clearly demonstrate that the SILMs can distinguish separate patches containing instance information and spend minimal redundant attention on the background, even at the boundary of two patches. This indicates that the model concentrates on the key information in each patch rather than on the forms of the patch joints. Notably, some patches in the shuffled image may preserve only background regions from the original image during the cropping process. In these cases, most commonly observed in WBC or ROSE samples, the models pay little attention to these regions. Fig. 4 confirms that the model accurately learns the features of instances belonging to distinct categories and the differences between them. Accurate recognition of instances in patches lays the foundation for the model to better comprehend the information in pathology images and make precise classifications. This demonstrates the robustness and effectiveness of the SILMs in handling pathological images, as well as their capability to focus on the relevant features in each patch without being misled by interference or patch boundaries. The model's performance in these challenging scenarios highlights its potential for broader applications in pathological image analysis.

Fig. 4 Attention map of SILM on shuffled images.
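One simple, hypothetical way to quantify the observation that little attention is spent on the background is to measure the fraction of the Grad-CAM heatmap that falls inside the instance regions. The helper below assumes a binary instance mask and a heatmap such as the one from the Grad-CAM sketch above; it is not a metric reported in the paper.

```python
import numpy as np

def attention_inside_ratio(heatmap: np.ndarray, instance_mask: np.ndarray) -> float:
    """Fraction of total Grad-CAM attention falling inside instance regions.

    `heatmap` is an (H, W) array in [0, 1]; `instance_mask` is a binary
    (H, W) array marking instance pixels. Hypothetical illustration only.
    """
    total = heatmap.sum() + 1e-8
    return float((heatmap * (instance_mask > 0)).sum() / total)
```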

Instance-masked experiment

The instance is considered the most critical information in pathological images and the decisive basis for diagnosis by pathologists. The design of the model and the SI strategy stems from this observation and from the utilization of instance pyramid features. Here we applied a special treatment to the pathological images, artificially hiding the instance regions by filling them with the mean values of the surrounding pixels. The close connection between instances and pathological images is explored in depth through model performance testing. Fig. 5 shows the behavior of the SILMs on instance-masked pathological images from the pyramid-group datasets. The most intuitive impact, the shifting of the model's attention regions, is presented by Grad-CAM in Fig. 5a, while the precise model performance degradation is quantitatively recorded in Fig. 5b. Notably, distinct datasets exhibit some differentiated and interesting situations, primarily due to their particular differences in instances.
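The masking treatment can be sketched roughly as follows. The width of the surrounding band used to estimate the mean and the exact fill rule are assumptions for illustration, not our exact preprocessing.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def mask_instances(img: np.ndarray, mask: np.ndarray, border: int = 8) -> np.ndarray:
    """Hide instance regions by filling them with the mean color of a thin
    band of surrounding pixels. `img` is (H, W, 3); `mask` is a binary
    (H, W) instance mask. Band width and fill rule are illustrative
    assumptions, not the paper's exact procedure."""
    inside = mask > 0
    # thin ring of background pixels immediately around the instance
    ring = binary_dilation(inside, iterations=border) & ~inside
    out = img.copy()
    if inside.any() and ring.any():
        out[inside] = img[ring].mean(axis=0)   # fill with surrounding mean color
    return out
```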

Leukocyte size and shape provide information, but the intracellular structure is vital for WBC category determination. Masking the instances can confuse the model, leading to a classification performance drop to 20–30%, which is close to a random classifier. Similar to WBC, instance-masked regions in ROSE resemble background covering, causing misclassification. The CAMs clearly show that the model pays similar attention to the masked regions as it does to the normal background. Since pure background images are also classified as negative in binary classification, the model performance on class 1 (negative) was almost unaffected, while a large number of errors occurred on class 2 (positive) (Fig. 5b).

The circumstance in pRCC differs from ROSE and WBC. In these local-tissue-level pathological images, the cell cluster plays the role of the instance and is obscured by the mask, while the structure and shape of the tissue are retained. For the samples of class 1 (RS) and class 2 (IS), SILM attends to regions both inside and outside the instance, making it possible to distinguish the instance-masked images based on other characteristics. The attention regions in the two CAMs of the original and processed images exhibit high consistency. However, the instance-masked images can confuse the model when the instance plays an irreplaceable role in discriminating them. Completely opposite attention maps can be observed for the other samples, accounting for nearly half of the performance degradation.

CAM16 consists of images of lymph node metastases of breast cancer. As the cancerous area is encapsulated by normal cellular tissue, we covered only the metastatic area, leaving the remaining regions preserved. Thus, the performance of the model on the unprocessed negative samples was not affected. However, masking the instance in positive samples creates a special pattern that substitutes for the original features. The difference between the filled pattern and the surrounding cell background may result in the SILMs maintaining high attention on the masked regions in the CAM results (RS on class 2). This can lead the model to treat the modified regions as anomalies and make the same judgment as before, which accounts for the majority of positive samples in the CAM16 dataset.

A few errors also occurred when the SILM turned its focus to other normal cellular tissues instead of the instance regions because of the image modification, as in class 2 (IS), which explains the small drop in model performance.

The above results on 4 datasets reveal high consistency between the attention maps and the classification results, serving as the foundation for our analysis of the model's understanding of instances. Through these experiments, we have thoroughly investigated the instances in pathological images at various scales and confirmed their vital role in these samples. This finding, together with the previous experiment (Fig. 4), reinforces the importance of the instance as a crucial piece of information and a fundamental basis for pathological diagnosis. It enables us to better devise the model and design the SI strategy from a more essential understanding of pathological images.

Bahjat

Expert in AI-driven pathology, focused on eliminating bias to improve diagnostic accuracy and healthcare equity for diverse patient populations.