Exploring Tissue Lesion Classification
The microscope has been a tool for the pathologist’s eye to view tissues at a cellular level. With the maturing of the fields of deep learning and computer vision, pathologists now have a tool to supplement their skill set and assist them in image classification. A common task for a pathologist is the examination of tissue from a biopsy at a microscopic level to diagnose cancers. Inspecting tissue slides is painstaking, precise work that requires years of specialized training to master. Computer vision is an obvious tool that can facilitate more rapid and accurate diagnoses. The utility of computer vision within the field of pathology, specifically, has been further established by the advancement of digital pathology. By delegating specific tasks to computer vision software, the pathologist is able to focus their attention on higher level processes including synthesizing information for clinical interpretation and decision making purposes.
This challenge focuses on epithelial lesions of the uterine cervix, and featured a unique collection of thousands of expert-labeled WSIs collected from medical centers across France. The lesions in slides like these are most often benign (class 0), but some have low malignant potential (class 1) or high malignant potential (class 2), and others may already be invasive cancers (class 3). The goal of the challenge was to develop a bespoke image classifier on microscope slides of uterine cervical tissue biopsies collected from the French population. The SFP and France's Health Data Hub provided a 928 GB training set of labeled whole slide images, with each slide classified by anatomical pathologists according to lesion severity:
Class 0: Normal or subnormal
Class 1: Low grade squamous intraepithelial lesion
Class 2: High grade squamous intraepithelial lesion
Class 3: Invasive squamous carcinoma
Whole slide images (WSIs), digital representations of microscope slides at high levels of magnification, were provided in a variety of formats, though we were interested in the pyramidal TIF format. Pyramidal TIFs are a multi-resolution format, with each resolution stored as a separate layer in the TIF file. These images are slightly compressed to make them less hardware intensive, but retain a level of detail adequate for pathologists to use for diagnoses.
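The layered structure of a pyramidal TIF can be illustrated with a toy pyramid built by repeated 2x downsampling. This is only a conceptual sketch, not the scanner's actual pipeline, and the array sizes are arbitrary:

```python
import numpy as np

def build_pyramid(image, n_levels=4):
    """Simulate the layers of a pyramidal image: each level halves resolution."""
    levels = [image]
    for _ in range(n_levels - 1):
        prev = levels[-1]
        # 2x2 mean pooling as a simple stand-in for the scanner's downsampling
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        pooled = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(pooled)
    return levels

pyr = build_pyramid(np.zeros((1024, 1024)))
[lvl.shape for lvl in pyr]  # [(1024, 1024), (512, 512), (256, 256), (128, 128)]
```

In a real WSI the coarse levels are what make whole-slide operations tractable: a filter can run over a thumbnail-sized layer instead of billions of full-resolution pixels.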
Annotation data was also provided for the training images. Here, the pathologists had labeled 300x300 micron regions as lesioned or normal tissue. The annotated regions did not necessarily encompass all lesioned/normal tissue on the slide, however, and lesions could extend beyond the boundaries of the annotated regions.
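The 300x300 micron annotation size translates to a pixel size that depends on the scan resolution. As a worked example (the 0.25 micron-per-pixel figure is an assumption typical of a 40x scan, not a value given in the dataset):

```python
def region_pixels(region_microns=300.0, microns_per_pixel=0.25):
    """Side length in pixels of a square annotated region.

    microns_per_pixel=0.25 is an assumed value for a 40x scan; the actual
    resolution varies per scanner and is read from the slide metadata.
    """
    return round(region_microns / microns_per_pixel)

region_pixels()          # 1200 pixels per side at 0.25 um/px
region_pixels(300, 0.5)  # 600 pixels per side at a coarser 0.5 um/px
```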
Ultimately, our goal was to classify at the slide level, not the annotation level. For example, an image labeled as class 3 could have annotations corresponding to all of classes 0, 1, 2, and 3 but at least one region would contain a class 3 lesion.
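In other words, the slide-level label corresponds to the most severe lesion present anywhere on the slide, which can be sketched as:

```python
def slide_label(region_labels):
    """The slide takes the label of its most severe annotated region."""
    return max(region_labels)

slide_label([0, 1, 2, 3])  # a class 3 slide, despite benign regions
```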
Whole slide images in this dataset can be massive at the highest resolution (100,000 x 100,000 pixels or more). Much of each slide also consists of tissue-less background, various artifacts, and blurred regions.
To analyze a WSI, we divide the image into many 300x300 micron tissue candidates. To do this, we used a custom tissue-mask filter, which extracts non-overlapping regions of the WSI. We run this filter at a low resolution, which allows for extremely fast tissue segmentation but trades some accuracy for speed.
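A minimal sketch of such a tissue-mask filter, assuming background pixels are near-white on a grayscale thumbnail (the thresholds and tile size here are illustrative, not the values we used):

```python
import numpy as np

def tissue_tiles(thumb, tile=32, bg_thresh=0.9, min_tissue=0.1):
    """Find non-overlapping tiles containing tissue on a low-res thumbnail.

    thumb: 2D grayscale array in [0, 1], where background is near-white.
    Returns (row, col) corners of tiles whose tissue fraction >= min_tissue.
    """
    mask = thumb < bg_thresh  # True where darker (tissue) pixels are
    coords = []
    for y in range(0, thumb.shape[0] - tile + 1, tile):
        for x in range(0, thumb.shape[1] - tile + 1, tile):
            if mask[y:y + tile, x:x + tile].mean() >= min_tissue:
                coords.append((y, x))
    return coords
```

Each kept coordinate is then scaled up to the full-resolution layer to extract the corresponding 300x300 micron candidate, which is how running at low resolution buys its speed.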
Once we have a set of tissue candidates, we predict a label for each one and compile them into a set of candidates for the WSI. The problem is now an instance of Multiple Instance Learning, where we need to predict a single label from a bag of labeled candidates. Due to time constraints, we settled on a very simple prediction method: a quantile threshold. We build a distribution of the candidate labels and choose the value that occurs at the 70th percentile. This value worked well in practice, though more rigorous methods exist.
For our candidate classification model, we used a standard ResNet styled CNN architecture. The model was trained on the 300x300 micron images labeled by the pathologists.
For training our CNN, we used a single desktop Titan RTX, a Ryzen 2700x, and 64GB of RAM.
Performance was evaluated according to a metric devised by a panel of pathologists. Each classification was scored as 1 minus the error and the total score was the average across all predictions.
Due to the nature of the problem, it is critical to get the prediction correct; however, not all misclassifications are equivalent. Classifying a slide as normal when an invasive carcinoma is present has a much greater downside than the opposite error.
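The scoring scheme described above can be sketched with an error matrix in which under-calling severe lesions costs more than over-calling benign ones. The matrix values below are illustrative only, not the panel's actual weights:

```python
# ERROR[true][pred]: illustrative penalties; under-calls (below diagonal)
# are weighted more heavily than over-calls, per the asymmetry noted above.
ERROR = [
    [0.0, 0.1, 0.3, 0.5],  # true class 0
    [0.3, 0.0, 0.1, 0.3],  # true class 1
    [0.7, 0.3, 0.0, 0.1],  # true class 2
    [1.0, 0.7, 0.3, 0.0],  # true class 3
]

def score(y_true, y_pred):
    """Average of (1 - error) over all slide predictions."""
    return sum(1.0 - ERROR[t][p] for t, p in zip(y_true, y_pred)) / len(y_true)

score([0, 3], [0, 3])  # perfect predictions -> 1.0
score([3], [0])        # invasive carcinoma called normal -> 0.0
```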
Within a single week of work, we were able to put together a system that achieved a score of 0.8933. This represents an average error of ~0.1, meaning the average prediction is within one label of the truth.
Thank you to DrivenData, the French Pathology Society, and the Health Data Hub for organizing this competition. A special thank you to all of the individual pathologists who labeled and annotated the data. Datasets like the one used here are a valuable resource that captures pathologists' collective expertise and allows for progress to be made in medical computer vision.