How Lunit SCOPE finds and categorizes cells in histology images
Lunit’s AI-based products tackle cancer in a two-pronged fashion: (a) accurate and early diagnosis, and (b) cancer treatment recommendations via histology-based prognosis. The SCOPE product line, including SCOPE IO and SCOPE PD-L1, falls into the latter group, and it is what we work on in the Oncology AI Research department (of which I am a part).
This blog post is a quick overview of how our AI models look for (localize) and categorize (classify) cells in histology images. These micro-scale predictions accumulate into an understanding at the macro or slide level, typically through image-based biomarkers such as tumor-infiltrating lymphocyte (TIL) density or tumor proportion score (TPS). These slide-level quantities directly inform the decision on whether to administer specific cancer therapies (e.g. PD-1/PD-L1 inhibitors for immunotherapy, HER2 grading for targeted therapy), so it is of paramount importance to localize and categorize cells accurately.
Histology images are large. Very large.
Let’s take a step back and ask: what are “histology images”? In the context of Lunit SCOPE, histology images are stained images of human tissue, acquired through a biopsy procedure. The steps between biopsy and digitized images (a.k.a. whole-slide images or WSI) are many, as illustrated in the figure below.
As you can imagine, such tissue samples can measure several millimeters, or even centimeters, across. Since a tumor cell is 10 to 20 microns in diameter, a tissue sample of 2 centimeters by 2 centimeters (4 square centimeters) could contain a million cells or more. If you were to take 5 seconds to categorize each cell, it would take you 58 days without taking a toilet break or going to bed. It is also rather implausible that even a well-trained human professional could categorize all of those one million cells consistently when asked to do so on different days (“intra-rater variability”). Even less likely is that different individuals would agree on their categorization of each and every one of those million cells (“inter-rater variability”).
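The back-of-the-envelope arithmetic above can be checked in a few lines (the cell size and annotation speed are the assumed figures from the text):

```python
# How many cells fit in a 2 cm x 2 cm tissue section, and how long
# would it take a human to categorize them one by one?
tissue_side_um = 2 * 10_000          # 2 cm expressed in microns
cell_diameter_um = 20                # upper end of the 10-20 micron range
cells_per_side = tissue_side_um // cell_diameter_um
total_cells = cells_per_side ** 2    # ~1,000,000 cells

seconds_per_cell = 5
days = total_cells * seconds_per_cell / (60 * 60 * 24)
print(total_cells, round(days, 1))   # 1000000 57.9
```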
An AI-based tissue analyzer promises to be consistent and fast, while providing a comprehensive analysis of the provided image, but how do we build such a system, in particular to localize and categorize cells?
Zooming in
To begin, one must zoom in. One must zoom in very far. As an analogy, consider a typical view on Google Maps. Looking at cells in histology images is similar to starting from a zoomed-out country view on Google Maps. Take Singapore, as in the figure below. Zooming in, one starts seeing the structure of neighborhoods, and further on, streets and buildings. When zoomed in to the full extent, it is possible to distinguish between the cars and trucks parked at Marina Bay, and label them by color.
Now, let’s look at a whole-slide image. Even when looking at a WSI of modest size, such as the one below, we find that very different structures are revealed at different zoom levels. After some effort, we can reach the lowest levels, where we find cells of all kinds of shapes and sizes, some benign, others malignant, all forming the full tissue (as acquired by biopsy).
In pixels, WSIs can often be as large as 80,000 x 60,000 or 4.8 gigapixels. This vast amount of image data cannot be handled easily, so we take small crops from the full image when forming our annotated data for the training of cell detection models.
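To get a feel for these sizes: a 4.8-gigapixel RGB image would occupy roughly 14 GB uncompressed, which is why training data is built from small crops. A minimal sketch of the cropping idea (in practice, crops would be read lazily from the slide file via a WSI library such as OpenSlide, rather than from an in-memory array):

```python
import numpy as np

# A WSI at 80,000 x 60,000 px is 4.8 gigapixels (~14.4 GB as uncompressed RGB).
H, W = 80_000, 60_000
print(H * W)  # 4800000000

def crop(image: np.ndarray, y: int, x: int, size: int) -> np.ndarray:
    """Extract a square training patch from a larger image array."""
    return image[y : y + size, x : x + size]

# Stand-in for a small region of the slide (a real WSI would not fit in memory).
region = np.zeros((2_048, 2_048, 3), dtype=np.uint8)
patch = crop(region, 100, 200, 512)
print(patch.shape)  # (512, 512, 3)
```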
Annotating cell patches
When selecting the regions to annotate, we must take special care in deciding what to annotate and how large each region should be. Our Data-centric AI Research team (which is hiring Research Scientists) often focuses on the what side using techniques from active learning. As for how large, we have determined over many years of experience that a region of approximately 0.04 square millimeters yields an appropriate number of cells for expert annotators (board-certified pathologists) to click on.
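To translate that physical area into pixels: a square region of 0.04 square millimeters is 200 microns on each side. At an assumed resolution of 0.25 microns per pixel (a typical value for slides scanned at roughly 40x; the exact figure depends on the scanner), that works out to an 800 x 800 px annotation patch:

```python
# Convert the annotation region's physical area to pixel dimensions.
area_mm2 = 0.04
side_um = (area_mm2 ** 0.5) * 1_000   # 0.2 mm -> 200 microns per side
mpp = 0.25                            # assumed microns-per-pixel at ~40x scanning
side_px = side_um / mpp
print(side_px)  # 800.0 -> an 800 x 800 px patch
```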
Though our network of pathologists can produce high quality annotations, some images can result in diverging annotations that require an additional control step. We thus apply Quality Control (QC) via in-house pathologists who understand exactly what we need for our task, and to which level of quality or detail.
Lastly, to ensure that our models can generalize well and avoid learning biases from our evaluation datasets directly, we carefully manage our whole-slide-image, patient, and specimen data such that we never train on patients and specimens that are included in our final validation datasets (often called “test set” in the machine learning world).
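The key property of such a split is that it is done at the patient (or specimen) level, not at the image level. A minimal sketch with hypothetical slide and patient IDs:

```python
# Patient-level split: every slide from a held-out patient goes to the
# validation set, so no patient ever appears on both sides of the split.
slides = [
    ("slide_01", "patient_A"), ("slide_02", "patient_A"),
    ("slide_03", "patient_B"), ("slide_04", "patient_C"),
]
held_out_patients = {"patient_A"}

train = [s for s, p in slides if p not in held_out_patients]
val = [s for s, p in slides if p in held_out_patients]

train_patients = {p for s, p in slides if s in train}
val_patients = {p for s, p in slides if s in val}
# No patient leaks across the split, even though patient_A has two slides.
print(train_patients.isdisjoint(val_patients))  # True
```

Splitting naively at the image level would let two slides of the same patient land in train and validation, inflating the measured generalization.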
Simple is best — a segmentation approach to cell detection
Once a sufficient dataset of annotated image patches has been collected, we must train a model that can localize and classify cells.
Over time, we have found that well-tuned simple architectures are most reliable. Thus, we adopt a standard fully convolutional neural network architecture for the majority of our products’ cell models (e.g. Deeplab v3+ with ResNet backbone). Variations of this model are highly effective and have been used in several publications so far.
When evaluating several candidate models, we carefully perform sub-group analysis to understand if we are performing well in challenging edge-case scenarios. At a basic level, we divide our data by scanner type, cancer type, or other methods of categorization, then select models that perform as well as possible on all sub-groups.
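A sub-group analysis of this kind boils down to slicing the evaluation set by metadata and looking at the worst group, not just the overall average. A toy sketch with hypothetical records and scores:

```python
from collections import defaultdict

# Hypothetical per-slide evaluation records, grouped by scanner type.
results = [
    {"scanner": "A", "cancer": "lung", "f1": 0.91},
    {"scanner": "A", "cancer": "breast", "f1": 0.88},
    {"scanner": "B", "cancer": "lung", "f1": 0.79},
    {"scanner": "B", "cancer": "breast", "f1": 0.84},
]

by_scanner = defaultdict(list)
for r in results:
    by_scanner[r["scanner"]].append(r["f1"])

means = {k: sum(v) / len(v) for k, v in by_scanner.items()}
worst = min(means, key=means.get)
print(means, worst)  # scanner B underperforms -> prefer models that close the gap
```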
After selecting a final model, we perform inference with a sliding window method, such that we can detect all of the cells present in a given gigapixel WSI.
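The sliding-window idea is simple: tile the slide into fixed-size windows, run the model on each window, and stitch the per-window predictions back into a full-resolution map. A minimal non-overlapping sketch (window size, stride, and the stand-in `model` are illustrative; a production pipeline would also handle overlap and border effects):

```python
import numpy as np

def sliding_window_inference(wsi, model, window=512, stride=512):
    """Tile `wsi` into windows, predict each, and stitch into one class map."""
    H, W = wsi.shape[:2]
    canvas = np.zeros((H, W), dtype=np.int64)
    for y in range(0, H - window + 1, stride):
        for x in range(0, W - window + 1, stride):
            tile = wsi[y : y + window, x : x + window]
            canvas[y : y + window, x : x + window] = model(tile)
    return canvas

# Stand-in "model": returns a per-pixel class map for one window.
dummy_model = lambda tile: np.ones(tile.shape[:2], dtype=np.int64)

wsi = np.zeros((1_024, 2_048, 3), dtype=np.uint8)
pred = sliding_window_inference(wsi, dummy_model)
print(pred.shape, pred.min())  # (1024, 2048) 1 -> every pixel was covered
```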
Broadening the field-of-view
Such a sliding window approach can be effective, and can produce visually impressive results (check out the Lunit SCOPE demo here).
However, as shown in the figure above, our models can be somewhat uncertain when given the same limited field-of-view they saw during training. Our simple solution is to run inference with a larger field-of-view (which also means a larger input image size to the model), so that the fully convolutional model has a chance to understand the surrounding context better. This has proved sufficient to reduce the visual artifacts shown above.
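This trick works precisely because the model is fully convolutional: the same weights apply to inputs of any spatial size, so inference can simply use a wider window than training did. A toy network (not our production architecture) demonstrating the property:

```python
import torch
from torch import nn

# Tiny fully convolutional net: no fixed-size layers, so input size is free.
# The 3 output channels stand in for hypothetical cell classes.
fcn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 3, kernel_size=1),
)

with torch.no_grad():
    small = fcn(torch.randn(1, 3, 256, 256))    # training-time field of view
    large = fcn(torch.randn(1, 3, 1024, 1024))  # wider view, identical weights
print(small.shape, large.shape)  # each output matches its input's spatial size
```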
Closing
In this post, we explored histology images, and the cells that lie within. We discussed how we typically annotate parts of the full image, such that we can localize and classify individual cells.
Though simple, this is not the end of our story. Our recent work named OCELOT (blog post, project page), tackles the cell detection problem from a different angle, explicitly trying to incorporate larger “context” such that the cell model makes fewer mistakes (due to low field-of-view). Our MICCAI 2023 challenge opens this problem setting up to the Computational Pathology community, such that we can tackle this problem together, and work towards conquering cancer for humanity.
Our work on large-scale self-supervised pre-training for pathology shows promise in improving AI models in diverse tasks, and internally we have seen gains in the generalization capabilities of our cell models, in particular for cases with low amounts of annotated data (sometimes allowing us to reduce our annotation cost by half).
Going further, we can detect cells with more specialized or modern neural network architectures, play with other methods of incorporating context, and consider multi-task learning approaches to learn from heterogeneous annotation formats. However, these directions and ideas are still being developed and will likely be shared in another blog post in the distant future. Perhaps, you, the reader, may be the one writing our next post 😉
Should you wish to join us in our mission to better understand cells in histology images, do check out our Careers page and/or hit me up on LinkedIn. Let’s talk!
Acknowledgements
Many thanks to Sérgio Pereira, Biagio Brattoli, rjw0205, Mohammad Mostafavi, and Donggeun Yoo for helping me improve this post.