The wonderful tumor microenvironment (TME): zooming into a pathology image uncovers diverse tissue structures and the cells that form them. Lunit SCOPE aims to turn such image data into understandable numbers and statistics that support prognosis and, ultimately, patient survival. (Sample taken from TCGA BRCA)

How Lunit SCOPE finds and categorizes cells in histology images

Seonwook Park
Lunit Team Blog
7 min read · Jul 14, 2023


Lunit’s AI-based products tackle cancer in a two-pronged fashion: (a) accurate and early diagnosis, and (b) cancer treatment recommendations via histology-based prognosis. Lunit’s SCOPE line of products, such as SCOPE IO and SCOPE PD-L1, belongs to the latter group, and this is what we work on in the Oncology AI Research department (of which I am a part).

This blog post is a quick overview of how our AI models find (localize) and categorize (classify) cells in histology images. These micro-scale predictions accumulate into an understanding at the macro or slide level, typically through image-based biomarkers such as tumor-infiltrating lymphocyte (TIL) density or tumor proportion score (TPS). These slide-level quantities directly inform the decision on whether to administer specific cancer therapies (e.g., PD-1/PD-L1 inhibitors in immunotherapy, or HER2 scoring for targeted therapy), so it is of paramount importance to localize and categorize cells accurately.

Histology images are large. Very large.

Let’s take a step back and ask: what are “histology images”? In the context of Lunit SCOPE, histology images are stained images of human tissue, acquired through a biopsy procedure. The steps between biopsy and digitized images (a.k.a. whole-slide images or WSI) are many, as illustrated in the figure below.

How histology images are made: whole-slide images (WSI) are created via a complex process that starts with a tissue biopsy (often performed during cancer diagnosis), continues through tissue handling and staining, and ends with digitization on a pathology slide scanner. A few images were taken from here.

As you can imagine, such tissue sections can measure several millimeters to centimeters across. Since a tumor cell is roughly 10 to 20 microns in diameter, a tissue sample of 2 centimeters by 2 centimeters (4 square centimeters) could contain a million cells or more. If you were to take 5 seconds to categorize one cell, it would take you 58 days without taking a toilet break or going to bed. It is also rather implausible that even a well-trained human professional could categorize all of those one million cells consistently when asked to do so on different days (“intra-rater variability”). Even less likely is that different individuals would agree on their categorization of each and every one of those million cells (“inter-rater variability”).
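As a quick sanity check on those numbers, here is the back-of-envelope arithmetic in a few lines of Python; the cell size and labeling speed are rough assumptions for illustration, not measurements.

```python
# Back-of-envelope check (illustrative numbers only): how long would it take
# to label every cell in a 2 cm x 2 cm tissue section by hand?
tissue_area_um2 = (2 * 10_000) ** 2          # 2 cm = 20,000 microns per side
cell_area_um2 = 20 ** 2                      # assume a generous 20 x 20 micron cell
num_cells = tissue_area_um2 / cell_area_um2  # ~1,000,000 cells

seconds_per_cell = 5
total_days = num_cells * seconds_per_cell / (60 * 60 * 24)
print(f"{num_cells:,.0f} cells -> {total_days:.0f} days of non-stop annotation")
# 1,000,000 cells -> 58 days
```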

An AI-based tissue analyzer promises to be consistent and fast while providing a comprehensive analysis of the image, but how do we build such a system, and in particular one that localizes and categorizes cells?

Zooming in

To begin, one must zoom in. One must zoom in very far. As an analogy, consider a typical view on Google Maps. Looking at cells in histology images is similar to starting from a zoomed-out country view. Take Singapore, as in the figure below. Scrolling in, one starts to see the structure of neighborhoods, and further on, streets and buildings. When zoomed in to the full extent, it is possible to distinguish between the cars and trucks parked at Marina Bay, and label them by color.

Zooming in on Google Maps is quite similar to zooming in on a histology image (Imagery from Maxar Technologies and Google)

Now, let’s look at a whole-slide image. Even in a WSI of modest size, such as the one below, very different structures are revealed at different zoom levels. After some effort, we reach the highest magnification, where we find cells of all kinds of shapes and sizes, some benign, others malignant, all together forming the full tissue (as acquired by biopsy).

Zooming in on a whole-slide image is as fun as zooming in on satellite imagery. After chasing down larger structures, one can find the cells that form the tissue, in various shapes and sizes (source: Lunit SCOPE demo)

In pixels, WSIs can be as large as 80,000 x 60,000, or 4.8 gigapixels. This vast amount of image data cannot be handled in one go, so we take small crops from the full image when building the annotated data used to train our cell detection models.
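As a rough illustration of what “taking small crops” looks like in practice, here is a minimal sketch using the openslide library (a common open-source reader for WSI formats); the file name, coordinates, and patch size are placeholders, not our actual pipeline.

```python
# A minimal sketch of cropping a small patch from a gigapixel WSI using the
# openslide library. The file path and patch coordinates are placeholders.
import openslide

slide = openslide.OpenSlide("example_slide.svs")   # hypothetical file
print(slide.dimensions)                            # e.g. (80000, 60000) at level 0

# read_region takes level-0 coordinates, a pyramid level, and a patch size,
# and returns a PIL image; here we grab a 1024 x 1024 crop at full resolution.
patch = slide.read_region(location=(30_000, 20_000), level=0, size=(1024, 1024))
patch = patch.convert("RGB")                       # drop the alpha channel
patch.save("patch_30000_20000.png")
```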

Annotating cell patches

When selecting the regions to annotate, special care must be taken in deciding what to annotate and how large each region should be. Our Data-centric AI Research team (which is hiring Research Scientists) often focuses on the what side, using techniques from active learning. As for how large, we have determined over many years of experience that a region of approximately 0.04 square millimeters yields an appropriate number of cells for our expert annotators (board-certified pathologists) to click on.

Cell patch annotations: typically, a very small area is annotated with cells, as even such tiny regions can contain hundreds of cells from diverse classes. Depicted in this example from TCGA LUSC are tumor cells annotated as red dots.
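For a sense of scale, here is the arithmetic that turns such an annotation region into a pixel window. The 0.25 microns-per-pixel resolution is an assumption (a common value for 40x scans); actual values depend on the scanner.

```python
# Rough conversion from a ~0.04 mm^2 annotation region to a pixel window,
# assuming a square region and a scan resolution of 0.25 microns per pixel.
import math

region_area_mm2 = 0.04
microns_per_pixel = 0.25                      # assumed scanner resolution

side_um = math.sqrt(region_area_mm2) * 1_000  # 0.2 mm -> 200 microns per side
side_px = side_um / microns_per_pixel         # -> 800 pixels per side
print(f"~{side_um:.0f} um x {side_um:.0f} um, i.e. ~{side_px:.0f} x {side_px:.0f} pixels")
```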

Though our network of pathologists can produce high quality annotations, some images can result in diverging annotations that require an additional control step. We thus apply Quality Control (QC) via in-house pathologists who understand exactly what we need for our task, and to which level of quality or detail.

Lastly, to ensure that our models can generalize well and avoid learning biases from our evaluation datasets directly, we carefully manage our whole-slide-image, patient, and specimen data such that we never train on patients and specimens that are included in our final validation datasets (often called “test set” in the machine learning world).
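A minimal sketch of what such a patient-level split could look like, using scikit-learn’s GroupShuffleSplit; the dataframe columns and IDs are hypothetical, not our internal data management system.

```python
# A patient-level split, so that patches from the same patient never appear
# in both training and validation data. Columns and IDs are made up.
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

patches = pd.DataFrame({
    "patch_path": ["p0.png", "p1.png", "p2.png", "p3.png"],
    "patient_id": ["PT-001", "PT-001", "PT-002", "PT-003"],
})

splitter = GroupShuffleSplit(n_splits=1, test_size=0.33, random_state=0)
train_idx, val_idx = next(splitter.split(patches, groups=patches["patient_id"]))

# No patient appears on both sides of the split.
assert set(patches.loc[train_idx, "patient_id"]).isdisjoint(
    patches.loc[val_idx, "patient_id"]
)
```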

Simple is best — a segmentation approach to cell detection

Once a sufficient dataset of annotated image patches has been collected, we must train a model that can localize and classify cells.

Over time, we have found that well-tuned, simple architectures are the most reliable. Thus, we adopt a standard fully convolutional neural network architecture for the majority of our products’ cell models (e.g., DeepLab v3+ with a ResNet backbone). Variations of this model are highly effective and have been used in several publications so far.

A typical Lunit SCOPE cell model utilizes a fully convolutional neural network architecture (FCN) to simultaneously localize and classify cells.
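As a sketch of what such a model could look like in code, here is a DeepLab v3+ with a ResNet encoder instantiated via the segmentation_models_pytorch library; the class count and input size are illustrative, and our production models differ in their details.

```python
# A sketch of a DeepLab v3+ cell segmentation model with a ResNet encoder,
# built with the segmentation_models_pytorch library (illustrative only).
import torch
import segmentation_models_pytorch as smp

model = smp.DeepLabV3Plus(
    encoder_name="resnet50",
    encoder_weights="imagenet",
    in_channels=3,   # RGB histology patch
    classes=4,       # e.g. background plus a few cell types (placeholder count)
)

# A fully convolutional model accepts variable input resolutions and returns
# one score map per class at the same spatial size as the input.
x = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # (1, 4, 512, 512)
```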

When evaluating several candidate models, we carefully perform sub-group analysis to understand whether we perform well in challenging edge-case scenarios. At a basic level, we divide our data by scanner type, cancer type, or other categorizations, then select the models that perform as well as possible across all sub-groups.
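The bookkeeping itself can be as simple as a groupby; the scores and grouping columns below are made up, but they show the idea of reporting metrics per sub-group rather than one pooled number.

```python
# Toy illustration of sub-group analysis: report metrics per scanner and per
# cancer type instead of a single pooled score. All values are fabricated.
import pandas as pd

results = pd.DataFrame({
    "scanner":     ["A", "A", "B", "B", "B"],
    "cancer_type": ["LUSC", "BRCA", "LUSC", "BRCA", "BRCA"],
    "f1_score":    [0.82, 0.79, 0.74, 0.81, 0.77],
})

print(results.groupby("scanner")["f1_score"].mean())
print(results.groupby("cancer_type")["f1_score"].mean())
```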

After selecting a final model, we perform inference with a sliding window method, such that we can detect all of the cells present in a given gigapixel WSI.
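A bare-bones version of such sliding-window inference might look like the following; the tile size and stride are illustrative, and the `slide` and `model` objects are assumed to come from the earlier sketches.

```python
# Sliding-window inference over a WSI: crop a tile, run the cell model, and
# keep the per-tile predictions along with their coordinates.
import numpy as np
import torch

tile_size, stride = 1024, 1024
width, height = slide.dimensions             # openslide handle from the sketch above

all_logits = []
for y in range(0, height - tile_size + 1, stride):
    for x in range(0, width - tile_size + 1, stride):
        tile = slide.read_region((x, y), 0, (tile_size, tile_size)).convert("RGB")
        tensor = torch.from_numpy(np.array(tile)).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            logits = model(tensor.unsqueeze(0))  # hypothetical model from above
        all_logits.append(((x, y), logits.cpu()))
```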

Broadening the field-of-view

Such a sliding window approach can be effective, and can produce visually impressive results (check out the Lunit SCOPE demo here).

Artifacts created by running inference with a smaller field-of-view (FoV) can disappear simply by running inference with a larger window-size or FoV.

However, as shown in the figure above, our models can be somewhat uncertain when given a field-of-view as limited as the one seen during training. Our simple solution is to run inference at a larger field-of-view (which also means a larger input image size to the model), so that the fully convolutional model has a chance to better understand the surrounding context. This has proved sufficient to reduce the visual artifacts shown above.
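Because the model is fully convolutional, the very same weights can simply be fed a larger crop at inference time; below is a sketch of the idea, reusing the hypothetical model from above with sizes chosen only for illustration (memory permitting).

```python
# The same fully convolutional weights can run on a larger field-of-view at
# inference time; only the input crop size changes. Sizes are illustrative.
import torch

small_fov = torch.randn(1, 3, 512, 512)      # FoV similar to training patches
large_fov = torch.randn(1, 3, 2048, 2048)    # 4x wider context at inference

with torch.no_grad():
    out_small = model(small_fov)   # (1, num_classes, 512, 512)
    out_large = model(large_fov)   # (1, num_classes, 2048, 2048)
```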

Closing

In this post, we explored histology images, and the cells that lie within. We discussed how we typically annotate parts of the full image, such that we can localize and classify individual cells.

Though simple, this is not the end of our story. Our recent work named OCELOT (blog post, project page) tackles the cell detection problem from a different angle, explicitly trying to incorporate larger “context” so that the cell model makes fewer mistakes due to a limited field-of-view. Our MICCAI 2023 challenge opens this problem setting up to the Computational Pathology community, so that we can tackle this problem together and work towards conquering cancer for humanity.

Our work on large-scale self-supervised pre-training for pathology shows promise in improving AI models on diverse tasks. Internally, we have seen gains in the generalization capabilities of our cell models, in particular for cases with little annotated data (sometimes allowing us to halve our annotation cost).

Going further, we can detect cells with more specialized or modern neural network architectures, play with other methods of incorporating context, and consider multi-task learning approaches to learn from heterogeneous annotation formats. However, these directions and ideas are still being developed and will likely be shared in another blog post in the distant future. Perhaps you, the reader, may be the one writing our next post 😉

Should you wish to join us in our mission to better understand cells in histology images, do check out our Careers page and/or hit me up on LinkedIn. Let’s talk!

Acknowledgements

Many thanks to Sérgio Pereira, Biagio Brattoli, rjw0205, Mohammad Mostafavi, and Donggeun Yoo for helping me improve this post.
