GSoC’22 @ TensorFlow: Segmentation of Prostate Gland Tissue for Cancer Detection

Mayuresh Agashe
8 min read · Jul 5, 2022


This blog post presents technical insights into the approach used for image segmentation of prostate biopsy slides for cancer detection. This project is part of my Google Summer of Code 2022 journey with TensorFlow.

Google Summer of Code 2022 @ TensorFlow

About the Project

With more than 1 million new diagnoses reported every year, prostate cancer (PCa) is the second most common cancer among males worldwide that results in more than 350,000 deaths annually. The key to decreasing mortality is developing more precise diagnostics.

Keeping the above quote in mind, this project attempts to develop a sophisticated diagnostic technique that highlights potentially cancerous regions in a given prostate biopsy slide, thereby aiding doctors in faster and more accurate diagnosis.

The source code for the project is available here.

Understanding the Structure of the Data

The dataset used for this project is hosted on Kaggle (size: 412 GB 🤯) and contains three major components. Link to the dataset.

  1. Biopsy Slides in .tiff format.
  2. Mask encodings of biopsy slides in .tiff format.
  3. A CSV file that provides information about each biopsy slide.

Brief about tiffs provided in this dataset

Each of the .tiff slides/images provided by the dataset authors has 3 levels of resolution. Level 0 is the highest-resolution level, typically between 5,000 and 40,000 pixels in both x and y! These levels are related to each other by downsampling factors of 1, 4, and 16 (more on downsampling factors later). The dimensions of each level differ based on the dimensions of the original image.
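
These properties can be inspected directly with openslide-python. A minimal sketch (the file path is a placeholder, and the printed dimensions are illustrative):

```python
import openslide

# Placeholder path; point it at any biopsy slide from the dataset.
slide = openslide.OpenSlide("train_images/some_biopsy_slide.tiff")

print(slide.level_count)        # 3
print(slide.level_dimensions)   # e.g. ((25344, 28672), (6336, 7168), (1584, 1792))
print(slide.level_downsamples)  # (1.0, 4.0, 16.0)

slide.close()
```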

The provided mask encodings come from two different data centers, viz. 'Radboud' and 'Karolinska'. Each center has its own encoding pattern. For the 'Radboud' center: {0: background, 1: stroma, 2: benign epithelium, 3: Gleason 3, 4: Gleason 4, 5: Gleason 5}, and for the 'Karolinska' center: {0: background, 1: benign, 2: cancer}. In a nutshell, encodings greater than or equal to 3 represent malignant cells for 'Radboud', and encoding 2 represents malignant cells for 'Karolinska'. The label information is stored in the red (R) channel; the other channels are set to zero and can be ignored.
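
Translating these encodings into a binary "malignant or not" view is straightforward once the red channel is isolated. A minimal sketch, assuming the mask region has already been read into a NumPy array:

```python
import numpy as np

def malignancy_map(mask_region: np.ndarray, data_provider: str) -> np.ndarray:
    """Collapse a mask region into a binary malignant/benign map.

    mask_region: (H, W, C) array read from a mask .tiff; label values
    live in the red (R) channel, the remaining channels are zero.
    """
    labels = mask_region[..., 0]      # keep only the R channel
    if data_provider == "radboud":
        return labels >= 3            # Gleason 3/4/5 are malignant
    return labels == 2                # karolinska: 2 encodes cancer
```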

The information about the labels of slides is contained within the provided CSV file. Every slide is assigned a Gleason Score value and its corresponding ISUP Grade. Both of these scales are a way to denote the aggressiveness of the cancer cells present in the tissue. Grade Group 0 represents benign tissue.

Source & Credits: Prostate Cancer Foundation
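
Back to the CSV: here is a minimal sketch of loading it with pandas, assuming the column layout of the Kaggle dataset (image_id, data_provider, isup_grade, gleason_score):

```python
import pandas as pd

# train.csv maps each slide to its labels; the column names below
# follow the Kaggle dataset's layout.
df = pd.read_csv("train.csv")

print(df[["image_id", "data_provider", "gleason_score", "isup_grade"]].head())
print(df["isup_grade"].value_counts().sort_index())  # grade 0 = benign
```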

Note: For simplicity, we will be working with a subset of the original dataset. I have written a script that downloads a subset of the given size from the original dataset in a zip format. The source code for the script can be found here.

Prototype Demo:

Prototype in Action

The Goal

What is Pseudo Segmentation?

Pseudo-segmentation is the process of creating fake mask maps by applying a classification approach to the entire image at the patch level. The whole slide image is broken down into fixed-size patches, and each patch is classified. If a patch is found positive, the corresponding region in the original image is masked, thereby creating a fake mask map.
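
In code, the idea looks roughly like the sketch below, where `classify` stands in for any patch-level binary classifier (a hypothetical callable, not the project's final model):

```python
import numpy as np

def pseudo_segment(image: np.ndarray, classify, patch_size: int = 128) -> np.ndarray:
    """Build a pseudo mask map by classifying fixed-size patches.

    `classify` is any callable returning 1 (malignant) or 0 (benign)
    for a single (patch_size, patch_size, 3) patch.
    """
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = image[y:y + patch_size, x:x + patch_size]
            if classify(patch):
                mask[y:y + patch_size, x:x + patch_size] = 1  # paint the patch
    return mask
```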

Building an Efficient Input Pipeline for Data Ingestion

Okay, at this point we have established that the first step will be to extract the patches from the slides and then just apply a binary classifier to those patches. Seems quite straightforward, so what’s the issue?

Let’s try to proceed with the steps that we know and let’s visualize the results.

Original Slide & Extracted Patches

As you can see, around 80% of the patches are just blank background and will introduce noise if included in the training set. The workaround here is quite interesting. As far as the training set is concerned, mask encodings have been provided in the dataset, along with information about the encodings in the form of key-value pairs. So, to filter out these blank patches, we will drop every patch whose mask contains only the pixel value 0 (0 represents 'background' for both data centers). Awesome! Let's apply this condition and visualize the results.

Remaining patches after dropping white-background patches

Looking at the results, this approach successfully filtered out the patches that bear no tissue. However, there are still a few patches where tissue is present in a negligible amount. To mitigate this issue, a %tissue parameter has been exposed to users, with a default value of 35%: every patch bearing less than 35% tissue will be dropped. Let's visualize the results.

Final Regions of Interest from the Slide

By applying these constraints, regions of interest have been successfully extracted! Cool😌!
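
Both filters (all-background and the %tissue threshold) reduce to a simple check on the patch's mask region. A minimal sketch, assuming the mask region is available as a NumPy array:

```python
import numpy as np

def keep_patch(mask_region: np.ndarray, tissue_threshold: float = 0.35) -> bool:
    """Keep a patch only if enough of it is tissue.

    Label 0 in the red channel means 'background' for both data
    centers; every non-zero label counts as tissue.
    """
    labels = mask_region[..., 0]
    tissue_fraction = np.count_nonzero(labels) / labels.size
    return tissue_fraction >= tissue_threshold
```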

Apart from these challenges, considering memory usage is crucial! As mentioned previously, level 0 images are enormous and cannot fit into RAM. To extract patches from any level the user wants, building a dynamic, robust, and fast input pipeline is necessary. Hence, we will use the openslide-python module to read only the required regions from any level, without loading the entire image into RAM.
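
The key call is read_region, which loads just one tile from disk. A minimal sketch (the path and coordinates are placeholders):

```python
import numpy as np
import openslide

slide = openslide.OpenSlide("train_images/some_biopsy_slide.tiff")

# location is the top-left corner in *level-0* coordinates; level and
# size describe the region to read at that level. Only this region is
# pulled into memory, never the full slide.
patch = slide.read_region(location=(4096, 4096), level=1, size=(128, 128))
patch = np.array(patch.convert("RGB"))  # PIL RGBA image -> (128, 128, 3)

slide.close()
```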

To achieve all the tasks discussed, I have built a custom DataGenerator that extracts the patches, validates them, and writes them to disk for future use! The source code for the DataGenerator is provided at the end.

Model Building and Inferencing

With the extracted patches at our disposal, labeling those patches becomes inevitable. A naïve method would be to look up each slide in the provided CSV file and label all the valid patches of that slide with the corresponding Gleason Score. However, it is very unlikely that the entire tissue will be cancerous, and hence this method will lead to mislabeled patches. Evidently, this is the case!

Slides and their corresponding masks

Even though these tissues carry the label Gleason Score (4+5) (cancerous) in the CSV file, the cancerous cells are localized only in certain regions of the slide (the red-colored part). Once again, the mask encodings come to the rescue! To label these patches correctly, their corresponding mask encodings can be referred to. For instance, consider a patch extracted from the coordinates (X, Y) with dimensions (128, 128, 3). To label this patch, we analyze the region at the coordinates (X, Y) with dimensions (128, 128) in its mask, thereby labeling the patch via locality of reference!
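
A minimal sketch of this mask-driven labeling, combining read_region with the per-center encodings (the function name and signature are illustrative, not the project's exact API):

```python
import numpy as np
import openslide

def label_patch(mask_path: str, x: int, y: int, level: int,
                patch_size: int, data_provider: str) -> int:
    """Label a patch by inspecting the same region of its mask.

    (x, y) is the patch's top-left corner in level-0 coordinates,
    mirroring openslide's read_region convention. Returns 1 for
    malignant, 0 for benign.
    """
    mask = openslide.OpenSlide(mask_path)
    region = np.array(mask.read_region((x, y), level, (patch_size, patch_size)))
    mask.close()
    labels = region[..., 0]               # labels live in the red channel
    if data_provider == "radboud":
        return int(np.any(labels >= 3))   # Gleason 3/4/5
    return int(np.any(labels == 2))       # karolinska: 2 encodes cancer
```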

Although this labeling approach is very efficient, there is a slight downside. Imagine a scenario where only certain areas in a patch under observation are malignant, yet we label the entire patch as malignant, which is not strictly true! However, in the medical field, we ought to prioritize recall over precision. By this rationale, over-labeling areas is better than under-labeling, as it ensures that no malignant region is left out. This approach will surely increase the workload for doctors, but it is a safe bet!

With this, we have our dataset ready for training. Vision Transformers (ViTs), SSD-MobileNets, and InceptionResNets are safe bets and gave fairly accurate results.
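
As one illustration (not the project's exact training configuration), a patch classifier can be assembled from a pretrained backbone in a few lines of TensorFlow; all hyperparameters here are illustrative:

```python
import tensorflow as tf

# An ImageNet-pretrained InceptionResNetV2 backbone with a binary head.
backbone = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet",
    input_shape=(128, 128, 3), pooling="avg")

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # malignant vs. benign
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Recall()])   # recall over precision
```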

Fake Mask-Maps and Math Involved

With the trained model in hand, it is finally time to create segmentation maps for the slides. The task now is to let the user decide at which level they want the inference, and to reflect that inference onto a segmentation map at level 2.
As established earlier, intermediate levels can be created by downsampling a higher-resolution level. The downsampling factors are used to translate results from one level to another. In our case, the downsampling factors corresponding to levels 0, 1, and 2 are 1.0, 4.0, and 16.0. In layman's terms, moving 1 pixel at level 2 is equivalent to moving 16 pixels at level 0 and 4 pixels at level 1. To get a rough idea of this concept, let's visualize the same region of an image at all levels.

A patch observed under different index levels

If you observe closely, the patch size decreases by a factor of 4 at each level. Say the level 0 patch was extracted from coordinates (X, Y) with dimensions (2048, 2048); then, at level 1, the same patch can be observed at coordinates (X / 4, Y / 4) with dimensions (512, 512). Similarly, at level 2, this patch will be present at (X / 16, Y / 16) with dimensions (128, 128). All in all, with the help of the downsampling factors, translation between levels is possible. We will use the same methodology to reflect the inference at any level onto a segmentation map at level 2, as sketched below.
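
This translation is just a ratio of the two downsampling factors. A minimal sketch:

```python
def translate(x: int, y: int, src_level: int, dst_level: int) -> tuple[int, int]:
    """Map a coordinate between pyramid levels via downsampling factors."""
    factors = {0: 1.0, 1: 4.0, 2: 16.0}  # downsample factor per level
    scale = factors[src_level] / factors[dst_level]
    return int(x * scale), int(y * scale)

# A level-0 patch at (2048, 2048) lands at (512, 512) on level 1
# and at (128, 128) on level 2.
print(translate(2048, 2048, src_level=0, dst_level=1))  # (512, 512)
print(translate(2048, 2048, src_level=0, dst_level=2))  # (128, 128)
```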

Finally, an index level parameter has been exposed in the inference method; it determines the level from which patches are extracted and classified. After classification at the chosen level, the results are mapped to level 2, thereby creating a masked slide at level 2.

Voila 🤩

There we have it! We finally achieved the segmentation!

Features I am currently working on:

  1. Multithreading to speed up patch extraction and disk writes.
  2. Developing a robust cloud inference engine for this task!

About me

I am a pre-final-year B.Tech student in Computer Science & Engineering, specializing in Artificial Intelligence & Machine Learning, at Vellore Institute of Technology, Bhopal. I am a certified TensorFlow Developer and a proud open-source contributor.

Feel free to reach out if you have any doubts about the project.

LinkedIn: https://www.linkedin.com/in/mayureshagashe2105/

GitHub: https://github.com/mayureshagashe2105

Twitter: https://twitter.com/MayureshAgashe_

Thank you!
