How did we use computer vision to help medical experts diagnose Follicular Lymphoma?
We built a deep learning classifier that learns to differentiate Follicular Lymphoma from its non-malignant twin, Follicular Hyperplasia, using only whole-slide images of lymph node extracts. The model achieved high predictive performance (around 93% accuracy). Image processing turned out to be the most important step in building the model: normalising the staining of the images brings robustness and considerably improves performance. We also provide a simple way to interpret the model’s predictions, using heatmaps to highlight the parts of the images flagged as most suspicious for the Follicular Lymphoma class. The end goal is to give experts enough tangible information to understand and interact with the model’s output.
This project is part of Artefact’s contribution to Tech for Good. It was conducted in collaboration with Institut Carnot CALYM, a consortium dedicated to partnership research on lymphoma, and Microsoft.
In autumn 2019, the Institut Carnot CALYM launched a structuring programme aimed at establishing a roadmap to optimise the use and exploitation of data from the clinical, translational and preclinical research conducted by the members of the consortium over more than 20 years. This project, proposed by Pr Camille Laurent (LYSA, IUCT, CHU Toulouse, France) and Pr Christiane Copie (LYSARC, Pierre-Bénite, France), both members of Institut Carnot CALYM, is part of this structuring programme.
The primary objective of this research project is to develop a deep-learning algorithm to assist pathologists in diagnosing Follicular Lymphoma. A secondary objective is to identify informative criteria that could help medical experts understand the morphological differences between Follicular Lymphoma and Follicular Hyperplasia, referred to below as FL and FH.
What is Follicular Lymphoma? What are the challenges in its diagnosis?
FL is a subtype of lymphoma, the most frequent blood cancer in the world. There are more than 80 types of lymphoma, and this diversity makes diagnosis difficult, even for experts. Moreover, FL is very similar to FH, which is not cancerous, making its diagnosis even more challenging.
In this article, we will describe our approach in building a classifier for FL and FH using only labelled whole-slide images.
Whole-slide images are high-resolution digital files of scanned microscope slides. In our case, they contain extracts of lymph nodes.
How could deep learning help in its detection?
Using whole-slide images of FL and FH, we trained a binary classifier through a patch-based approach. Our model architecture is a simple ResNet-18 trained for a few epochs (~10).
After predicting the class of an observation with the classifier, we extract the last activation layer to build a heatmap on top of the input image, highlighting the parts that prompted the model to choose a given class.
Why did we use a patch-based classification?
Patch-based classification is a technique in which the class of a given observation is derived by aggregating the predictions made on its components (patches).
In our case, we use it because the images are far too large to be fed directly to the model.
Whole-slide images are indeed very large (~10⁵ × 10⁵ pixels). Their size makes training a deep learning model almost impossible with common tools. To solve this issue, we divided them into patches of the same size, following two important criteria:
- the patches must be big enough so that the follicles remain visible in them
- the patches should be small enough so that training a model can be done in a reasonable amount of time
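To get a sense of scale, here is a quick back-of-the-envelope calculation (the slide dimensions used are the rough order of magnitude given above, not exact figures):

```python
# Rough tile count for one whole-slide image (illustrative numbers only)
slide_side = 100_000  # ~1e5 pixels per side
tile_side = 1_024     # chosen patch size

tiles_per_side = slide_side // tile_side  # 97
tiles_per_slide = tiles_per_side ** 2     # 9_409 tiles, before any cleaning
print(tiles_per_side, tiles_per_slide)
```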
In patch-based classification, the model output can be interpreted as that of a classical classifier, except that the final score is obtained at the whole-slide level by aggregating the patch predictions.
For example, when predicting the class of a slide of FL, a score of 98% would mean that 98% of its patches have been predicted to be FL.
At the dataset level, this slide will therefore be given a score of 0.98 for the FL class.
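The aggregation can be sketched in a few lines (a minimal illustration; the function name and the use of hard 0/1 patch predictions are assumptions made for clarity):

```python
from typing import List

def slide_score(patch_predictions: List[int]) -> float:
    """Aggregate per-patch predictions (1 = FL, 0 = FH) into a
    slide-level score: the fraction of patches predicted as FL."""
    return sum(patch_predictions) / len(patch_predictions)

# A slide where 98 of its 100 patches are predicted FL scores 0.98
predictions = [1] * 98 + [0] * 2
print(slide_score(predictions))  # → 0.98
```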
PS: the decision to divide the images into patches rests on medical experts’ conclusion that in a whole-slide image of FL, follicles are expected to be present everywhere.
Our training set is composed of 58k randomly selected patches (1024 × 1024 pixels) of FL and FH, extracted from a set of 30 whole-slide images per class.
20% of the patches were sampled to validate model performance at training time.
Our testing set is composed of 15 whole-slide images, each divided into patches. This reference set was used to compare the results of the different training approaches detailed below.
The global pipeline is described below:
Before training the deep learning classifier: Image preparation and processing
After training : Inference and interpretation
In the sections below, we will give the details about these different steps of the pipeline.
Data preparation and processing
1 — Tiling
As stated earlier, whole-slide images are very large and cannot be fed directly to a classification model unless you have super-galactic hardware.
We used the openslide library to read the slides and its DeepZoom support to divide the images into relatively small tiles of 1024 × 1024 pixels. After breaking the slides into tiles, we ran them through a basic cleaner that dropped all tiles that were not at the centre of the tissue (borders, holes, etc.).
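The cleaning step can be approximated with a simple background filter, sketched below with numpy (the thresholds are illustrative assumptions, not the exact values we used):

```python
import numpy as np

def keep_tile(tile: np.ndarray,
              white_thresh: int = 220,
              max_bg_fraction: float = 0.5) -> bool:
    """Drop tiles that are mostly background: a pixel counts as
    background when all its RGB channels exceed `white_thresh`, and
    the tile is kept only if the background fraction stays below
    `max_bg_fraction`."""
    background = np.all(tile > white_thresh, axis=-1)
    return bool(background.mean() < max_bg_fraction)

# A mostly-white tile is dropped, a tissue-like tile is kept
white_tile = np.full((1024, 1024, 3), 250, dtype=np.uint8)
tissue_tile = np.full((1024, 1024, 3), 150, dtype=np.uint8)
print(keep_tile(white_tile), keep_tile(tissue_tile))  # → False True
```

In practice border tiles are partly tissue and partly background, so the fraction threshold matters more than the per-pixel one.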
2 — Stain normalisation
The second step of our data processing, and also the most important one, is stain colour normalisation. Staining is the process of highlighting important features on slides and enhancing the contrast between them. The staining system used is the common H&E (Hematoxylin and Eosin). However, since the images come from many different laboratories, we observed variations in the colouring of the slides, mainly due to differences in the dyeing process from one laboratory to another. These differences can significantly degrade the model’s performance.
We used classical techniques to normalise the coloration of the dataset before training the model.
We picked the Reinhard technique and measured its impact on the model.
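In spirit, the Reinhard method matches each image’s per-channel mean and standard deviation to those of a reference image, after converting to the Lab colour space. The sketch below applies the statistic-matching step directly to float arrays to stay self-contained; a faithful implementation would wrap it between RGB→Lab and Lab→RGB conversions:

```python
import numpy as np

def match_channel_stats(image: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Shift and scale each channel of `image` so its mean and standard
    deviation match those of `reference` (the core operation of Reinhard
    normalisation, which performs it in Lab space)."""
    img = image.astype(np.float64)
    ref = reference.astype(np.float64)
    out = np.empty_like(img)
    for c in range(img.shape[-1]):
        mu, sigma = img[..., c].mean(), img[..., c].std()
        mu_ref, sigma_ref = ref[..., c].mean(), ref[..., c].std()
        out[..., c] = (img[..., c] - mu) / (sigma + 1e-8) * sigma_ref + mu_ref
    return out

rng = np.random.default_rng(0)
patch = rng.uniform(0, 255, (64, 64, 3))
reference = rng.uniform(0, 255, (64, 64, 3))
normalised = match_channel_stats(patch, reference)
```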
Training a Resnet-18 classifier
After processing the whole-slide images, training went smoothly (dropout, weight decay, etc.). Nothing fancy apart from adding mixup to the data augmentation. We used a ResNet-18 trained from scratch, since pre-trained models did not significantly improve our results. We also preferred the ResNet-18 because the ResNet-34 and ResNet-56 did not improve performance.
After ~10 epochs, our model was ready for testing.
We used the very practical Fastai library to build our models with little effort.
The results of three experiments are worth mentioning:
- A simple ResNet-18 as a baseline
- A ResNet-18 + stain normalisation of the dataset
- A ResNet-18 + stain normalisation of the dataset + mixup as data augmentation
The results on the test set for these three experiments are shown below:
Stain normalisation is by far the most important step in our modelling approach. We were experiencing generalisation problems (red line), and it definitely helped solve them. Adding mixup and a 2-step tiling makes the results even better.
MixUp is a data augmentation technique that creates new observations by linearly interpolating pairs of samples.
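A minimal numpy sketch of the idea (fastai ships mixup as a ready-made callback, so this is for illustration only):

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha: float = 0.4):
    """Blend two samples and their labels with a Beta-distributed
    coefficient, producing a new synthetic training observation."""
    lam = np.random.beta(alpha, alpha)  # lam lies in [0, 1]
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

# Mixing an FL patch with an FH patch yields a soft label in between
x_a, x_b = np.zeros((4, 4)), np.ones((4, 4))
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_mixed, y_mixed = mixup(x_a, x_b, y_a, y_b)
```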
Interpreting the results of a computer vision classifier
In order to communicate the results easily to medical experts, we provided images with heatmaps highlighting where the model’s focus was when predicting a given label. We did that by extracting the last activation layer of the convolutional network and linearly interpolating it up to the size of the image we were predicting on.
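A rough sketch of that upsampling step (shapes are hypothetical; nearest-neighbour upsampling is used here to keep the example dependency-free, whereas smoother interpolation gives nicer overlays):

```python
import numpy as np

def activation_heatmap(activations: np.ndarray, image_shape: tuple) -> np.ndarray:
    """Upsample a coarse activation map (e.g. the last convolutional
    layer averaged over channels) to the input image size, then
    normalise it to [0, 1] so it can be rendered as an overlay."""
    h, w = image_shape
    ah, aw = activations.shape
    upsampled = np.kron(activations, np.ones((h // ah, w // aw)))
    upsampled -= upsampled.min()
    return upsampled / (upsampled.max() + 1e-8)

coarse = np.array([[0.1, 0.9],
                   [0.4, 0.2]])
heat = activation_heatmap(coarse, (1024, 1024))
print(heat.shape)  # → (1024, 1024)
```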
Interpreting the model’s output with heatmaps has been very useful in adjusting the modelling approach, as it gives experts a way to analyse what the model is actually doing. Through our exchanges with experts, we (data scientists) were able to adjust how we handled the dataset, make the model more robust (i.e. able to adapt to different types of inputs), and make sure it serves its purpose. It was, in fact, how we realised the need to normalise the staining of the images.
Conclusion and Key learnings
The goal of this study was to explore the process of creating a good deep learning baseline classifier for differentiating Follicular Lymphoma from Follicular Hyperplasia. Our key learnings are listed below:
- The critical importance of colour normalisation when training a model on this type of dataset
- Advanced data augmentation techniques such as mixup can help increase performance
- Tight collaboration with medical experts to challenge the model at each iteration
This pilot study has shown promising results and could help pathologists by serving as the basis for an AI assistant for diagnosing Follicular Lymphoma, especially in areas without a haematologist. An abstract was submitted to the 25th Congress of the European Hematology Association, held in Frankfurt, Germany in June 2020.
ARTIFICIAL INTELLIGENCE AGAINST LYMPHOMA : A NEW DEEP LEARNING BASED ANATOMOPATHOLOGY ASSISTANT TO DISTINGUISH FOLLICULAR LYMPHOMA FROM FOLLICULAR HYPERPLASIA
Yague Thiam 1, Nadine Vailhen 2, Arnaud Abreu 3, Charlotte Syrykh 3, François-Xavier Frenois 3, Romain Ricci 4, Bruno Tesson 5, Delphine Sondaz 6, Emmanuel Gomez 6, Aude De Grivel 1, Christiane Copie 7, Camille Laurent* 8
1/ Artefact, Paris, 2/ Direction des Opérations Histopathologiques, LYSARC, PIERRE-BÉNITE Cedex, 3/ Département de Pathologie, IUCT, CHU Toulouse, Toulouse, 4/ Plateforme Imagerie, LYSARC, 5/ Département de Bioinformatique, 6/ Département R&D, LYSARC — Carnot CALYM, 7/ Département de Pathologie, LYSARC, PIERRE-BÉNITE Cedex, 8/ Département de Pathologie, LYSA, IUCT, CHU Toulouse, Toulouse, France
About Institut Carnot CALYM:
With about 250 A-ranked publications and 100 patents in its portfolio, the Institut Carnot CALYM is the only consortium worldwide in lymphoma to combine fundamental, translational, preclinical and clinical research teams with an operational clinical research organisation, in order to accelerate innovation and its transfer to the clinic through public/private partnerships, for the benefit of patients.
Lymphoma is currently the 1st blood cancer and the 6th most common cancer worldwide, and remains a major public health issue. Faced with this situation, the 20 CALYM entities, including the cooperative group LYSA (Lymphoma Study Association), its clinical research operations structure LYSARC (Lymphoma Academic Research Organisation) and 18 public research laboratories, offer a unique R&D approach, from basic research to the evolution of standards of care. The consortium proposes collaborative projects to its partners, linking clinical research and clinical development.
CALYM has been labelled a “Carnot Institute” since 2011. Created in 2006, the Carnot label is attributed to public/parapublic research institutes that have committed to promoting innovation with socio-economic actors, mainly companies (from SMEs to large groups). The label is awarded by the Ministry for Higher Education, Research and Innovation, upon proposal by the French National Research Agency.
CALYM is ISO 9001:2015 certified for the management and monitoring of its partnership research activities and for the coordination of the activities related to its CeVi lymphoma viable cell collection.
CALYM is a member of the FINDMED “Health-medicines” Carnot sector.
To find out more:
Le réseau des Carnot | La recherche pour l'innovation des entreprises
Follow us on Twitter : @Reseau_Carnot
Follow us on LinkedIn : Le réseau des Carnot