Week 3— Histopathologic Cancer Detection

Furkan Kaya
Published in bbm406f19
Dec 15, 2019

Hello everyone! Today we are sharing the third post in the series on our Machine Learning Course Project, Cancer Detection with Histopathological Data. This week, we describe the data in our data set in detail. So let’s start.

Week 1 — Histopathologic Cancer Detection
Week 2 — Histopathologic Cancer Detection

Lymph node metastasis (colorectal carcinoma). H&E stain.

Details of the Dataset

In the data set that we will use, a positive label means that the central 32x32 px region of the image contains at least one pixel of tumor tissue. Tumor tissue in the rest of the 96x96 px image, outside that central region, does not affect the label. For this reason, it may make sense to crop each image and keep only the central region.
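Cropping to the central region takes only a few lines of NumPy. This is a minimal sketch, not our final preprocessing code: the function name `center_crop` is made up for illustration, and the random array only stands in for a real 96x96 px image from the data set.

```python
import numpy as np


def center_crop(patch, size=32):
    """Return the central size x size region of an (H, W, C) image array."""
    height, width = patch.shape[:2]
    top = (height - size) // 2
    left = (width - size) // 2
    return patch[top:top + size, left:left + size]


# Stand-in for one 96x96 RGB image (random values, not real data).
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(96, 96, 3), dtype=np.uint8)

center = center_crop(patch)
print(center.shape)  # (32, 32, 3)
```

Whether cropping actually helps is something we still have to test: the surrounding tissue does not change the label, but it may still give a model useful context.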

Samples of positive and negative data (96x96 px)

As you can see in the examples above, it is practically impossible for an untrained person to tell whether an image contains tumor tissue. Even for pathologists, the task is difficult and very time consuming. According to Libre Pathology, lymph node metastases can have these features:

  • Foreign cell population (key feature).
    - Classic location: subcapsular sinuses.
  • +/- Cells with cytologic features of malignancy.
    - Nuclear pleomorphism (variation in size, shape, and staining).
    - Nuclear atypia: Nuclear enlargement, irregular nuclear membrane, irregular chromatin pattern, esp. asymmetry, large or irregular nucleolus.
    - Abundant mitotic figures.
  • +/- Cells in architectural arrangements seen in malignancy; highly variable — dependent on tumor type and differentiation.
    - +/- Gland formation.
    - +/- Single cells.
    - +/- Small clusters of cells.

Descriptor for Data

Description of the data

Our project aims to identify metastatic cancer in small image patches taken from larger digital pathology scans. Of course, our algorithm will not be perfectly accurate. It should, however, at least indicate the regions that contain a high proportion of tumor tissue, which would save pathologists time.
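To make the idea of indicating regions concrete, here is a rough sketch of how patch-level scores could be laid out over a larger scan. Everything in it is an assumption for illustration: `dummy_predict` is a hypothetical stand-in for a trained classifier (it just measures brightness), `tumor_score_map` is a made-up helper name, and the scan is a synthetic array. The window moves 32 px at a time so that the central regions of neighboring patches tile the scan without overlapping.

```python
import numpy as np

PATCH = 96   # patch side length in pixels
CENTER = 32  # side length of the labelled central region


def dummy_predict(patch):
    """Hypothetical stand-in for a trained model: returns a score in [0, 1]."""
    margin = (PATCH - CENTER) // 2
    center = patch[margin:margin + CENTER, margin:margin + CENTER]
    return float(center.mean()) / 255.0


def tumor_score_map(scan, predict=dummy_predict):
    """Slide a 96x96 window over the scan and score each position."""
    height, width = scan.shape[:2]
    rows = (height - PATCH) // CENTER + 1
    cols = (width - PATCH) // CENTER + 1
    scores = np.zeros((rows, cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            top, left = r * CENTER, c * CENTER
            scores[r, c] = predict(scan[top:top + PATCH, left:left + PATCH])
    return scores


# Synthetic 256x256 "scan": black background with one bright square.
scan = np.zeros((256, 256, 3), dtype=np.uint8)
scan[96:160, 96:160] = 255

scores = tumor_score_map(scan)
print(scores.shape)                # (6, 6)
print(int((scores > 0.5).sum()))   # 4
```

Each cell of `scores` corresponds to one 32x32 px central region, so high values mark the areas a pathologist might want to look at first.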

We’re going to start coding next week. I’m sure you will find the coding part more exciting. But knowing our data well and having more information about the domain will make things much easier :)

