AI Rapid Prototyping: Build an instance segmentation neural net for breast cancer cells in one day

Alberto Rizzoli
Published in V7 Labs
6 min read · Feb 5, 2020
Here’s a glimpse of the end result

Start from 95 unlabelled images of breast duct cells dividing, and end up with an AI that can identify, count, and track them in 1 day of work or less using V7 Darwin’s Auto-Annotate functionality.

An unlabelled image of breast duct cells growing in a culture medium

Basics: What is Instance Segmentation?

  • When an AI applies a tag to an image, you call that classification.
  • When it places a box around an object in an image, that’s detection.
  • When it paints an entity with pixels, such as the floor or a face, you call that segmentation, or semantic segmentation.
  • When it finds individual objects, and paints in their pixels, you call that instance segmentation.

To illustrate this point, I will be using cats as a medium:

Much like cats, cells tend to cluster with one another. Also like cats, they are amorphous and easily stretch between other cells. An instance segmentation approach gives us the combined advantages of detection and semantic segmentation: it separates individual objects, and it outlines each of them pixel by pixel.

If we were to use semantic segmentation, we would know the percentage of the image covered in “cell pixels” but not how many individual cells there are.
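
To make the difference concrete, here is a minimal sketch on a toy label map. The 6x6 array and its values are made up for illustration and are not taken from the dataset; the only assumption is that an instance segmentation output can be represented as an integer map with one ID per cell.

```python
import numpy as np

# Toy 6x6 "image": 0 is background, each positive integer is one cell instance.
# Values are invented for illustration. Cells 1 and 2 touch each other.
instance_map = np.array([
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 2, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [3, 3, 0, 0, 0, 0],
    [3, 3, 0, 0, 0, 0],
])

# Semantic segmentation only answers "cell or not cell" for each pixel,
# so the touching cells 1 and 2 become one indistinguishable blob.
semantic_mask = instance_map > 0
coverage = semantic_mask.mean()  # fraction of pixels labelled as cell

# Instance segmentation keeps a separate ID per cell, so we can count them.
instance_ids = np.unique(instance_map)
cell_count = int((instance_ids > 0).sum())

print(f"coverage: {coverage:.1%}, cells: {cell_count}")
```

The semantic mask alone gives the coverage figure; the count only becomes available once each cell has its own ID.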

The Problem: Identifying and Counting Breast Duct Cells

Given a cell line of breast duct cells, can we tell at what rate they are duplicating?

Cell lines similar to the one we will be looking at are often used in breast cancer research. This line was isolated in 1990 from a 36-year-old woman with extensive fibrocystic disease. They are benign epithelial cells, and will duplicate when placed in a culture medium.

Sample a Reasonably Uniform Dataset

A neural network will mostly learn what it can interpolate in the data you feed it, plus a bit of luck. This luck, or out-of-distribution (OOD) performance, is mostly proportional to the size of the dataset and to its entropy, or variability.

If you are creating a proof-of-concept AI, don’t be a hero. Choose something uniform to train and test on rather than a subset of a highly varied dataset. We will use a long recording of breast duct cells duplicating where the shape and number of cells vary widely, but camera angles, microscope parameters, perturbation, and other variables are kept constant.

We are picking 1.2% of our data to train on. That’s 65 images taken from the first few seconds of footage, another 30 as a validation set, and the remaining 12,000 images to test on.

Frame 60, an early stage frame
Frame 360 — this is about where we will stop sampling training images.
Frame 8200 — this is the mess we hope to count in real time.
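
As a rough sketch of this kind of split, the snippet below draws the training and validation frames from the early part of the recording and leaves everything else for testing. The function name `split_frames`, the random sampling, and the fixed seed are assumptions for illustration; the article does not say how the early frames were chosen, and 12,000 is used as a round figure for the length of the recording.

```python
import random

def split_frames(cutoff_frame, n_train, n_val, total_frames, seed=0):
    """Sample train/val frames before `cutoff_frame`; test on all other frames."""
    rng = random.Random(seed)
    early_frames = list(range(cutoff_frame))
    picked = rng.sample(early_frames, n_train + n_val)  # no repeats
    train = sorted(picked[:n_train])
    val = sorted(picked[n_train:])
    labelled = set(picked)
    test = [f for f in range(total_frames) if f not in labelled]
    return train, val, test

# Cutoff of 360 follows the "Frame 360" caption; the other numbers come from the text.
train, val, test = split_frames(
    cutoff_frame=360, n_train=65, n_val=30, total_frames=12_000
)
print(len(train), len(val), len(test))
```

Keeping the labelled frames inside a short, early window is what makes the test set interesting: the model is later asked to handle far denser frames than anything it saw during training.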

Annotating cells

Is really dull I’m afraid.

V7 developed a system called Auto-Annotate, available within Darwin. A deep learning model sits ready for use in the back-end of your labeling interface, and reacts to your annotations in real-time by completing them after minimal input. In this case, the human detects cells, and the AI segments cells, thereby letting the human know that it has figured out the object of interest.

Roughly define a region and Auto-Annotate’s AI will figure out what you are trying to segment. It is object agnostic, so it will work on breast duct cells as well as on a boat or a cat.
You would achieve the same results on cows, for example.

Annotating the image below took 9 minutes using Darwin. It would have taken over 1 hour otherwise.

How long will it take to obtain 65 training images?

9 minutes × 65 images / 60 = 9.75 hours

These are very dense images. It takes almost 10 hours of annotation to obtain the 65 images we are aiming for. With a manual approach, it would have taken 1.5 weeks of full-time work for this 1-day proof-of-concept. Hint: this is among the reasons AI in medical and life science research hasn’t taken off quite yet.
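
The arithmetic behind these estimates, as a small sketch. The helper name `annotation_hours` and the 40-hour work week are assumptions; the manual figure uses 60 minutes per image, which is a lower bound since the text says "over 1 hour".

```python
def annotation_hours(n_images, minutes_per_image):
    """Total annotation time in hours for a batch of images."""
    return n_images * minutes_per_image / 60

assisted_hours = annotation_hours(65, 9)    # with Auto-Annotate: 9 min per image
manual_hours = annotation_hours(65, 60)     # manual lower bound: 60 min per image
manual_weeks = manual_hours / 40            # assumes a 40-hour work week

print(f"assisted: {assisted_hours} h")      # assisted: 9.75 h
print(f"manual:   {manual_hours} h, about {manual_weeks} weeks")  # manual:   65.0 h, about 1.625 weeks
```

At a little over 1.6 weeks, the manual lower bound is in line with the rough 1.5 weeks quoted above.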

To get this work done, I’m using V7 Darwin’s “Request Annotators” feature that allows me to hire professional annotators from trusted partners to complete this task.

I’ve picked 2 workers to complete the dataset in the next few hours. Now all I have to do is review their work from time to time:

6 hours later

Of the 304 images we had available, we were able to have 93 of them labelled. This leaves us 28 images to validate on, whilst still using 65 for training.

Training a Network

V7 Darwin comes with an AutoML engine made to fine-tune Auto-Annotate to new datasets.

In this particular case, we will be using it to train an instance segmentation model rather than an auto-labelling AI. This functionality is something we are testing with a small number of customers, as most Darwin users are AI companies with their own architectures, training pipelines, and infrastructure setup.

We’re picking an “Evaluation” model size, which is limited to under 1000 images.

Click Train…

Nothing much to do here but wait.

6 hours later

The Results are in!

We can now run the rest of the 12,000 frames of the recording through the model to see its output.

Whilst going through mitosis, the cells look like round glass beads, then duplicate into two amorphous blobs again.

In hindsight, it would have been better to label cells undergoing mitosis in a different color/class, differentiating them from those that are just swimming along.

Below is the full video of the recording, alongside a cell count averaged over intervals of a few seconds.

The cell count is a bit noisy (hey, this was done in one day), with an initial phase of no-growth, where recently duplicated cells are matched by cells exiting the scope field. Later we see linear growth, followed by a slow plateau as the visible field is saturated with cells:

YouTube’s video encoding struggling to keep up with noise
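
One simple way to average a noisy per-frame count over intervals is sketched below. This is not necessarily how the plot above was produced: the function name `average_counts`, the frame rate, the 2-second window, and the synthetic counts are all assumptions for illustration.

```python
import numpy as np

def average_counts(counts, fps, window_seconds):
    """Average per-frame cell counts over consecutive, non-overlapping windows."""
    counts = np.asarray(counts, dtype=float)
    window = max(1, int(round(fps * window_seconds)))  # frames per window
    n_windows = len(counts) // window
    trimmed = counts[: n_windows * window]  # drop the incomplete final window
    return trimmed.reshape(n_windows, window).mean(axis=1)

# Synthetic example: a linearly growing count with added noise (not real data).
rng = np.random.default_rng(0)
true_counts = np.linspace(20, 200, 600)
noisy_counts = true_counts + rng.normal(0, 10, size=600)

smoothed = average_counts(noisy_counts, fps=30, window_seconds=2)
print(smoothed.shape)
```

Longer windows give a steadier curve at the cost of time resolution, which matters if you want to spot short events such as a burst of divisions.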

The number of detected cells can be erratic given the small amount of training data. We noticed a few cases of one instance suddenly splitting into three. However, one of these cases may not have been an error:

This single cell seems to split into three.

This isn’t very likely, but we may be looking at a rare case of Tripolar Mitosis, which can occur in cancerous cells.

We have opened the dataset used for this quick prototype, which you can access here: https://darwin.v7labs.com/v7-partner-demos/breast-duct-cell-analysis.

This is part of a series of short research experiments on computer vision AI through the use of tools developed at V7.
