COVID-19 Instance Segmentation on X-Ray Images Using Mask R-CNN

Vedant Shrivastava
Jul 18, 2020 · 9 min read
Prediction of COVID-19 infection with Deep Learning using Chest X-Rays.

Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was first identified in December 2019 in Wuhan, Hubei, China, and has resulted in an ongoing worldwide pandemic. The first confirmed case has been traced back to 17 November 2019 in Hubei. As of today, 19 July 2020, more than 14.1 million cases have been reported across 188 countries and territories, resulting in more than 597,000 deaths. More than 7.92 million people have recovered.

Introduction:

Deep learning is an artificial intelligence function that imitates the workings of the human brain in processing data and creating patterns for use in decision making. Deep learning is a subset of machine learning in artificial intelligence (AI) that has networks capable of learning unsupervised from data that is unstructured or unlabeled. It is also known as deep neural learning or a deep neural network.

What is R-CNN?

Object detection is the process of finding and classifying objects in an image. One deep learning approach, regions with convolutional neural networks (R-CNN), combines rectangular region proposals with convolutional neural network features. R-CNN is a two-stage detection algorithm. The first stage identifies a subset of regions in an image that might contain an object. The second stage classifies the object in each region.

Mask R-CNN, which we will use in this project, likewise works in two stages.

  1. First, it generates proposals about regions that might contain an object, based on the input image.
  2. Second, it predicts the class of the object, refines the bounding box, and generates a pixel-level mask of the object, based on the first-stage proposals. Both stages are connected to the backbone structure.

Applications for R-CNN object detectors include:

  • Autonomous driving
  • Smart surveillance systems
  • Facial recognition

What is Mask R-CNN?

Illustration of the Mask R-CNN structure

Mask R-CNN is the current state of the art for instance segmentation. It is a deep neural network designed to solve the instance segmentation problem in machine learning and computer vision; in other words, it can separate the different objects in an image or a video. Given an image, it returns the object bounding boxes, classes, and masks.
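
To make that input/output contract concrete, here is a minimal inference sketch using torchvision's COCO-pretrained Mask R-CNN. This is an illustration only (the training in this article happens through Supervisely), and the file name is a placeholder.

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Load a Mask R-CNN pre-trained on COCO.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# "xray.png" is a placeholder path for any RGB input image.
image = to_tensor(Image.open("xray.png").convert("RGB"))

with torch.no_grad():
    output = model([image])[0]

# Each detected instance comes with a box, a class, a score, and a mask.
print(output["boxes"].shape)   # (num_instances, 4) bounding boxes
print(output["labels"])        # class index per instance
print(output["scores"])        # confidence per instance
print(output["masks"].shape)   # (num_instances, 1, H, W) soft pixel masks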

What is Supervisely?

Supervisely is a web platform where we can find everything we need to build deep learning solutions within a single environment. The platform covers the entire R&D lifecycle for computer vision, from image annotation to neural network training, and claims to make these workflows up to 10x faster.

Benefits of using Supervisely:

  1. Organize image annotation, data management, and manipulation at scale within a single platform.
  2. Integrate custom neural networks or use pre-trained models from the Model Zoo; perform, track, and reproduce tons of experiments.
  3. Use data science workflows out of the box: upload new data and continuously improve the accuracy of your neural networks.
  4. Combine different neural networks into a single pipeline with post-processing stages and deploy these pipelines as an API.
  5. Utilize neural networks to speed up the image annotation process: the platform has a trainable SmartTool and supports Active Learning and Human-in-the-Loop.

Understanding Image Segmentation:

Image segmentation is the process of partitioning a digital image into multiple segments. The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.

Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, it is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.

There are two types of image segmentation: instance segmentation and semantic segmentation.

Difference between Instance Segmentation and Semantic Segmentation
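
As a toy illustration of that difference, suppose an image contains two opacities of the same class: a semantic label map merges them into one undifferentiated class, while instance segmentation keeps them as separate, countable masks. The tiny arrays below are made up for illustration.

import numpy as np

# Semantic segmentation: one class label per pixel. The two opacities
# share class 1, so the label map cannot tell them apart.
semantic = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Instance segmentation: one binary mask per object, so the same two
# opacities remain separate, countable instances.
instance_masks = np.array([
    [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]],
    [[0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]],
])

print(semantic.max())        # 1 -> one class, instances not separable
print(instance_masks.shape)  # (2, 4, 4) -> two instances of that class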

Understanding Ground Glass Opacity in X-Rays:

The COVID-19 pandemic has brought radiologists’ penchant for descriptive terms front-and-center, with frequent references to one feature in particular: ground-glass opacities.

The term refers to the hazy, white-flecked pattern seen on lung CT scans, indicative of increased density. It’s not quite as dense as the “crazy-paving” pattern, which looks like a mosaic or pavers, and less confounding than the “head cheese sign,” a juxtaposition of three or more densities present in the same lung.

Ground-glass opacities aren't likely to be found in healthy lungs, though, and wouldn't result from exposures like air pollution or smoking. Many diseases can cause ground-glass opacities, but in COVID-19 they show a distinct distribution, with a preference for certain parts of the lung. COVID-related ground-glass opacities also tend to have a very round shape, which is unusual compared with other ground-glass opacities.

Starting with the Project:

Pre-requisites:

We need an active AWS account so we can connect our Supervisely account to an instance for training, and we should know how to launch an Amazon Linux (AMI) instance there and install software on it.

Working with Supervisely:

First, we create a dataset of chest X-rays of COVID patients. Here we use the well-known tool Supervisely, which helps create annotated images. We then need to create a Team and a Workspace.

1. Uploading the dataset of Images:

Then we need to create a Project. Inside the Project, we upload the dataset of images.

2. Annotating all Uploaded Images:

Annotate the Dataset

After creating the project and uploading the images, we need to annotate them, so that our model knows what exactly to look for in the images.

As mentioned, we need to highlight the parts that have light patches. You can spot the difference between the two chest X-rays below.

Difference between Normal and Annotated Dataset

Image after annotation:

And if the annotation covers the whole chest:

After Annotating Images

After creating a dataset, we need to choose a neural network; we are going to use the Mask R-CNN algorithm. You will see two options for that algorithm, Train and Test. Click on Train and choose an instance from the cloud.

Add Neural Network

3. Performing Data Augmentation:

After annotation, we need to increase the number of images available in our dataset to get accurate results. For this, we use DTL (Data Transformation Language) code, which performs a series of transformations on our images to create new versions of them. Some of the techniques we use are flipping, random cropping, and increasing or decreasing the contrast or brightness of our images.

For this, we need to upload the DTL code shown below:

[
  {
    "action": "data",
    "src": ["mlops/covid"],
    "dst": "$raw",
    "settings": {
      "classes_mapping": "default"
    }
  },
  {
    "action": "flip",
    "src": ["$raw"],
    "dst": "$raw_fliph",
    "settings": {
      "axis": "vertical"
    }
  },
  {
    "action": "multiply",
    "src": ["$raw", "$raw_fliph"],
    "dst": "$data",
    "settings": {
      "multiply": 5
    }
  },
  {
    "action": "crop",
    "src": ["$data"],
    "dst": "$randocrop",
    "settings": {
      "random_part": {
        "height": {"min_percent": 10, "max_percent": 40},
        "width": {"min_percent": 30, "max_percent": 80},
        "keep_aspect_ratio": false
      }
    }
  },
  {
    "action": "crop",
    "src": ["$data"],
    "dst": "$randocrop2",
    "settings": {
      "random_part": {
        "height": {"min_percent": 40, "max_percent": 90},
        "width": {"min_percent": 60, "max_percent": 90},
        "keep_aspect_ratio": false
      }
    }
  },
  {
    "action": "dummy",
    "src": ["$raw", "$raw_fliph", "$randocrop", "$randocrop2"],
    "dst": "$out",
    "settings": {}
  },
  {
    "action": "multiply",
    "src": ["$out"],
    "dst": "$precontrast",
    "settings": {
      "multiply": 5
    }
  },
  {
    "action": "contrast_brightness",
    "src": ["$precontrast"],
    "dst": "$outcontrast",
    "settings": {
      "contrast": {"min": 0.5, "max": 2, "center_grey": false},
      "brightness": {"min": -50, "max": 50}
    }
  },
  {
    "action": "if",
    "src": ["$outcontrast", "$out"],
    "dst": ["$totrain", "$toval"],
    "settings": {
      "condition": {"probability": 0.95}
    }
  },
  {
    "action": "tag",
    "src": ["$totrain"],
    "dst": "$train",
    "settings": {
      "tag": "train",
      "action": "add"
    }
  },
  {
    "action": "tag",
    "src": ["$toval"],
    "dst": "$val",
    "settings": {
      "tag": "val",
      "action": "add"
    }
  },
  {
    "action": "supervisely",
    "src": ["$train", "$val"],
    "dst": "dogs_augmented-train-val",
    "settings": {}
  }
]

After completing up to this point, we will find another, automatically created project that contains many times the number of images we originally provided; a rough count is sketched below.
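
To see roughly how much the dataset grows, we can trace the image counts through the DTL graph above. This sketch assumes each multiply layer duplicates every input image five times and every other layer emits one image per input (our reading of the DTL semantics, not an official specification).

# Rough count of images produced by the DTL graph above.
n = 100                       # hypothetical number of original X-rays

raw = n                       # "data" layer
raw_fliph = raw               # "flip" layer
data = (raw + raw_fliph) * 5  # first "multiply": 5
randocrop = data              # first "crop"
randocrop2 = data             # second "crop"
out = raw + raw_fliph + randocrop + randocrop2  # "dummy" merge
precontrast = out * 5         # second "multiply": 5
outcontrast = precontrast     # "contrast_brightness"

total = outcontrast + out     # the "if" layer splits this pool 95%/5%
print(total, total / n)       # 13200 132.0 -> ~132x the original count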

4. Connecting to an EC2 Instance to train the model:

Now we need to select a neural network model from the list for training. In our case, we are going to use the Mask R-CNN model.

Add command to your EC2 Instance

Now is the time when we need to create an instance in AWS and connect it with the Supervisely to perform the training operations.

By default, the pre-requisite Supervisely sets for a training instance is a GPU. But since GPUs are costly and we would have to request AWS to increase the limit, we will just train our model on the instance and download the weights file. After that, we manually run the weights file on our local machine to view the output.

In AWS, we launch an Amazon Linux instance and connect to it from our local machine via SSH. After that, we install Docker inside the instance, since Supervisely needs Docker: it automatically downloads a Docker image of the program that performs the training.

After we install Docker on the instance, we need to connect Supervisely to the instance using the highlighted Bash script.

This will download the Supervisely Docker image onto our instance. All the dependencies required for training our model are packaged in this Docker image.

After this, from the Neural Networks tab, we start the Training Process of our model.

5. Finding the Output:

After downloading the weights file, we update the Mask R-CNN demo code available in the Matterport repository to accept this weights file.
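
A sketch of what that update might look like with the Matterport API is shown below. The class names, file names, and config values here are illustrative assumptions, not the exact code used.

import mrcnn.model as modellib
from mrcnn.config import Config
from mrcnn import visualize
import skimage.io
import skimage.color

class CovidConfig(Config):
    NAME = "covid"
    NUM_CLASSES = 1 + 1          # background + ground-glass opacity
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1
    DETECTION_MIN_CONFIDENCE = 0.7

model = modellib.MaskRCNN(mode="inference", config=CovidConfig(), model_dir="logs")
model.load_weights("covid_mask_rcnn.h5", by_name=True)  # weights downloaded from Supervisely

image = skimage.io.imread("patient_xray.png")
if image.ndim == 2:                                     # X-rays are often grayscale
    image = skimage.color.gray2rgb(image)

# Run detection and draw boxes, labels, scores, and masks on the image.
r = model.detect([image], verbose=1)[0]
visualize.display_instances(image, r["rois"], r["masks"], r["class_ids"],
                            ["BG", "GGO"], r["scores"])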

After prediction, the resultant image is:

Here, the region on the right side of the image is predicted COVID-positive with a confidence score of 81.2%; the label text is set in the code.

Conclusion:

Thus, by the above process, we were able to perform instance segmentation on COVID chest X-rays. Our model confirmed that the provided X-ray contained ground-glass opacities, which in turn suggests that the associated person might be infected.

With more precise annotations on the training images, we can increase the accuracy of the model so that it masks the exact area of the GGOs in the future. Moreover, with a powerful remote instance that has GPUs, the entire process could be automated remotely rather than testing the weights manually.

You can also reach out on my Linkedin, Twitter, Instagram, or Facebook in case you need more help, I would be delighted to solve queries.

If you have come this far, do drop a 👏 if you liked this article.

Good Luck and Happy Coding.

Stay Home. Stay Safe. Save Lives!
