Detection and Semantic Segmentation of Pneumothorax Disease from X-Ray Images using Deep Learning

Build a binary image classification model to detect if the image contains pneumothorax. If yes, then pass it through a semantic segmentation model to identify and mark the affected part.


Table of Contents:
1. Introduction
2. Types of Pneumothorax Disease
3. Symptoms
4. Diagnosis
5. Business Problem
6. DL Formulation
7. Business Constraints
8. Dataset Column Analysis
9. Performance metric
10. Exploratory Data Analysis
11. Existing Approaches and Improvements in my model
12. Data Preprocessing
13. Deep Learning Models
14. Final Data Pipeline
15. Error Analysis
16. Future work
17. LinkedIn and GitHub Repository
18. Reference

1. Introduction:

What is Pneumothorax disease?
A pneumothorax is when the lung has collapsed due to air entering the space around your lungs (known as the pleural space). In a healthy body, the lungs are touching the walls of the chest. Air can enter the pleural space through an opening in your chest wall or in the lung. Air in the pleural space creates an increase in pressure around the lung and causes it to collapse. The lung may fully collapse, but most often only a part of it collapses. This collapse can also put pressure on the heart, causing further symptoms.

A pneumothorax can be severe, depending on how much air is trapped in the pleural space. A small amount of trapped air can usually resolve by itself, provided there are no other complications. Larger amounts of trapped air can be serious and lead to death if medical treatment is not obtained.

2. Types of Pneumothorax Disease

Four main types of pneumothorax are observed.

1. Primary spontaneous:
Primary spontaneous pneumothorax (PSP) occurs in young people (aged 15–34) without any history of lung disease. The direct cause of PSP is unknown. People at risk include smokers, tall men, and those who have had a family member with a pneumothorax.

2. Secondary spontaneous:
Secondary spontaneous pneumothorax (SSP) can be caused by a variety of lung diseases (such as chronic obstructive pulmonary disease, cystic fibrosis, tuberculosis, pneumonia, lung cancer, sarcoidosis, pulmonary fibrosis or cystic lung diseases) and tissue disorders (such as Marfan’s Syndrome). SSP carries more serious symptoms than PSP, and it is more likely to cause death.

3. Traumatic pneumothorax:
A traumatic pneumothorax is the result of an impact or injury. Potential causes include blunt trauma or an injury that damages the chest wall and pleural space. One of the most common ways this occurs is when someone fractures a rib. The sharp points of the broken bone can puncture the chest wall and damage lung tissue. Other causes include sports injuries, car accidents, and puncture or stab wounds.

4. Tension pneumothorax:
Tension pneumothorax is caused by a leak in the pleural space that resembles a one-way valve. As a person inhales, the air leaks into the pleural space and becomes trapped. It cannot be released during an exhale. This process leads to increased air pressure in the pleural space that is life-threatening and needs immediate treatment.

3. Symptoms:

a) Shortness of breath
b) Abnormally fast heart rate (known as tachycardia)
c) Chest pain, which may be more severe on one side of the chest
d) Sharp pain when inhaling
e) Blue discoloration of the skin or lips
f) Cold sweats
Some cases of pneumothorax have almost no symptoms. These can only be diagnosed with an X-ray or another type of scan.

4. Diagnosis:

Diagnosis of a pneumothorax is typically done via a chest X-ray, which takes images to detect the presence of air in the pleural space (the area around the lungs). A CT scan and thoracic ultrasound can also be used to help diagnose a pneumothorax.
A doctor (radiologist) then examines the X-ray report to diagnose pneumothorax and the affected area.

5. Business Problem:

Pneumothorax is typically detected on a chest X-ray examined by a doctor or a radiologist, which requires manual effort. Because imaging volumes are large, it takes a long time to review every image and prepare a report. If it is not detected early, pneumothorax can cause a life-threatening emergency due to lung collapse and respiratory or circulatory distress.

Our objective is to build an automated method to detect X-rays with pneumothorax and segment the affected area. This will help to prioritize the treatment of patients with pneumothorax. Automatic image segmentation can assist doctors in the treatment and diagnosis of diseases with higher accuracy, accelerate the diagnosis process, and improve efficiency.

6. DL Formulation:

As we have images with and without pneumothorax, we will build a classification model first and then a semantic segmentation model.
a) Classification Model:
Build a binary image classification model first to predict if the image contains pneumothorax. I will use the given images and class labels to train my classification model.
b) Semantic Segmentation Model:
Our task is to predict the mask of the area affected by pneumothorax. This is a semantic segmentation problem. If the classification model predicts a positive result, the image is passed through a semantic segmentation model to mark the pneumothorax affected area. I will use the given images and RLE masks to train my model.

What is image segmentation?
Image segmentation is the task of assigning every pixel of an image to an object class. Depending on how these pixels are classified, there are broadly two types of segmentation.
I) Semantic segmentation and II) Instance segmentation.

I) Semantic Segmentation:
In semantic segmentation, every pixel belongs to a particular class (either background or person). Also, all the pixels belonging to a particular class are represented by the same color (background as black and person as pink).

II) Instance Segmentation:
In instance segmentation also, every pixel belongs to a particular class. However, different objects of the same class have different colors, i.e. different labels (Person 1 as red, Person 2 as green, background as black, etc.).

As mentioned earlier, our problem is a semantic segmentation problem where we have to predict every pixel as either mask or background.

7. Business Constraints:

a) There is no strict latency constraint for this problem, but the model should predict within a few minutes.
b) Along with class label prediction, the model should segment the pneumothorax affected area.

8. Dataset Column Analysis:

Source of Data: The dataset is given on Kaggle's website. Please find the link below.

Given Dataset

The given data consists of ImageId and EncodedPixels. For every ImageId, we have an image in DICOM format. EncodedPixels with ‘-1’ value indicates the image is without pneumothorax. Images with pneumothorax have masks in run-length-encoded (RLE) format. We have to decode and create the mask.
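A minimal sketch of that decoding step is given here. It assumes that the encoded string holds (start, length) pairs, that each start is measured relative to the end of the previous run, and that pixels are numbered column by column. The function name `rle_to_mask` and the 0/1 output values are illustrative choices, not the organizers' helper.

```python
import numpy as np

def rle_to_mask(rle: str, height: int, width: int) -> np.ndarray:
    """Decode a relative (start, length) RLE string into a binary mask."""
    mask = np.zeros(height * width, dtype=np.uint8)
    if rle.strip() == "-1":              # '-1' means no pneumothorax: blank mask
        return mask.reshape(height, width)
    values = [int(v) for v in rle.split()]
    position = 0
    for offset, length in zip(values[0::2], values[1::2]):
        position += offset               # assumed: offset counts from the previous run's end
        mask[position:position + length] = 1
        position += length
    # Assumed column-major pixel order: reshape column-wise, then transpose.
    return mask.reshape(width, height).T
```

If the real files use absolute start positions or row-major order, only the `position += offset` line and the final reshape need to change.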

9. Performance metric:

a) Classification Model:
I will measure the performance of the classification model based on “Recall”. In the pneumothorax detection problem, it is acceptable to predict a negative case as positive, because the segmentation model it is then passed to will most probably predict a blank mask. But if a positive case is predicted as negative, it never reaches the segmentation model, the pneumothorax goes undetected, and the patient will suffer.
As this is a binary classification problem, I will take “binary_crossentropy” as the loss function.
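To make the two quantities concrete, here is a small NumPy sketch of recall and binary cross-entropy. The function names are illustrative, and the clipping constant `eps` is an assumption added only to avoid taking log(0).

```python
import numpy as np

def recall(y_true, y_pred):
    """Recall = TP / (TP + FN): the share of positive cases the model catches."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def binary_crossentropy(y_true, probabilities, eps=1e-7):
    """Mean log loss between 0/1 labels and predicted probabilities."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(probabilities, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))
```

A missed positive lowers recall, while a false alarm leaves it unchanged, which is why recall matches the priority described above.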

b) Segmentation Model:
I have been given the images along with their masks. I have to train a model using that data and predict masks for the test data. So, this is a Semantic Image Segmentation problem.
In this Semantic Image Segmentation problem, I am going to measure the performance of the model based on the “IOU score”. I will use a combination of “binary_crossentropy” and “dice_loss” as loss functions. These terms are explained below.

I. Intersection over Union (IoU) Score:
The Intersection over Union (IoU) metric, also referred to as the Jaccard index, quantifies the percent overlap between the target mask and our prediction output. This metric is closely related to the Dice coefficient.
The IoU metric measures the number of pixels common between the target and prediction masks divided by the total number of pixels present across both masks.
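As a sketch, the metric can be computed for two binary masks as follows. The name `iou_score` and the small `eps` term are assumptions; `eps` makes two empty masks score 1 instead of dividing by zero.

```python
import numpy as np

def iou_score(target, prediction, eps=1e-7):
    """Intersection over Union of two binary masks of the same shape."""
    target = np.asarray(target).astype(bool)
    prediction = np.asarray(prediction).astype(bool)
    intersection = np.logical_and(target, prediction).sum()  # pixels set in both masks
    union = np.logical_or(target, prediction).sum()          # pixels set in either mask
    return (intersection + eps) / (union + eps)
```

For example, a target of [[1, 1], [0, 0]] and a prediction of [[1, 0], [1, 0]] share one pixel out of three covered in total, so the score is about 0.33.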

II. Pixel-wise cross-entropy loss:
This loss examines each pixel individually, comparing the class predictions (depth-wise pixel vector) to our one-hot encoded target vector.
Pixel-wise loss is calculated as the log loss summed over all possible classes.

III. Dice loss:
Dice Loss = 1-Dice Coefficient

Where Dice Coefficient (D) = (2 * Σ pi * gi) / (Σ pi + Σ gi)

Here, pi = predicted pixel values.
gi = ground truth pixel values.
In the image segmentation scenario, the values of pi and gi are either 0 or 1.
1 → pixel belongs to the mask
0 → pixel does not belong to the mask
In the dice coefficient,
Numerator → 2 * Sum of correctly predicted mask pixels. (when pi and gi both are 1)
Denominator → Sum of total mask pixels of both predicted and ground truth.
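The numerator and denominator described above can be sketched in NumPy as follows. The function names are illustrative, and combining the two losses as a plain unweighted sum is an assumption, since the weighting used in training is not stated.

```python
import numpy as np

def dice_coefficient(g, p, eps=1e-7):
    """2 * sum(p * g) / (sum(p) + sum(g)) for ground truth g and prediction p."""
    g = np.asarray(g, dtype=float).ravel()
    p = np.asarray(p, dtype=float).ravel()
    return (2.0 * np.sum(p * g) + eps) / (np.sum(p) + np.sum(g) + eps)

def dice_loss(g, p):
    return 1.0 - dice_coefficient(g, p)

def bce_dice_loss(g, p, eps=1e-7):
    """Assumed combination: pixel-wise binary cross-entropy plus dice loss."""
    g_flat = np.asarray(g, dtype=float).ravel()
    p_flat = np.clip(np.asarray(p, dtype=float).ravel(), eps, 1 - eps)
    bce = np.mean(-(g_flat * np.log(p_flat) + (1 - g_flat) * np.log(1 - p_flat)))
    return float(bce + dice_loss(g, p))
```

For binary masks the Dice coefficient and IoU are tied by D = 2 * IoU / (1 + IoU), so an IoU of 1/3 corresponds to a Dice coefficient of 0.5.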

10. Exploratory Data Analysis:

First, perform some basic data cleaning operations on the given data.

Out of 12954 image-ids, 12047 are unique. It means there are duplicates. So, I have to remove the duplicate image-ids.

The images are given in DICOM format. We have to extract the information from the metadata for EDA. The metadata of a sample image is printed below:

Extract some information from metadata: Extract age, sex, modality, body part, and view position of every image given. I will use this data for EDA.

a) Distributions of Class Labels:

If the RLE mask field for an image is “-1”, the image is negative for pneumothorax; otherwise it is positive.

This is an imbalanced dataset. Among all the x-ray images 77.85% are without Pneumothorax and 22.15% are with pneumothorax.
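The de-duplication and labelling steps can be sketched in plain Python as follows. The function name `label_distribution` is illustrative, and keeping the first row seen for each ImageId is an assumption about how duplicates are dropped.

```python
from collections import Counter

def label_distribution(rows):
    """rows: non-empty iterable of (image_id, encoded_pixels) pairs."""
    first_seen = {}
    for image_id, encoded_pixels in rows:
        # Assumed rule: keep only the first row for each ImageId.
        first_seen.setdefault(image_id, encoded_pixels)
    # Label 0 = without pneumothorax ('-1'), label 1 = with pneumothorax.
    labels = {
        image_id: 0 if encoded.strip() == "-1" else 1
        for image_id, encoded in first_seen.items()
    }
    counts = Counter(labels.values())
    total = len(labels)
    percentages = {label: 100.0 * counts[label] / total for label in (0, 1)}
    return labels, percentages
```

The returned percentages are the numbers that show the class imbalance, and the labels are what the classification model is trained on.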

b) Distribution of Gender:

There are 55% male patients and 45% female patients in the given dataset.

c) Distribution of Gender along with Class Label:

Among all the male patients, 77.53% are without pneumothorax and 22.47% are with pneumothorax. Among all the female patients, 78.23% are without pneumothorax and 21.77% are with pneumothorax. The pneumothorax distribution is almost the same for male and female patients.

d) Distribution of View Position:

Posteroanterior view (PA):
The x-ray source is positioned so that the x-ray beam enters through the posterior (back) aspect of the chest and exits out of the anterior (front) aspect, where the beam is detected.
Anteroposterior view (AP):
The x-ray source and detector are reversed: the x-ray beam enters through the anterior aspect and exits through the posterior aspect of the chest. AP chest x-rays are harder to read than PA x-rays and are therefore generally reserved for situations where it is difficult for the patient to get an ordinary chest x-ray, such as when the patient is bedridden.

In the dataset, the view position is PA for 60.38% of images and AP for 39.62% of images.

e) Distributions of Patient’s Age for different Class Labels:

a) No patients aged 0–6 years or 90–100 years have pneumothorax.
b) Among patients aged 16, there are more cases with pneumothorax than without.

11. Existing Approaches and Improvements in my model:

In the existing approaches, all the images and their corresponding masks (if the mask is not available, a blank mask is passed) are used directly to train the segmentation model. For prediction, the x-ray image is fed into the model to get the predicted mask. If the image does not contain pneumothorax, the model outputs a blank mask; otherwise it marks and displays the affected area.

As 78% of the given images do not contain pneumothorax, I have split the solution into 2 parts:
a) Classification: Firstly, I will build a binary image classification model using pre-trained models and transfer learning to classify the image into positive pneumothorax or negative pneumothorax. I will use images along with their class labels to train the classification model.
b) Segmentation: I will build an image segmentation model to segment the pneumothorax affected area. I will be using only the images which contain pneumothorax and their corresponding masks to train the segmentation model. I will use UNET architecture with the DenseNet121 as an encoder for image segmentation. If the classification model predicts positive pneumothorax, then the image will be passed through the segmentation model for mask prediction.

12. Data Preprocessing:

a) Decode the images given in DICOM format: The images in the dataset are in DICOM format and cannot be used directly in our model. I have to decode them into a form suitable for the model.

b) Convert RLE to png mask:
The masks are given in Run Length Encoded (RLE) format. We have to convert RLE to a png mask. The function below, given by the organizers, converts RLE to a mask.

What is Run Length Encoding?
Run-length encoding (RLE) is a very simple form of data compression in which a stream of data is given as the input (e.g. “AAABBCCCC”) and the output is a sequence of counts of consecutive data values in a row (e.g. “3A2B4C”). This type of data compression is lossless, meaning that all of the original data is recovered when decoded. Its simplicity in both encoding (compression) and decoding (decompression) is one of the most attractive features of the algorithm.
In this encoded form, each odd position holds the number of occurrences and the even position to its right holds the value.
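The string example above can be reproduced with a few lines of standard-library Python. The function names are illustrative, and the sketch assumes the input text contains no digits, so that counts and values can be told apart when decoding.

```python
import re
from itertools import groupby

def rle_encode(text):
    """Collapse each run of identical characters into '<count><char>'."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    """Expand every '<count><char>' pair back into the original run."""
    pairs = re.findall(r"(\d+)(\D)", encoded)
    return "".join(char * int(count) for count, char in pairs)
```

Encoding “AAABBCCCC” gives “3A2B4C”, and decoding that string returns the original, which is what lossless means here.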

13. Deep Learning Models:

a) Classification Model: Firstly, I have to build a data pipeline for the classification model using the decoded images and their corresponding labels. Below is the code snippet for the data pipeline for the classification model.

Now, I will create my classification model using the VGG19 architecture with pre-trained imagenet weights. I will set “trainable=False” for all the layers of the VGG19 model. I tried VGG16 as well but received a better recall value using VGG19.

Now, I will compile and train this model and save the best one using checkpoints.

Graphs obtained from tensorboard are displayed below.

As we can see from the graph, the best validation recall was reached at epoch 7. The weights for this epoch are saved using model checkpoints.

As I said earlier, I tried VGG16 as well. Below is the comparison for both models.

From the above table, we can see that VGG19 gives better recall compared to VGG16.

Now define a function to predict the class label for validation data using this classification model.

Function to predict class label and plot confusion matrix

Now we have to check which threshold value, ranging from 0.1 to 0.9, gives the best prediction.

From the output of the above code snippet, I found that threshold=0.3 gives the best result in terms of all the parameters. The confusion matrix for this threshold value is plotted below.
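A threshold sweep of this kind can be sketched as follows. The function name `sweep_thresholds` and the reported fields are assumptions; the sketch only recomputes the confusion-matrix counts at each cut-off from 0.1 to 0.9.

```python
import numpy as np

def sweep_thresholds(y_true, probabilities, thresholds=None):
    """Return confusion counts, recall and precision for each threshold."""
    if thresholds is None:
        thresholds = np.arange(1, 10) / 10          # 0.1, 0.2, ..., 0.9
    y_true = np.asarray(y_true).astype(bool)
    probabilities = np.asarray(probabilities, dtype=float)
    results = []
    for t in thresholds:
        y_pred = probabilities >= t                 # positive if score reaches the threshold
        tp = int(np.sum(y_true & y_pred))
        fp = int(np.sum(~y_true & y_pred))
        fn = int(np.sum(y_true & ~y_pred))
        tn = int(np.sum(~y_true & ~y_pred))
        results.append({
            "threshold": float(t), "tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        })
    return results
```

Lowering the threshold trades false positives for fewer false negatives, which suits a recall-oriented classifier whose positives are checked again by the segmentation model.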

Below is the distribution of predicted probabilities for different class labels from the classification model.

It is observed that there is a large overlap between the probability scores of positive and negative class labels.

b) Semantic Segmentation Model: Semantic Segmentation model is built on the positive pneumothorax data only along with their corresponding masks. Similar to the classification model, I have to build the data pipeline for the segmentation model as well. Below is the code snippet.

I have used UNET architecture for this semantic segmentation task. I replaced the encoder part of the UNET model with pre-trained DenseNet121 backbone with imagenet weights and kept the same decoder part. Below is the code snippet.

Define callbacks and compile and train the model. Also, save the best model using checkpoints.

Below are the graphs of IOU Score and Loss received from tensorboard.

The best validation score of 0.3066 was reached at epoch 17. The weights are saved using checkpoints for future use.

Displaying some of the images and their corresponding original and predicted masks using the above model.

Below is the distribution of IoU scores for all the predicted masks.

a) There are around 200 images whose IoU score is less than 0.1.
b) We need to train the model with more images similar to those with a very low IoU score, so that the model can learn better.

14. Final Data Pipeline:

In the final data pipeline, we give the path of the x-ray image as input. This function takes care of all the data preprocessing and displays the image along with the predicted mask.

Final Pipeline for Mask Prediction

Below are some sample predictions received from the final pipeline given an image path as input.

When the classification model gives a negative result, it displays only the image with the title “THIS IMAGE DOES NOT CONTAIN PNEUMOTHORAX”.

When the classification model gives a positive result, the image is passed through the segmentation model, which predicts the mask. The image along with the mask is displayed with the title “THIS IMAGE CONTAINS PNEUMOTHORAX”.
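The gating logic of the two cases can be sketched as follows. Here `classifier` and `segmenter` are hypothetical callables standing in for the trained models, the classification threshold of 0.3 is the value chosen earlier, and the mask threshold of 0.5 is an assumption.

```python
import numpy as np

def predict_pneumothorax(image, classifier, segmenter,
                         class_threshold=0.3, mask_threshold=0.5):
    """Run the classifier first; call the segmenter only for positive images."""
    probability = float(classifier(image))
    if probability < class_threshold:
        # Negative case: the segmentation model is never called.
        return {"title": "THIS IMAGE DOES NOT CONTAIN PNEUMOTHORAX",
                "probability": probability, "mask": None}
    # Positive case: binarize the per-pixel scores into a 0/1 mask.
    scores = np.asarray(segmenter(image), dtype=float)
    mask = (scores >= mask_threshold).astype(np.uint8)
    return {"title": "THIS IMAGE CONTAINS PNEUMOTHORAX",
            "probability": probability, "mask": mask}
```

Image loading, resizing and plotting are left out, so the sketch shows only the decision flow between the two models.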

15. Error Analysis:

Let’s do some error analysis on both the classification and segmentation models, so that we can improve the performance in future models.

a) Classification Model:
Find out the false-negative points and display a few of them.

False Negative

1. False-negative points whose probability scores are very low (near zero) are classified completely wrongly. To fix this, we need to oversample such data so that the model can learn from similar images.
2. False-negative points whose probability score is below the threshold but close to it are also classified wrongly, but this can be fixed by training our model further.

Find out the false positive points and display a few of them.

1. False-positive points whose probability score is very high (near one) are classified completely wrongly. We need to oversample such data and train our model to get a better result.
2. False-positive points whose probability score is above the threshold but close to it are also classified wrongly, but this can be fixed by training our model further.
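Separating the two kinds of errors and ranking them by how wrong they are can be sketched as follows. The function name `split_errors` is illustrative, and the default threshold of 0.3 is the value selected for the classification model.

```python
import numpy as np

def split_errors(y_true, probabilities, threshold=0.3):
    """Return indices of false negatives and false positives, worst first."""
    y_true = np.asarray(y_true).astype(bool)
    probabilities = np.asarray(probabilities, dtype=float)
    y_pred = probabilities >= threshold
    fn_idx = np.flatnonzero(y_true & ~y_pred)
    fp_idx = np.flatnonzero(~y_true & y_pred)
    # False negatives: lowest probability first (furthest below the threshold).
    fn_idx = fn_idx[np.argsort(probabilities[fn_idx])]
    # False positives: highest probability first (furthest above the threshold).
    fp_idx = fp_idx[np.argsort(-probabilities[fp_idx])]
    return fn_idx, fp_idx
```

The first entries of each array are the "completely wrong" cases that are candidates for oversampling, while the last entries sit near the threshold.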

b) Segmentation Model:
Firstly, store the IoU score for the predicted masks corresponding to every image. Then, sort the dataframe based on the IoU score in descending order. Display a few images along with their original and predicted masks from the top of the dataframe, i.e. with the best IoU scores.

Display a few images along with their original and predicted masks from the bottom of the dataframe, i.e. with a very low IoU score.

Conclusion: Images for which the IoU score is less than 0.1 are generating very poor results.

16. Future Work:

  1. Due to the unavailability of good computation resources, I could not train my model for more epochs. If it is trained for more epochs, we may get better predictions.
  2. By doing error analysis, we have filtered the images for which we got false negative and false positive predictions from the classification model. If we oversample these images so that the model can learn more from them, we may get better results.
  3. For the segmentation model also, I have filtered out the images with a very low IoU score. If these images are oversampled and the model is retrained, we may get better results.

