Classification of COVID-19 X-ray images with Keras and its potential problems

Yiwen Lai · Analytics Vidhya · Apr 12, 2020

Disclaimer: The methods and techniques explored in this post are meant for educational and sharing purposes. This is not a scientific study, nor will it be published in a journal.

This project was first inspired by a post from Adrian Rosebrock on using X-ray images to build a detector that classifies COVID-19 patients. The WHO Director-General has stressed that all nations should test, since infected people must be identified to reduce the spread. So I wondered whether it is possible to classify these images with high accuracy, so that the data science community could build a stop-gap solution for nations that have difficulty getting test kits.

In late January, a Chinese team published a paper detailing the clinical and preclinical features of COVID-19. They reported using patients' chest CT scans to classify COVID-19, influenza, and healthy patients, with their deep learning model achieving 86.7% accuracy.

WHO urging all countries to test for COVID-19

Summary

The following are the steps of a typical image-classification workflow.

  1. Gathering the dataset
  2. Creating an image generator
  3. Defining our model
  4. Training our model
  5. Evaluating our model

1. Gathering the dataset

In this project, I used the covid-chestxray-dataset and extracted only the COVID-19 X-ray images. In addition, I used the Chest X-ray Images dataset from Kaggle and extracted the Normal and Bacterial images.

The reason I chose three categories is that with only COVID-19 images, I was afraid the model would learn by simply counting white-shaded pixels, because infected cases have white patches in their lungs. With three categories, the model is forced to learn the patterns that distinguish bacterial from viral cases. Also note that I balanced the dataset by selecting the same number of images for each class, as sketched after the counts below. This is really important to ensure our model is not biased toward any of the classes. *COVID-19 is a viral infection.

The dataset consists of the following images:

Total covid_images: 179
Total normal_images: 179
Total bacteria_images: 179
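
A minimal sketch of this balancing step might look like the following, assuming the images are organized into per-class folders (the folder names here are hypothetical, not the actual dataset layout):

import os
import random

random.seed(42)

# Hypothetical folder layout: data/covid, data/normal, data/bacteria
classes = ["covid", "normal", "bacteria"]
n_per_class = 179  # size of the smallest class (COVID-19)

balanced = {}
for c in classes:
    files = os.listdir(os.path.join("data", c))
    balanced[c] = random.sample(files, n_per_class)  # same count per class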

2. Creating an image generator

Augmentation

Since the number of images in the dataset is limited, I used augmentation to increase the variation the model sees during training. Two augmentations I want to highlight are gamma contrast, which changes the intensity of the white pixels in the image, and random cropping. These are specifically meant to prevent the model from learning to simply count white-shaded pixels. Augmentation should not be done blindly but selectively, to tackle a specific problem we face. I used the imgaug package for this project.
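
A possible imgaug pipeline for this is sketched below; the exact parameter ranges are illustrative rather than the ones used in the notebook:

import imgaug.augmenters as iaa

augmenter = iaa.Sequential([
    iaa.GammaContrast(gamma=(0.5, 2.0)),  # vary white-pixel intensity
    iaa.Crop(percent=(0, 0.1)),           # random cropping
    iaa.Fliplr(0.5),                      # random horizontal flips
])

# images: numpy array of shape (N, H, W, C), dtype uint8
images_aug = augmenter.augment_images(images)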

I would also like to share that we should always generate a few images from the generator for viewing before we start training our model. This ensures the generator produces the expected augmentation results, and it saves precious time later: when the model is not working, we can at least be sure the generator is not part of the problem.
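
For example, assuming a generator named train_generator that yields batches of (images, labels), a quick preview could be:

import matplotlib.pyplot as plt

# Pull one batch from the generator and inspect the augmented images
images, labels = next(train_generator)

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
for ax, img in zip(axes, images[:4]):
    ax.imshow(img.squeeze(), cmap="gray")
    ax.axis("off")
plt.show()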

Start small, iterate fast

Another piece of advice: at the start, keep the training images as small as possible. Yes, this probably limits the model's ability to learn from higher-resolution images, but smaller images let us iterate quickly through experiments with different models and hyperparameters. Once good settings are found, we can scale the images back up for further training. This way, we save hours during exploration.

The following are some examples from our generator.

3. Defining our model

A slight modification of https://www.researchgate.net/figure/TOP-LEVEL-DIAGRAM-OF-TRANSFER-LEARNING-FROM-A-PRE-TRAINED-CNN-MODEL_fig4_333882146

Transfer learning

Transfer learning is a method of utilizing the knowledge of an existing model to learn another task. This knowledge acts as the initial weights for our new task. The benefits of transfer learning are that the model can reach higher accuracy with less data and converges faster than training from scratch.

Fine-tuning the model

Another common practice, called fine-tuning, is to freeze the weights of the first few layers of the pre-trained network and train only the last few layers. The first few layers capture general features like curves and edges that are useful for most classification problems. Usually, we keep these general features and focus training on the features specific to our problem (for example, the shapes of bones and lungs).

Proposed network for this project: fine-tuning VGG16

For this project, however, I found that fine-tuning the first few layers produced a better result. My guess is that an ImageNet pre-trained network does not capture many of an X-ray's curves and edges: X-ray images are private medical information, while ImageNet is built on images found in the public domain; ImageNet is also trained on coloured images, whereas our dataset is grayscale. You might then ask whether ImageNet weights are useless for X-ray image classification, but I would argue that starting from ImageNet weights still leaves us better off than starting from random weights.

The following is the code to create our classification model.

# VGG16 transfer learning
from keras.applications import VGG16
from keras.layers import (AveragePooling2D, Dense, Dropout,
                          Flatten, Input)
from keras.models import Model


def create_model(input_shape, n_out):
    pretrain_model = VGG16(include_top=False, weights='imagenet',
                           input_tensor=Input(shape=input_shape))

    # Set all layers to be trainable...
    for layer in pretrain_model.layers:
        layer.trainable = True
    # ...then freeze the last 4 layers of the pre-trained network
    for layer in pretrain_model.layers[-4:]:
        layer.trainable = False

    # Classification head on top of the VGG16 feature extractor
    x = pretrain_model.output
    x = AveragePooling2D(pool_size=(3, 3))(x)
    x = Flatten()(x)
    x = Dense(64, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(n_out, activation='softmax')(x)

    model = Model(pretrain_model.input, x, name='VGG16_Model')
    return model
Total params: 14,747,715
Trainable params: 7,668,291
Non-trainable params: 7,079,424

4. Training our model

To train our model, the Adam optimizer is used with categorical_crossentropy as the loss function. I also reduce the learning rate on plateaus and use early stopping to prevent overfitting. For full training details, please refer to the notebook provided.
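
A minimal sketch of this setup, with illustrative patience, factor, and epoch values rather than the exact ones from the notebook (train_generator and val_generator are assumed to be prepared as above):

from keras.callbacks import EarlyStopping, ReduceLROnPlateau

model = create_model(input_shape=(100, 100, 3), n_out=3)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    # Halve the learning rate when validation loss plateaus
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, verbose=1),
    # Stop early and keep the best weights to prevent overfitting
    EarlyStopping(monitor="val_loss", patience=8, restore_best_weights=True),
]

history = model.fit_generator(train_generator,
                              validation_data=val_generator,
                              epochs=50,
                              callbacks=callbacks)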

I would recommend the FastAI implementations of the learning-rate finder and the one-cycle policy to train the model, but that is a post for another time. The following link provides a very good explanation of the one-cycle policy.

5. Evaluating our model

To measure the performance of our model, we can look at the F1 score for each class. We can see that our model is able to differentiate between viral and bacterial infections, and accuracy is at 97%, even though the images are only 100x100 resolution.

              precision    recall  f1-score   support

      Normal       0.97      0.95      0.96        60
    COVID-19       0.98      0.98      0.98        60
    Bacteria       0.95      0.97      0.96        60

    accuracy                           0.97       180
   macro avg       0.97      0.97      0.97       180
weighted avg       0.97      0.97      0.97       180
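
A report like the one above can be produced with scikit-learn; here y_true (the integer test labels) and test_images are assumed to be prepared beforehand:

import numpy as np
from sklearn.metrics import classification_report

# Predict class probabilities, then take the most likely class per image
y_prob = model.predict(test_images)
y_pred = np.argmax(y_prob, axis=1)

print(classification_report(y_true, y_pred,
                            target_names=["Normal", "COVID-19", "Bacteria"]))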

Gradient-weighted Class Activation Mapping (Grad-CAM)

But understanding model performance is not good enough; we need to know what the model has learned and what it actually looks at to make a classification. This is where Grad-CAM comes into the picture. Grad-CAM is a technique for producing "visual explanations" of decisions from a CNN-based model, making the model more transparent. It uses the gradient information flowing into the last convolutional layer to localize the regions important for the decision of interest. I used the keras-vis package to implement Grad-CAM for this project.

Do take note that when implementing Grad-CAM, you need to change your final layer's activation to linear; otherwise you will not get meaningful results from the gradients.

from vis.utils import utils
from keras import activations

# Swap the final softmax for a linear activation, then rebuild the model
layer_idx = utils.find_layer_idx(model, model.layers[-1].name)
model.layers[layer_idx].activation = activations.linear
model = utils.apply_modifications(model)
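
With the activation swapped, a minimal Grad-CAM call with keras-vis might look like this, where img (a single preprocessed X-ray) and class_idx (the class to explain) are assumed to be defined:

import matplotlib.pyplot as plt
from vis.visualization import visualize_cam

# Compute the Grad-CAM heatmap for the chosen class
heatmap = visualize_cam(model, layer_idx,
                        filter_indices=class_idx,
                        seed_input=img)

plt.imshow(img.squeeze(), cmap="gray")
plt.imshow(heatmap, cmap="jet", alpha=0.4)  # overlay the activation map
plt.axis("off")
plt.show()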
Lung anatomy
Sample 1, Grad-CAM COVID-19

Disclaimer: The following explanation is based purely on what Grad-CAM shows and some Googling on pneumonia infection. I am not a doctor, nor can I verify whether the model is correct.

From sample 1 above, we can see our model picking up the trachea (windpipe) area to classify COVID-19. We know that COVID-19 is a respiratory virus; maybe this is why the model focuses on the upper part of the lungs and the windpipe for classification.

Sample 2, Grad-CAM random samples

From sample 2, we can see that the area of focus for bacterial infection is different: the model focuses on the central part of the lungs. Although bacterial infection can be airborne, there are other routes of infection; the source can be elsewhere in the body, such as the kidneys, since bacteria can enter the bloodstream from any source and be deposited in the lungs.

Additional Notes

I’m no doctor, but I know what I am doing. Trust me =D

After all the experiments and reading up, my opinion is that X-ray images alone are likely not enough to identify infected patients and stop COVID-19 transmission. The reason is that the X-ray images I collected were most likely taken when the patient was already admitted to the hospital with breathing difficulty in the first place. The model would not be able to identify an asymptomatic person, so we would still need to rely on test kits to identify those cases. For now, our best bet is to stay at home to prevent transmission by any means. So please do wear a mask if you really need to go out, and wash your hands as soon as you return home. Stay safe.
