Preamble

Pneumonia and diarrhea are perhaps the two biggest contributors to child mortality worldwide.

Measuring the burden of pneumonia is difficult because its presentation varies widely, particularly in children, and because multiple etiological agents are associated with the disease. The clinical signs of malaria and measles are also known to overlap with those of pneumonia, and malnourished children often lack clinical signs, which leads to misclassification.

According to a Lancet report, in 2015 India, Nigeria, Indonesia, Pakistan, and China accounted for more than 54% of all global pneumonia cases, with 32% of the global burden coming from India alone. That gives India the highest burden of pneumonia in the world.

In this article, I will train a CNN on a number of chest X-rays to predict whether pneumonia is present or not. This problem is different from the one we saw in part 1: here we are dealing with grayscale chest X-ray images, whereas in part 1 we trained a CNN on colored blood cell images. The problem is also a bit more challenging because pneumonia usually shows up as opacity in the lobes of the lungs.

Data Source

Data can be found here

The dataset is organized into 3 folders (train, test, val) and contains sub-folders for each image category (Pneumonia/Normal). There are 5,863 X-Ray images (JPEG) and 2 categories (Pneumonia/Normal).

Chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients aged one to five years at Guangzhou Women and Children’s Medical Center, Guangzhou. All chest X-ray imaging was performed as part of patients’ routine clinical care.

For the analysis of chest X-ray images, all chest radiographs were initially screened for quality control by removing all low-quality or unreadable scans. The diagnoses for the images were then graded by two expert physicians before being cleared for training the AI system. In order to account for any grading errors, the evaluation set was also checked by a third expert.

Illustrative Examples of Chest X-Rays in Patients with Pneumonia; The normal chest X-ray (left panel) depicts clear lungs without any areas of abnormal opacification in the image. Bacterial pneumonia (middle) typically exhibits a focal lobar consolidation, in this case in the right upper lobe (white arrows), whereas viral pneumonia (right) manifests with a more diffuse ‘‘interstitial’’ pattern in both lungs

Pneumonia can be caused by many pathogens, but bacterial and viral pneumonia are the most prevalent. The main difference between the two is visible in chest X-rays:

Viral pneumonia shows a more diffuse opacity in both lungs, while in bacterial pneumonia the opacity is more prominent in one part of the lungs.

CNN workings (fast.ai)

Let’s first load the libraries:

import os
print(os.listdir("../input"))
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai import *
from fastai.vision import *
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Path of our dataset:

path = Path("../input/chest_xray/chest_xray/")

Images in Train and Test folders

  1. Train folder: 3,875 images in the pneumonia folder and 1,341 images in the normal folder
  2. Test folder: 390 images in the pneumonia folder and 234 images in the normal folder

fnames = get_image_files(path/'train'/'PNEUMONIA')
fnames_train_pneumonia = np.array(fnames)
fnames_train_pneumonia.shape
# Output: (3875,)

fnames = get_image_files(path/'train'/'NORMAL')
np.array(fnames).shape
# Output: (1341,)

fnames = get_image_files(path/'test'/'NORMAL')
print(np.array(fnames).shape)
# Output: (234,)

fnames = get_image_files(path/'test'/'PNEUMONIA')
print(np.array(fnames).shape)
# Output: (390,)
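The counts above show that the classes are not balanced, which is worth keeping in mind when reading the accuracy figures later on. As a quick sanity check, the sketch below uses only the counts quoted above to work out each class's share and the accuracy a trivial "always predict the most common class" model would get. The helper name `class_shares` is my own and is not part of fastai.

```python
# Image counts quoted above for the train and test folders.
counts = {
    "train": {"PNEUMONIA": 3875, "NORMAL": 1341},
    "test": {"PNEUMONIA": 390, "NORMAL": 234},
}

def class_shares(folder_counts):
    """Return each class's fraction of the images in one folder."""
    total = sum(folder_counts.values())
    return {label: n / total for label, n in folder_counts.items()}

for folder, folder_counts in counts.items():
    shares = class_shares(folder_counts)
    # Accuracy of a model that always predicts the most common class.
    baseline = max(shares.values())
    print(folder, {k: round(v, 3) for k, v in shares.items()}, "baseline:", round(baseline, 3))
```

On the train folder, roughly three out of four images are pneumonia cases, so an accuracy figure only means something once it is clearly above that baseline.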

Now, let’s create the Image Data Bunch:

np.random.seed(42)
tfms = get_transforms(do_flip=False)

data = ImageDataBunch.from_folder(path, train="train", valid_pct=0.20, ds_tfms = tfms, classes = ['PNEUMONIA', 'NORMAL'], bs=64, size=224).normalize(imagenet_stats)
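The `valid_pct=0.20` argument holds out a random 20% of the images for validation, and the `np.random.seed(42)` call is what makes that split repeatable from run to run. Below is a minimal sketch of the idea in plain Python. The function `random_split` and the file names are hypothetical, and it uses the standard-library `random` module as a stand-in rather than reproducing what fastai does internally.

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of items with a fixed seed and hold out valid_pct of them."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    # First n_valid shuffled items become validation, the rest training.
    return shuffled[n_valid:], shuffled[:n_valid]

# Hypothetical file names standing in for the X-ray images.
files = [f"img_{i:04d}.jpeg" for i in range(100)]
train_files, valid_files = random_split(files)
print(len(train_files), len(valid_files))
```

Because the seed is fixed, calling `random_split(files)` again returns exactly the same two lists, which is what lets you compare runs on the same validation images.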

After creating the Image Data Bunch, it’s nice to look at a few images from the dataset:

data.show_batch(3, figsize=(12,12))
Just by looking at the images, it’s difficult to tell which ones are pneumonia cases and which are normal.

Time to train the model. We will again use ResNet50, pretrained on the ImageNet dataset, and train it on our training images with the “fit_one_cycle” method:

learn = cnn_learner(data, models.resnet50, metrics=accuracy, model_dir = "/temp/model/")
learn.fit_one_cycle(4)
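For anyone wondering what “fit_one_cycle” does to the learning rate: it follows the one-cycle policy, where the rate warms up to a peak and then anneals down to a much smaller value. Here is a simplified sketch of such a schedule. The function `one_cycle_lr` is my own illustration, not the library's code, and the values for `max_lr`, `div_factor`, `pct_start` and `final_div` are what I believe fastai v1 defaults to, so treat them as assumptions.

```python
import math

def cosine_anneal(start, end, pct):
    """Move from start to end along half a cosine wave as pct goes 0 -> 1."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle_lr(step, total_steps, max_lr=3e-3, div_factor=25.0,
                 pct_start=0.3, final_div=25.0 * 1e4):
    """Learning rate at a given step of a simplified one-cycle schedule.

    The default values are assumptions, not confirmed library settings.
    """
    warmup_steps = int(total_steps * pct_start)
    if step < warmup_steps:
        # Phase 1: warm up from max_lr / div_factor to max_lr.
        return cosine_anneal(max_lr / div_factor, max_lr, step / warmup_steps)
    # Phase 2: anneal from max_lr down to max_lr / final_div.
    pct = (step - warmup_steps) / max(1, total_steps - warmup_steps - 1)
    return cosine_anneal(max_lr, max_lr / final_div, pct)

total = 100
schedule = [one_cycle_lr(s, total) for s in range(total)]
print(f"start={schedule[0]:.2e} peak={max(schedule):.2e} end={schedule[-1]:.2e}")
```

The sketch ignores momentum, which the one-cycle policy typically moves in the opposite direction to the learning rate.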

This is the result of our model:

around 94% accuracy

After training the model, it’s worth looking at the images where the model was most confused, i.e. those with the highest losses:

interp = ClassificationInterpretation.from_learner(learn)
losses, idxs = interp.top_losses()
len(data.valid_ds) == len(losses) == len(idxs)
interp.plot_top_losses(9, figsize=(12,12))
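To make the idea of “top losses” concrete, here is a small sketch of what ranking by loss means. For a classifier trained with cross-entropy, the loss on one image is the negative log of the probability the model gave to the correct label, so confidently wrong predictions end up at the top. The function below is my own plain-Python illustration with made-up probabilities, not fastai's implementation.

```python
import math

def top_losses(probs_of_true_class, k=None):
    """Return (losses, indices) sorted from largest to smallest loss."""
    # Cross-entropy loss for one sample is -log(probability given to the true class).
    losses = [-math.log(p) for p in probs_of_true_class]
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    if k is not None:
        order = order[:k]
    return [losses[i] for i in order], order

# Hypothetical probabilities the model assigned to the correct label of five images.
probs = [0.98, 0.10, 0.75, 0.02, 0.60]
losses, idxs = top_losses(probs, k=3)
print(idxs)
```

The image that got only 2% probability for its true label comes first, which is exactly the kind of case worth inspecting by eye.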

This is what the confusion matrix looks like:

67 images wrongly classified
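A confusion matrix is just a tally of (actual, predicted) label pairs, and the “wrongly classified” figure is the sum of the off-diagonal cells. In fastai v1 the plot is presumably produced with `interp.plot_confusion_matrix()`; the sketch below shows the underlying counting in plain Python. The six labels are hypothetical and only there to make the example run.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs; pairs with differing labels are errors."""
    return Counter(zip(actual, predicted))

# Hypothetical labels for six validation images.
actual = ["PNEUMONIA", "PNEUMONIA", "NORMAL", "NORMAL", "PNEUMONIA", "NORMAL"]
predicted = ["PNEUMONIA", "NORMAL", "NORMAL", "PNEUMONIA", "PNEUMONIA", "NORMAL"]

matrix = confusion_counts(actual, predicted)
# Off-diagonal cells: the model's label differs from the true label.
wrong = sum(n for (a, p), n in matrix.items() if a != p)
accuracy = 1 - wrong / len(actual)
print(dict(matrix), wrong, round(accuracy, 3))
```

In a medical setting the two kinds of error are not equally costly, so it helps to read the two off-diagonal cells separately rather than only their sum.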

Let’s now find the optimal learning rate:

learn.lr_find()
learn.recorder.plot()  # plot the loss against the learning rate
We can use learning rates in the range 1e-6 to 1e-4.

After finding a good range of learning rates, we unfreeze the model and train again on the training images with the “fit_one_cycle” method for 5 epochs:

learn.unfreeze()
learn.fit_one_cycle(5, max_lr=slice(1e-6, 1e-4))
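Passing `slice(1e-6, 1e-4)` means the layers do not all train at the same rate: the earliest layers get the smallest learning rate and the final layers the largest. As far as I understand fastai v1, the rates in between are spaced by a constant multiple across the model's layer groups. The sketch below mirrors that idea; it is my own re-implementation, and the assumption that the ResNet50 learner has 3 layer groups is mine.

```python
def even_mults(start, stop, n):
    """Return n values from start to stop, each a constant multiple of the previous one."""
    if n == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n - 1))
    return [start * ratio ** i for i in range(n)]

# Assumption: the ResNet50 learner is split into 3 layer groups.
layer_group_lrs = even_mults(1e-6, 1e-4, 3)
print(layer_group_lrs)
```

With three groups this works out to roughly 1e-6, 1e-5 and 1e-4, so the pretrained early layers are only nudged while the new head keeps learning faster.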

Here is the result:

Prediction accuracy is 95.98%

And the new and improved confusion matrix looks like this:

47 images wrongly classified, down from 67, so prediction accuracy improved

Below are a few images from the validation set that the model predicted correctly:
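The error counts and the accuracy figures can be cross-checked with a bit of arithmetic. The sketch below uses only numbers reported in this article (67 and 47 errors, 95.98% accuracy) to estimate the size of the validation set and the accuracy of the first run. The implied size is an estimate, since the reported accuracy is rounded.

```python
# Figures reported in the article.
errors_before = 67      # misclassified after the first 4 epochs
errors_after = 47       # misclassified after unfreezing and 5 more epochs
accuracy_after = 0.9598

# Validation-set size implied by 47 errors at 95.98% accuracy (an estimate,
# because the reported accuracy is rounded).
n_valid = round(errors_after / (1 - accuracy_after))

# Accuracy of the first run on a validation set of that size.
accuracy_before = 1 - errors_before / n_valid
# Fraction of the original errors removed by fine-tuning.
relative_error_drop = (errors_before - errors_after) / errors_before

print(n_valid, round(accuracy_before, 4), round(relative_error_drop, 3))
```

The first-run accuracy that falls out of this is a little over 94%, which matches the “around 94%” seen earlier, and fine-tuning removed roughly 30% of the errors.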

learn.show_results()
Model’s prediction results

With only about 5,000 images to train on, the model reaches a prediction accuracy of around 96%. This is simply amazing.

Deep learning has huge prospects in medical science, and I’m more than sure that AI is going to play an extremely important and pivotal role in its future.

In the next and last part of this 3-part series, I will be working with a cancer image dataset. So, do stay tuned.

The Kaggle kernel for this work is saved at:
