Detecting Pneumonia in X-ray Images

David Bartholomew
Published in Analytics Vidhya
Nov 23, 2020
Photo by National Cancer Institute on Unsplash

My primary goal with this article is to highlight a practical application of a convolutional neural network (CNN) model. CNN models have a wide variety of applications in general; in this case, it's a model that can accurately detect pneumonia in x-ray images. Sounds cool, right? But why would we need a convolutional neural network when we have medical experts who can perform the same task?

Why would we need a CNN to detect pneumonia?

Across the world, there is a general shortage of radiologists, and their numbers continue to diminish, which means significant resources must be spent to interpret medical imaging. In many cases, the lack of a radiologist delays test results. It can also mean relying on medical professionals who don't have expertise in radiology, leading to misinterpreted results. Getting accurate results within a short period of time can be a difference-maker, and possibly a life-saver, for certain patients.

The images used in this particular project were of pediatric patients under 5 years old. According to the World Health Organization, pneumonia accounts for 15% of all deaths of children under 5 years old worldwide. Pneumonia caused by bacteria can be treated with antibiotics, but only one third of children with pneumonia receive the antibiotics they need (https://www.who.int/news-room/fact-sheets/detail/pneumonia). Streamlining the process of accurately detecting pneumonia in children is a necessity, and it truly could save lives.

So now that we understand the need for a CNN, let’s look at some details on the project and final model results. Before moving on, if you’re not interested in some of the code used to build the project and just want to see final results (which is understandable if you don’t know Python), then please feel free to scroll to the end of the article.

The Data:
The data for this project was downloaded directly from https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia. The data contains 5,856 x-ray images with a mix of RGB and grayscale images.
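The loading step itself isn't shown in this article, but as a rough sketch it can look something like the following. The folder names and paths here are assumptions based on how the Kaggle dataset is organized (train/test/val folders with NORMAL and PNEUMONIA subfolders), not code from the project notebook:

# Sketch: collect images and binary labels from the dataset's assumed folder layout
import os
import numpy as np
from PIL import Image

def load_folder(base_dir):
    images, labels = [], []
    for label, cls in enumerate(['NORMAL', 'PNEUMONIA']):  # 0 = normal, 1 = pneumonia
        cls_dir = os.path.join(base_dir, cls)
        for fname in os.listdir(cls_dir):
            if fname.lower().endswith(('.jpeg', '.jpg', '.png')):
                img = Image.open(os.path.join(cls_dir, fname)).convert('RGB')
                images.append(np.array(img))
                labels.append(label)
    return images, np.array(labels)

train_images, train_labels = load_folder('chest_xray/train')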

Resizing Images:
Images were resized to 75x75 for efficiency when running on a local machine. Additionally, transfer learning was used with a pre-trained network, InceptionResNetV2, which requires a minimum image size of 75x75.
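As a minimal sketch of that step (using PIL here, which is an assumption; any image library would do), each image can be forced to three channels and resized before being stacked into a single array. The variable train_images refers to the assumed name from the loading sketch above:

# Sketch: convert every image to 3-channel RGB, resize to 75x75, and stack
import numpy as np
from PIL import Image

def resize_images(images, size=(75, 75)):
    resized = [np.array(Image.fromarray(img).convert('RGB').resize(size))
               for img in images]
    return np.stack(resized)  # shape: (n_images, 75, 75, 3)

X_train = resize_images(train_images)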

Train, Validation and Test Sets:
The original train, test, and validation sets contained 5,216, 624, and 16 images, respectively. All images and labels were combined and then re-split in order to increase the size of the test set, since 16 images alone would not give a clear enough picture of model performance. The final training set was slightly reduced to 5,153 images, the validation set of 632 images was used during modeling to gauge accuracy and tune the model, and the test set of 71 images was held out to gauge how the model would handle unseen data.
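The re-split itself isn't shown above, but a hedged sketch with scikit-learn's train_test_split would look something like this. The variable names are placeholders, and stratifying on the labels is my assumption rather than something stated in the article:

# Sketch: pool the original splits, then carve out 632 validation and 71 test images
from sklearn.model_selection import train_test_split
import numpy as np

all_images = np.concatenate([train_images, val_images, test_images])
all_labels = np.concatenate([train_labels, val_labels, test_labels])

X_train, X_rest, y_train, y_rest = train_test_split(
    all_images, all_labels, test_size=632 + 71, stratify=all_labels, random_state=123)
X_val, X_holdout, y_val, y_holdout = train_test_split(
    X_rest, y_rest, test_size=71, stratify=y_rest, random_state=123)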

Data Augmentation:
Data augmentation was implemented to increase the size of the training set and give the model additional image diversity to improve accuracy (you may have noticed the re-split training set above was larger than the original). The initial training set was doubled, with pixel values under 25 replaced with 0 in the added copies. Essentially, this converts darker gray areas to black and allows the model to focus on the more important, lighter areas. Below is the code used to make this alteration; comparing an original image to an altered one, the differences are very subtle, but effective:

#Change pixel values for data augmentation
i = (X_train >= 0) & (X_train < 25)
altered = np.where(i, 0, X_train)
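
To actually double the training set, the altered copies just get stacked onto the originals. Here is a short sketch following the snippet above (the exact variable names in the project notebook may differ):

# Sketch: append the altered copies to the original training images and labels
X_train = np.concatenate([X_train, altered])
y_train = np.concatenate([y_train, y_train])  # labels are unchanged by the pixel edit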

Additionally, the notebook includes a plot comparing the sizes of the original and augmented training sets.

Building the Initial Model:

The first function below visualizes a confusion matrix in order to understand the breakdown of true positives, true negatives, false positives, and false negatives. The second function handles the modeling process: as its docstring states, it builds the neural network, returns classification reports and confusion matrices, and saves the best model using a ModelCheckpoint callback based on validation accuracy.

#Build Plot Confusion Matrix Function
def plot_confusion_matrix(cm, classes=[0, 1], normalize=False, title=None,
                          cmap=plt.cm.Blues, ax=None):
    """
    Print and plot a confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes)
    plt.yticks(tick_marks, classes)

    # Flatten the matrix so each cell can be annotated individually
    thresh = cm.max() / 2.
    j_list = [j for row in cm for j in row]
    zero = j_list[:2]  # first row: true negatives, false positives
    one = j_list[2:]   # second row: false negatives, true positives

    for i, j in enumerate(zero):
        plt.text(x=i, y=0, s=j, horizontalalignment="center", fontsize=16,
                 color="white" if j > thresh else "black")
    plt.text(x=0, y=0.2, s='True Negatives', horizontalalignment="center", fontsize=16,
             color="white" if zero[0] > thresh else "black")
    plt.text(x=1, y=0.2, s='False Positives', horizontalalignment="center", fontsize=16,
             color="white" if zero[1] > thresh else "black")

    for i, j in enumerate(one):
        plt.text(x=i, y=1, s=j, horizontalalignment="center", verticalalignment="center",
                 fontsize=16, color="white" if j > thresh else "black")
    plt.text(x=0, y=1.2, s='False Negatives', horizontalalignment="center", fontsize=16,
             color="white" if one[0] > thresh else "black")
    plt.text(x=1, y=1.2, s='True Positives', horizontalalignment="center", fontsize=16,
             color="white" if one[1] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')

Function to build model:

layers_list = []

# Create Model Checkpoint to save the best model based on validation accuracy
mc = ModelCheckpoint('best_model_test.h5', monitor='val_accuracy', mode='max',
                     verbose=1, save_best_only=True)

def build_model(optimizer, epochs, batch_size, callbacks=[mc], weights={0: 1, 1: 1}):
    """
    Build and fit a neural network model, print classification reports and
    confusion matrices, and save the best model using a model checkpoint
    based on val_accuracy.

    Input Parameters: optimizer, epochs, batch_size, callbacks, weights
    """
    # Initialize a sequential model
    model = Sequential()
    # Add layers
    for i in layers_list:
        model.add(i)

    # Compile the model
    model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])

    results = model.fit(X_train, y_train, callbacks=callbacks, class_weight=weights,
                        epochs=epochs, batch_size=batch_size,
                        validation_data=(X_test, y_test))
    build_model.results = results

    # Output (probability) predictions for the train and test set
    y_hat_train = model.predict(X_train)
    y_hat_test = model.predict(X_test)
    build_model.y_hat_train = y_hat_train
    build_model.y_hat_test = y_hat_test

    # Visualize results
    history = results.history
    plt.figure()
    plt.plot(history['val_loss'])
    plt.plot(history['loss'])
    plt.legend(['val_loss', 'loss'])
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.show()

    plt.figure()
    plt.plot(history['val_accuracy'])
    plt.plot(history['accuracy'])
    plt.legend(['val_accuracy', 'accuracy'])
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.show()

    print('-----------------------------------\n')

    # Print the loss and accuracy for the training set
    results_train = model.evaluate(X_train, y_train)
    print('Train Results', results_train)
    print('-----------------------------------\n')

    # Print the loss and accuracy for the test set
    results_test = model.evaluate(X_test, y_test)
    print('Test Results', results_test)
    print('-----------------------------------\n')

    # Print classification reports
    print('Train Classification Report')
    print(classification_report(y_train, np.round(y_hat_train, 0),
                                target_names=['Normal (Class 0)', 'Pneumonia (Class 1)']))
    print('-----------------------------------\n')

    print('Test Classification Report')
    print(classification_report(y_test, np.round(y_hat_test, 0),
                                target_names=['Normal (Class 0)', 'Pneumonia (Class 1)']))
    print('-----------------------------------\n')

    # Load the saved (best) model and evaluate it
    saved_model = load_model('best_model_test.h5')
    _, train_acc = saved_model.evaluate(X_train, y_train, verbose=0)
    _, test_acc = saved_model.evaluate(X_test, y_test, verbose=0)
    build_model.saved_model = saved_model
    print('Best Model Results\n')
    print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
    print('-----------------------------------\n')

    # Create confusion matrices
    train_cm = confusion_matrix(y_true=y_train, y_pred=np.round(y_hat_train, 0))
    test_cm = confusion_matrix(y_true=y_test, y_pred=np.round(y_hat_test, 0))
    build_model.train_cm = train_cm
    build_model.test_cm = test_cm

    # Plot train and test confusion matrices side by side
    plt.figure(figsize=(12, 6))
    plt.subplot(121)
    plot_confusion_matrix(cm=train_cm, cmap=plt.cm.Blues)

    plt.subplot(122)
    plot_confusion_matrix(cm=test_cm, cmap=plt.cm.Blues)
    plt.subplots_adjust(wspace=0.4)

Now that the functions have been set up, here is the structure of the first model:

#Add layers
layers_list = []
layer1 = layers.Conv2D(75, (2, 2), padding='same', activation='relu', input_shape=(75, 75, 3))
layer2 = layers.MaxPooling2D((2, 2), padding='same')
layer3 = layers.Conv2D(75, (2, 2), padding='same', activation='relu')
layer4 = layers.MaxPooling2D((2, 2), padding='same')
layer5 = layers.Conv2D(75, (2, 2), padding='same', activation='relu')
layer6 = layers.MaxPooling2D((2, 2), padding='same')
layer7 = layers.Flatten()
layer8 = layers.Dense(75, activation='relu')
layer9 = layers.Dense(1, activation='sigmoid')
layers_list = [layer1, layer2, layer3, layer4, layer5, layer6, layer7, layer8, layer9]

#Utilize Stochastic Gradient Descent Optimizer
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=.9)

#Build model with pre-built function
build_model(optimizer=opt, epochs=50, batch_size=100, callbacks=[mc])

This model performed well, with 96.7% accuracy on the validation set. I tried a second model without transfer learning in an attempt to improve these results, but to no avail. Truthfully, I tried what felt like millions of different parameters and hyperparameters and did achieve better results in some cases, but the best overall results came from the pre-trained InceptionResNetV2 model. I won't print the entire model structure here as it's quite large, but if you're interested, you can view the structure in Python with this code (and in my GitHub repo, which is linked at the end of this article):

#Import InceptionResNetV2
from keras.applications import InceptionResNetV2

#Build the model base with required input shape 75x75x3
cnn_base = InceptionResNetV2(weights='imagenet',
                             include_top=False,
                             input_shape=(75, 75, 3))

#View base structure
cnn_base.summary()

Typically, most would freeze the pre-trained network, or at least part of it, in order to use the pre-built model weights and reduce training time. I decided to be different and retrain the entire network to improve accuracy. This was feasible on my Mac given the size of the dataset and the smaller 75x75x3 image size. I included the base model as my first layer, as shown below:

#Set random seed
np.random.seed(123)

#Add layers including InceptionResNetV2 base
layers_list = []
layer1 = cnn_base
layer2 = layers.Flatten()
layer3 = layers.Dense(75, activation='relu')
layer4 = layers.Dense(1, activation='sigmoid')
layers_list = [layer1, layer2, layer3, layer4]

#Utilize Stochastic Gradient Descent Optimizer
opt = keras.optimizers.SGD(learning_rate=0.01, momentum=.9)

#Build model with pre-built function
build_model(optimizer=opt, epochs=50, batch_size=100, callbacks=[mc])
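
For comparison, the more typical transfer-learning setup that freezes the pre-trained weights is only a small change. This is a sketch of that alternative, not the configuration used for the final results reported below:

# Sketch: freeze the ImageNet weights so only the new Dense layers are trained
cnn_base.trainable = False
layers_list = [cnn_base, layers.Flatten(),
               layers.Dense(75, activation='relu'),
               layers.Dense(1, activation='sigmoid')]
build_model(optimizer=opt, epochs=50, batch_size=100, callbacks=[mc])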

Final Model Results:

The function for making predictions on the unseen test set is below. The test set was held out until this final prediction in order to avoid any bias introduced during training.

#Build a function to make predictions on unseen data
def predict_new_images(test_img, test_lbls):
    '''Predict saved model results on the unseen test set, print a classification report, and plot a confusion matrix.'''

    # Keep only the positive (pneumonia) column of the one-hot labels
    test_lbls = test_lbls.T[[1]]
    test_lbls = test_lbls.T

    # Standardize pixel values to the 0-1 range
    test_final = test_img / 255

    predictions = build_model.saved_model.predict(test_final)
    predict_new_images.predictions = predictions
    test_cm = confusion_matrix(y_true=test_lbls, y_pred=np.round(predictions, 0))

    print('Classification Report')
    print(classification_report(test_lbls, np.round(predictions, 0),
                                target_names=['Normal (Class 0)', 'Pneumonia (Class 1)']))
    print('-----------------------------------\n')

    plt.figure(figsize=(10, 6))
    plot_confusion_matrix(cm=test_cm, cmap=plt.cm.Blues)
    plt.savefig('images/final_model_result.png')
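
Calling the function is a one-liner; the variable names below for the held-out images and their one-hot labels are placeholders for whatever the notebook actually uses:

# Hypothetical names for the held-out images and one-hot encoded labels
predict_new_images(holdout_images, holdout_labels)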

The function plots a confusion matrix showing the final results.

In some ways I’m a perfectionist, so having a result of 100% accuracy on the unseen test set made me happy. Personally, I find it amazing that out of 71 x-ray images in the unseen test set, this convolutional neural network model is able to detect whether or not a pediatric patient has pneumonia with 100% accuracy. Like I said, it is pretty cool.

For more details and the full notebook for this project, please visit my GitHub repo here: https://github.com/dbarth411/dsc-mod-4-project-v2-1-online-ds-sp-000.

David Bartholomew is a Revenue Management professional in the hospitality industry with an interest in Data Science & Machine Learning.