How I Created a Model That Identifies Cardiac Conditions
Using the NIH Chest X-Ray Dataset with TensorFlow for My First AI Project
For context and some background information about Artificial Intelligence in diagnostics, see the first article in this series, Using AI in Diagnostic Imaging.
In October, I decided to learn more about AI and its applications in medicine. As someone who works in a veterinary clinic myself and is familiar with the diagnosis process for patients, I am painfully aware of some of the issues limiting early diagnosis and, therefore, treatment of our patients. The biggest one I’ve observed? Money. Clients are often simply not willing or able to spend large sums on preventative testing, especially when their pets are still young. And while some of the costs associated with diagnostic screening are currently unavoidable, such as those related to creating the images used (X-ray and ultrasound images, cell slides, etc.), further costs, including those required to have a specialist analyze the image, could possibly be reduced if image recognition were more widely incorporated into the veterinary field. With this hope, I decided to learn how to apply AI to diagnostic imaging myself.
Due to similarities between some cardiac conditions in humans and animals (such as cardiomegaly, which is seen very often in dogs with heart failure), I decided to use the NIH Chest X-Ray Dataset for my first AI project. I opted to use the Random Sample version of the dataset, posted on Kaggle by the National Institutes of Health, which consists of 5,606 images, 5% of the original number. As I had not previously worked with AI, I used a part of Omar Salah Hemied’s code with the same dataset to read, prepare, and split the data. I then used InceptionV3, a pre-trained image recognition model, as the base for the model. I wrote the code in Python and ran it in a Google Colab notebook.
Steps for Creating the Model
Imports and Setup
First, I mounted my Google Drive, where I had uploaded the dataset and CSV file, the document that contained all labels associated with each data point. In this case, the data points were the images from the NIH Chest X-Ray Dataset.
from google.colab import drive
drive.mount('/content/drive')
Next, I imported TensorFlow into my notebook in Google Colab. TensorFlow is an open-source Python library, essentially a platform containing different sub-libraries (such as Keras) and functions that can be used in Machine Learning to create and use models. I used version 2.7.0.
import tensorflow as tf
Then, I imported the modules necessary for the creation and preparation of the model, including InceptionV3, which can be imported from TensorFlow. I also imported components from some libraries and other libraries in their entirety: OS, OpenCV (cv2, used for reading images), NumPy, Pandas, tqdm, Matplotlib, and scikit-learn.
from tensorflow.keras.models import Model
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
import os
import cv2
import numpy as np
import pandas as pd
from tqdm import tqdm
import matplotlib.pyplot as plt
from random import shuffle, seed
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MultiLabelBinarizer
from tensorflow.keras.applications import EfficientNetB3
from tensorflow.keras.applications.densenet import DenseNet169
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Input, concatenate, Dense, Flatten, Activation, Conv2D, Dropout, MaxPooling2D, GlobalAveragePooling2D, BatchNormalization
There are some modules that ultimately were not used, but that could have been helpful to add layers to the model.
Reading and Preparing the CSV Data
After completing the necessary imports, the information from the CSV file had to be prepared. The first step for this process was to read the CSV file, which I accessed by copying its path in my Drive and pasting it into the “read_csv” function from Pandas.
data = pd.read_csv('/content/drive/MyDrive/.../NIHChestXRaysSample/sample_labels.csv')
data.head()
Upon reading the file, I could print and see the first 5 rows (the “head” of the data):
For this project, I was only interested in the “Finding Labels” values associated with each image. To see some of the categorizations of the data and the number of images associated with them, I ran the following code:
Labels_before = data.loc[:, 'Finding Labels']
Labels_before.value_counts()
Which gave:
Because some images had multiple labels, I then split any labels that were separated by the character “|”, and printed them:
Labels_after = []
for i in range(len(Labels_before)):
    split_labels = Labels_before[i].split("|")
    if len(split_labels) == 1:
        Labels_after.append(split_labels)
    else:
        lab = []
        for j in range(len(split_labels)):
            lab.append(split_labels[j])
        Labels_after.append(lab)
Labels_after
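As a concrete illustration of the splitting step, here is a minimal sketch using made-up label strings rather than entries tied to specific images in the dataset:

```python
# A multi-label entry uses "|" as a separator; str.split turns it into a list.
multi = "Cardiomegaly|Effusion".split("|")
single = "No Finding".split("|")

print(multi)   # ['Cardiomegaly', 'Effusion']
print(single)  # ['No Finding']
```

Note that splitting a string that contains no separator still returns a one-element list, which is why both branches of the loop end up appending a list of labels.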
Which gave:
Reading and Connecting the Image Data
Next, I had to connect the actual images to the data and labels from the CSV file. I did this by connecting the path to the Drive folder with the images to a variable named “image_file_path”, then associating each image in this folder with its corresponding CSV data based on image index through a “for” loop. In this loop, I also resized the image to (224, 224) after setting 224 to a variable “image_size”. Lastly, I appended each image to a list “scans”.
image_size = 224
image_file_path = "/content/drive/MyDrive/.../NIHChestXRaysSample/images/"
Labels = []
scans = []
for i in tqdm(range(len(data["Image Index"]))):
    image = cv2.imread(image_file_path + data["Image Index"][i])
    if len(image.shape) > 2:
        resize_image = cv2.resize(image, (image_size, image_size))
        scans.append(resize_image[:, :, :3])  # keep the three colour channels
    else:
        image = np.dstack([image] * 3)  # stack a grayscale image into three channels
        resize_image = cv2.resize(image, (image_size, image_size))
        scans.append(resize_image)
I used the following code to check the shape of the images in “scans”:
print(set([x.shape for x in scans]))
Which printed:
I also verified that there was an equal number of labels and scans after transforming each:
len(Labels_after), len(scans)
Which gave:
To ensure the labels were being associated with the images and to see some of the images, I then ran the following code to define a function that generates random numbers within the number of images, then shows the corresponding images, their labels, and their shapes:
def image_show(data, labels, number_of_image):
    numbers = np.random.randint(0, len(data), number_of_image)
    plt.figure(figsize=(40, 20))
    j = number_of_image // 10  # rows of 10 images; must be an integer for plt.subplot
    for _, i in enumerate(numbers):
        plt.subplot(j, 10, _ + 1)
        plt.imshow(data[i], cmap="gray")
        label = ""
        for x in labels[i]:
            label += x + " , "
        plt.title(label + "\n" + f"size {data[i].shape}")
        plt.xticks([]), plt.yticks([])
    plt.show()
I then called the function and set the “number_of_image” variable to 40:
image_show(scans,Labels_after,40)
Which gave:
Preparing for Training
After the images and data were prepared, they had to be transformed into a more usable format for Machine Learning. The first component of this was associating a number with each pathology classification of the X-rays, as ML algorithms do not use qualitative labels.
To do this, I created a dictionary named “classes” then defined two functions that, together, exchanged a qualitative label for a quantitative one:
classes = {
    0: "Hernia",
    1: "Pneumonia",
    2: "Fibrosis",
    3: "Edema",
    4: "Emphysema",
    5: "Cardiomegaly",
    6: "Pleural_Thickening",
    7: "Consolidation",
    8: "Pneumothorax",
    9: "Mass",
    10: "Nodule",
    11: "Atelectasis",
    12: "Effusion",
    13: "Infiltration",
    14: "No Finding",
}

def get_class(code):
    return classes[code]

def get_code(labels):
    for key, value in classes.items():
        if value == labels:
            return key
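As a quick sanity check of the two mapping functions, here is a self-contained sketch using only an excerpt of the full dictionary (the calls are illustrative, not part of the original notebook):

```python
# classes maps codes to names; get_code inverts that lookup by scanning the items.
classes = {0: "Hernia", 5: "Cardiomegaly", 14: "No Finding"}  # excerpt of the full dictionary

def get_class(code):
    return classes[code]

def get_code(labels):
    for key, value in classes.items():
        if value == labels:
            return key

print(get_class(5))              # Cardiomegaly
print(get_code("Cardiomegaly"))  # 5
```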
I then applied these functions to each label set for each image with a “for” loop:
for i in tqdm(range(len(Labels_after))):
    Labels_after[i] = [get_code(x) for x in Labels_after[i]]
Labels_after
Which gave:
To put the labels into a format the model can train on, I then one-hot encoded each label set with “MultiLabelBinarizer” and converted both the images and the labels into NumPy arrays:
mlp = MultiLabelBinarizer()
Labels = mlp.fit_transform(Labels_after)
scans = np.array(scans)
Labels = np.array(Labels)
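To make the effect of MultiLabelBinarizer concrete, here is a minimal sketch on a hand-made pair of label lists (the codes are arbitrary examples, not drawn from the dataset):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Two images: one with label codes 5 and 12, one with only code 14.
toy_labels = [[5, 12], [14]]

mlb = MultiLabelBinarizer()
binarized = mlb.fit_transform(toy_labels)

print(mlb.classes_)  # [ 5 12 14]
print(binarized)
# [[1 1 0]
#  [0 0 1]]
```

Each row is a multi-hot vector: a 1 in a column means that image carries the label corresponding to that column.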
Next, I checked the shape of each array:
scans.shape, Labels.shape
Which gave:
Splitting the Data
After converting the data to a more ML-friendly format, I split the data into train, test, and validation sets and printed out the shapes of the results:
!pip install scikit-multilearn
from skmultilearn.model_selection import iterative_train_test_split

X_train, y_train, X_test, y_test = iterative_train_test_split(scans, Labels, test_size=0.2)
X_val, y_val, X_test, y_test = iterative_train_test_split(X_test, y_test, test_size=0.7)

print("X_train shape", X_train.shape)
print("y_train shape", y_train.shape)
print("X_val shape", X_val.shape)
print("y_val shape", y_val.shape)
print("X_test shape", X_test.shape)
print("y_test shape", y_test.shape)
Which printed:
Image Data Augmentation
As the last step before setting up the training model, I augmented the data using the “ImageDataGenerator” preprocessing function to prevent overfitting:
transform = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=False,
    vertical_flip=True,
    fill_mode='nearest'
)
Creating the Model
After preparing and transforming the data, the first step was to set a batch size for training and apply this batch size to each set (training, validation, and testing):
batch_size = 16
train_transform = transform.flow(X_train, y_train, batch_size=batch_size)
val_transform = transform.flow(X_val, y_val, batch_size=batch_size)
test_transform = transform.flow(X_test, y_test, batch_size=batch_size)
For training, I used InceptionV3, a 48-layer deep learning model for image recognition, to create a trainable model that could classify the X-rays. I set up the model with the following code:
inception = InceptionV3(input_shape=(224, 224, 3), weights="imagenet", include_top=False)
The image base for Inception is ImageNet, which does not include medical imaging images. Because of this, I made the layers in the model trainable:
for layer in inception.layers:
    layer.trainable = True
I then applied “GlobalAveragePooling2D” to the output of the model to prepare the output for the final classification, and set the prediction of the classes (the pathological labels) to be the output after “GlobalAveragePooling2D” (represented by “z”), passed through the “Dense” Neural Network layer, with the “sigmoid” activation function:
z = GlobalAveragePooling2D()(inception.output)
prediction = Dense(len(classes), activation='sigmoid')(z)
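GlobalAveragePooling2D simply averages each feature map over its spatial dimensions, collapsing the convolutional output into one value per channel. The equivalent NumPy operation, shown here on a dummy tensor (the shapes are illustrative, not Inception’s actual output):

```python
import numpy as np

# Dummy feature maps: batch of 2, a 5x5 spatial grid, 4 channels.
features = np.random.rand(2, 5, 5, 4)

# Global average pooling = mean over the two spatial axes.
pooled = features.mean(axis=(1, 2))

print(pooled.shape)  # (2, 4)
```

This is why the Dense classification layer can follow directly: it receives one flat vector per image rather than a spatial grid.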
Next, I connected these functions into the final model, and printed its summary:
model = Model(inputs=inception.input, outputs=prediction)
model.summary()
Which gave:
Lastly, I compiled the model by setting its loss function, optimizer, and metrics. I used “binary_crossentropy” as the loss function, since each image can carry several of the possible labels at once, “adam” as the optimizer, as it combines the heuristics of the RMSProp and Momentum optimizers, and accuracy as the metric:
model.compile(
    loss='binary_crossentropy',
    optimizer='adam',
    metrics=['accuracy'],
)
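Because an X-ray can show several findings at once, each of the 15 outputs is treated as an independent yes/no question: sigmoid keeps every probability in (0, 1) without forcing them to sum to 1, and binary cross-entropy scores each label separately. A small NumPy sketch of the per-label loss, with made-up numbers rather than real model outputs:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Per-label binary cross-entropy, averaged over labels (as Keras computes it).
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# One image with two positive labels in a 4-label toy setup.
y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.1, 0.8, 0.7, 0.2])  # sigmoid outputs need not sum to 1

print(round(binary_crossentropy(y_true, y_pred), 3))  # 0.227
```

A softmax with categorical cross-entropy would instead force the labels to compete for a single probability mass, which is wrong for multi-label data.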
To prevent the code from continuing to run if the validation loss does not improve over 5 epochs, I set up the “EarlyStopping” callback:
callbacks = [tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, verbose=1)]
Training the Model
To train the model, I used “model.fit”, set the batch size to 32, and set the number of epochs to 10. To enforce early stopping, I passed the “callbacks” list created above to the “callbacks” argument:
model.fit(
    X_train, y_train,
    batch_size=32,
    epochs=10,
    validation_data=(X_val, y_val),
    verbose=1,
    callbacks=callbacks
)
Which gave:
Viewing the Results
Lastly, to view the results of the training and the accuracy on testing data, I used “model.evaluate” for each set (training, validation, and testing):
model.evaluate(train_transform), model.evaluate(val_transform), model.evaluate(test_transform)
Which gave:
Analysis of the Results
The accuracy on the testing set, which is most representative of what the accuracy would be were the model to be used on new images, was almost 53%. While this would not be acceptable for models with binary classification (2 possible results), considering that the model was classifying images for 15 different labels, and the nature of the dataset, 53% signifies that the model is nearly 8 times more effective than random guessing. To increase the accuracy, the original dataset, which is 20 times the size of the one used for this project, could be used.
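The random-guessing comparison above can be checked with quick arithmetic. This is a rough estimate that assumes all 15 labels are equally likely, which the class counts show they are not (“No Finding” alone dominates the sample):

```python
# Uniform random guessing over 15 possible labels.
baseline = 1 / 15
model_accuracy = 0.53  # approximate accuracy reported above

print(f"baseline: {baseline:.3f}")                # baseline: 0.067
print(f"ratio: {model_accuracy / baseline:.2f}")  # ratio: 7.95
```

So the model is roughly eight times the uniform baseline, matching the figure quoted above.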
Main Takeaways
While this project was fairly simple in nature, it was extremely useful in teaching me some of the foundational concepts of using AI in diagnostic imaging. It also helped to show me the potential of AI in the field of veterinary medicine, along with some of the challenges opposing this potential, such as the lack of available veterinary data.
This article is the second in a 4-part series by Madison Page on creating in the field of Artificial Intelligence. The first can be found here. The next will be published in the upcoming weeks.