How I Created a Model That Identifies Cardiac Conditions

Using the NIH Chest X-Ray Dataset with Tensorflow for my First AI Project

Madison Page
Visionary Hub
9 min read · Dec 17, 2021


For context and some background information about Artificial Intelligence in diagnostics, see the first article in this series, Using AI in Diagnostic Imaging.

In October, I decided to learn more about AI and its applications in medicine. As someone who works in a veterinary clinic and is familiar with the diagnosis process for patients, I am painfully aware of some of the issues limiting early diagnosis and, therefore, treatment of our patients. The biggest one I’ve observed? Money. Clients are often simply not willing or able to spend large sums on preventative testing, especially when their pets are still young. And while some of the costs associated with diagnostic screening are currently unavoidable, such as those of producing the images themselves (X-rays, ultrasounds, cell slides, etc.), other costs, including those of having a specialist analyze the images, could be reduced if image recognition were more widely incorporated into the veterinary field. With this hope, I decided to learn how to apply AI to diagnostic imaging myself.

Due to similarities between some cardiac conditions in humans and animals (such as cardiomegaly, which is seen very often in dogs with heart failure), I decided to use the NIH Chest X-Rays Dataset for my first AI project. I opted for the Random Sample version of the dataset, posted on Kaggle by the National Institutes of Health, which consists of 5,606 images, 5% of the original number. As I had not previously worked with AI, I used part of Omar Salah Hemied’s code for the same dataset to read, prepare, and split the data. I then used InceptionV3, a pretrained image recognition model, as the base for my model. I wrote the code in Python and ran it in a Google Colab notebook.

Steps for Creating the Model

Imports and Setup

First, I mounted my Google Drive, where I had uploaded the dataset and the CSV file containing the labels associated with each data point. In this case, the data points were the images from the NIH Chest X-Ray Dataset.
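In Colab, that takes just two lines:

```python
from google.colab import drive

# Make the Drive folder holding the dataset and CSV file available in Colab
drive.mount('/content/drive')
```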

Next, I imported TensorFlow into my notebook in Google Colab. TensorFlow is an open-source Python library, essentially a platform containing different sub-libraries (such as Keras) and functions that can be used in Machine Learning to create and use models. I used version 2.7.0.

Then, I imported the modules necessary for the creation and preparation of the model. This included InceptionV3, which can be imported from TensorFlow. I also imported components from some libraries and, in other cases, entire libraries. The libraries and modules I imported to Colab included os, cv2 (OpenCV, used for reading images), NumPy, pandas, tqdm, Matplotlib, and scikit-learn.
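Gathered together, the imports looked roughly like this (the exact grouping may have differed):

```python
import os                        # file and path handling
import cv2                       # OpenCV, for reading and resizing images
import numpy as np               # array operations
import pandas as pd              # reading and manipulating the CSV data
import matplotlib.pyplot as plt  # displaying images
from tqdm import tqdm            # progress bars for long loops

import tensorflow as tf
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.model_selection import train_test_split
```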

Some of these modules ultimately went unused, but they could have been helpful for adding layers to the model.

Reading and Preparing the CSV Data

After completing the necessary imports, the information from the CSV file had to be prepared. The first step for this process was to read the CSV file, which I accessed by copying its path in my Drive and pasting it into the “read_csv” function from Pandas.

Upon reading the file, I could print and see the first 5 rows (the “head” of the data):
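In sketch form, with an illustrative Drive path and file name:

```python
# Read the label CSV from Drive (the path here is illustrative)
data = pd.read_csv('/content/drive/MyDrive/sample/sample_labels.csv')

# Print the first 5 rows of the data
print(data.head())
```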

For this project, I was only interested in the “Finding Labels” values associated with each image. To see some of the categorizations of the data and the number of images associated with them, I ran the following code:
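This is a one-liner in pandas; a sketch:

```python
# Count how many images fall under each combination of finding labels
print(data['Finding Labels'].value_counts())
```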

This printed the number of images under each combination of finding labels.

Because some images had multiple labels, I then split any labels that were separated by the character “|”, and printed them:
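Roughly:

```python
# Split multi-label strings such as "Cardiomegaly|Effusion" into lists
labels = data['Finding Labels'].apply(lambda s: s.split('|'))

for label_list in labels:
    print(label_list)
```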

This printed the split label list for each image, continuing until every image was accounted for.

Reading and Connecting the Image Data

Next, I had to connect the actual images to the data and labels from the CSV file. I did this by assigning the path of the Drive folder holding the images to a variable named “image_file_path”, then associating each image in this folder with its corresponding CSV data based on image index through a “for” loop. In this loop, I also resized each image to (224, 224) after setting 224 to a variable “image_size”. Lastly, I appended each image to a list “scans”.
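A sketch of the loop, assuming the CSV’s “Image Index” column holds the file names (the folder path is illustrative):

```python
image_file_path = '/content/drive/MyDrive/sample/images/'  # illustrative path
image_size = 224

scans = []
for i in tqdm(range(len(data))):
    # Look up each image file by its name in the CSV, read it, and resize it
    path = os.path.join(image_file_path, data['Image Index'][i])
    img = cv2.imread(path)
    img = cv2.resize(img, (image_size, image_size))
    scans.append(img)
```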

I used the following code to check the shape of the images in “scans”:
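For example:

```python
# cv2.imread returns 3 channels by default, so each scan should be 224 x 224 x 3
print(scans[0].shape)
```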

This confirmed the resized shape of the scans.

I also verified that, after these transformations, there was an equal number of labels and scans:
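A quick sanity check:

```python
# Every scan should have a matching set of labels
print(len(scans), len(labels))
```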

This confirmed that the two counts matched (5,606 of each).

To ensure the labels were being associated with the correct images, and to see some of the images, I then defined a function that generates random indices within the range of the number of images and shows the corresponding images, their labels, and their shapes:

I then called the function and set the “number_of_image” variable to 40:
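A sketch of the function and the call (the function name is illustrative):

```python
import random

def show_random_scans(number_of_image):
    # Pick random indices, then show each image with its labels and shape
    for i in random.sample(range(len(scans)), number_of_image):
        plt.imshow(scans[i])
        plt.title(str(labels[i]))
        plt.show()
        print(scans[i].shape)

show_random_scans(number_of_image=40)
```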

This displayed 40 randomly selected X-rays along with their labels and shapes.

Preparing for Training

After the images and data were prepared, they had to be transformed into a more usable format for Machine Learning. The first component of this was associating a number with each pathology classification of the X-rays, as ML algorithms do not use qualitative labels.

To do this, I created a dictionary named “classes” then defined two functions that, together, exchanged a qualitative label for a quantitative one:
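A sketch, with illustrative function names; the 15 keys are the labels present in the dataset:

```python
# Map each finding label to a numeric class index
classes = {'No Finding': 0, 'Atelectasis': 1, 'Cardiomegaly': 2, 'Effusion': 3,
           'Infiltration': 4, 'Mass': 5, 'Nodule': 6, 'Pneumonia': 7,
           'Pneumothorax': 8, 'Consolidation': 9, 'Edema': 10, 'Emphysema': 11,
           'Fibrosis': 12, 'Pleural_Thickening': 13, 'Hernia': 14}

def label_to_number(label):
    # Exchange one qualitative label for its numeric index
    return classes[label]

def encode_labels(label_list):
    # Turn a list of labels into a multi-hot vector of length 15
    vector = np.zeros(len(classes))
    for label in label_list:
        vector[label_to_number(label)] = 1
    return vector
```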

I then applied these functions to each label set for each image with a “for” loop:
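Along these lines:

```python
# Encode the label list for every image and print each result
encoded_labels = []
for label_list in labels:
    encoded = encode_labels(label_list)
    encoded_labels.append(encoded)
    print(encoded)
```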

This printed the encoded label vector for each image, continuing until every image was accounted for.

To make the labels more easily interpretable, I then turned the set of labels into a NumPy array and checked the shape of each resulting array:
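In outline (the scans get the same treatment so they can be split alongside the labels):

```python
# Convert the label vectors and the scans to NumPy arrays
y = np.array(encoded_labels)
X = np.array(scans)

print(X.shape)  # expected: (5606, 224, 224, 3)
print(y.shape)  # expected: (5606, 15)
```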

This printed the shapes of the image and label arrays.

Splitting the Data

After converting the data to a more ML-friendly format, I split the data into train, test, and validation sets and printed out the shapes of the results:
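A sketch using scikit-learn’s “train_test_split”; the split proportions here are illustrative:

```python
# Hold out a test set, then carve a validation set out of the remainder
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42)

for name, arr in [('X_train', X_train), ('y_train', y_train),
                  ('X_val', X_val), ('y_val', y_val),
                  ('X_test', X_test), ('y_test', y_test)]:
    print(name, arr.shape)
```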

This printed the shape of each of the six resulting sets.

The “X” sets correspond to the images, and the “y” sets correspond to the labels.

Image Data Augmentation

As the last step before setting up the training model, I augmented the training data using the “ImageDataGenerator” preprocessing class to help prevent overfitting:
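A sketch; the specific transformations shown here are illustrative:

```python
# Randomly transform the training images each epoch to reduce overfitting
datagen = ImageDataGenerator(rotation_range=10,
                             width_shift_range=0.1,
                             height_shift_range=0.1,
                             zoom_range=0.1)
```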

Creating the Model

With the data prepared and transformed, the next step was to set a batch size for training and apply it to each set (training, validation, and testing):
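Roughly, with generators carrying the batch size; leaving the validation and testing sets unaugmented is an assumption on my part:

```python
batch_size = 32

# Wrap each set in a generator that yields batches of the chosen size
train_generator = datagen.flow(X_train, y_train, batch_size=batch_size)
val_generator = ImageDataGenerator().flow(X_val, y_val, batch_size=batch_size)
test_generator = ImageDataGenerator().flow(X_test, y_test, batch_size=batch_size)
```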

For training, I used InceptionV3, a 48-layer deep learning model for image recognition, to create a trainable model that could classify the X-rays. I set up the model with the following code:
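A sketch of loading the base model without its classification head:

```python
# Load InceptionV3 with ImageNet weights, minus its top classification layer
base_model = InceptionV3(weights='imagenet',
                         include_top=False,
                         input_shape=(image_size, image_size, 3))
```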

InceptionV3’s pretrained weights come from ImageNet, which does not include medical images. Because of this, I made the layers in the model trainable, so their weights could be fine-tuned on the X-rays:
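Which is a short loop:

```python
# Unfreeze every layer so the ImageNet weights can be updated during training
for layer in base_model.layers:
    layer.trainable = True
```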

I then applied “GlobalAveragePooling2D” to the output of the model to prepare it for the final classification. The predictions for the classes (the pathological labels) come from passing the pooled output (represented by “z”) through a “Dense” Neural Network layer with the “sigmoid” activation function:
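In sketch form:

```python
# Pool the convolutional features, then classify into the 15 labels
z = GlobalAveragePooling2D()(base_model.output)
predictions = Dense(len(classes), activation='sigmoid')(z)
```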

Next, I connected these pieces into the final model and printed its summary:
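Roughly:

```python
# Tie the base model's input to the new classification output
model = Model(inputs=base_model.input, outputs=predictions)
model.summary()
```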

The end of the summary reported the model’s total, trainable, and non-trainable parameter counts.

Lastly, I compiled the model by setting its loss function, optimizer, and metrics. I used “binary_crossentropy” as the loss function, since each image can carry several of the possible labels (a multi-label problem), and “adam” as the optimizer, as it combines the heuristics of the RMSProp and Momentum optimizers:
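A sketch of the compile call; tracking accuracy as the metric is assumed from the results discussed below:

```python
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])  # accuracy metric assumed
```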

To stop training from continuing if the validation loss does not improve over 5 epochs, I set up the “EarlyStopping” callback:
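A sketch:

```python
# Halt training once validation loss fails to improve for 5 epochs in a row
callbacks = [EarlyStopping(monitor='val_loss', patience=5)]
```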

Training the Model

To train the model, I used “model.fit” and set the batch size to 32 and the number of epochs to 10. To ensure that “EarlyStopping” was enforced, I passed the “callbacks” argument the variable in which I had stored the callback:
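A sketch; here the batch size of 32 is carried by the generators created earlier rather than by a “batch_size” argument:

```python
history = model.fit(train_generator,
                    epochs=10,
                    validation_data=val_generator,
                    callbacks=callbacks)
```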

This printed the loss and accuracy on the training and validation data for each epoch.

Viewing the Results

Lastly, to view the results of the training and the accuracy on testing data, I used “model.evaluate” for each set (training, validation, and testing):
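Along these lines:

```python
# Report loss and accuracy on each set
train_loss, train_acc = model.evaluate(X_train, y_train)
val_loss, val_acc = model.evaluate(X_val, y_val)
test_loss, test_acc = model.evaluate(X_test, y_test)
```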

This printed the loss and accuracy for each set.

Analysis of the Results

The accuracy on the testing set, which is most representative of what the accuracy would be were the model used on new images, was almost 53%. While this would not be acceptable for a model with binary classification (2 possible results), considering that the model was classifying images across 15 different labels, and the nature of the dataset, 53% signifies that the model is nearly 8 times more effective than random guessing. To increase the accuracy, the original dataset, which is 20 times the size of the one used for this project, could be used.

Main Takeaways

While this project was fairly simple in nature, it was extremely useful in teaching me some of the foundational concepts of using AI in diagnostic imaging. It also showed me the potential of AI in the field of veterinary medicine, along with some of the challenges standing in the way of that potential, such as the scarcity of available veterinary data.

This article is the second in a 4-part series by Madison Page on building in the field of Artificial Intelligence. The first can be found here. The next will be published in the coming weeks.
