Sub-classifying Lung Cancer with TensorFlow 2 and Keras

Andrew A Borkowski
Published in Analytics Vidhya · Jan 25, 2020
Lung adenocarcinoma

Lung cancer continues to be a significant healthcare challenge. It is the leading cause of cancer death among men and the second leading cause of cancer death among women worldwide. Non-small cell lung cancer represents 85% of all lung cancer cases.

Due to the recent availability of advanced targeted therapies, it is imperative not only to detect but also to properly sub-classify non-small cell lung cancer into its two major sub-types, squamous cell carcinoma and adenocarcinoma, which can be challenging at times even for experienced pathologists.

In this post, I go over how I trained and tested a machine learning (ML) model to sub-classify non-small cell lung cancer images into squamous cell carcinoma and adenocarcinoma using TensorFlow 2 and Keras. In 2019, Google released the TensorFlow 2 deep learning library with full Keras API integration, making it easier than ever to use. If you are new to convolutional neural networks, you can find a comprehensive guide on Medium and an in-depth description on Wikipedia.

Sections of this tutorial:

1. Choose Google Colab or local computer

2. Prepare training, validation and testing data directories

3. Import libraries

4. Specify paths to the training, validation and testing dataset directories

5. Normalize images and generate batches of tensor image data for training, validation, and testing

6. Visualize samples of training images (optional)

7. Build the convolutional network model

8. Compile and train the model

9. Evaluate the model

10. Assess trained model performance on the testing dataset

Let’s get started!

1. Google Colab or local computer.

A computer with a powerful NVIDIA GPU is best suited for this project. Excellent instructions on how to set up TensorFlow 2 with GPU support are provided in this YouTube video.

If you do not have a powerful NVIDIA GPU, Google Colab is an excellent alternative. It is like using a Jupyter Notebook in the cloud. Google Colab provides users with 12 hours of free computational time and access to a GPU and even a TPU. Good instructions on how to use Google Colab can be found in this Medium post.

2. Prepare training, validation, and testing directories.

For this project, I used an image dataset containing 5000 color images of lung squamous cell carcinoma and 5000 color images of lung adenocarcinoma from the LC25000 dataset, which is freely available to ML researchers. If you are going to use Google Colab, you need to upload the images to your Google Drive.

Since I was using the Keras flow_from_directory method from the ImageDataGenerator class to generate batches of tensor image data for the model, I needed to organize the dataset into the specific directory structure outlined below. This is a necessary step; otherwise, the program will not work.

Required directory structure
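The figure above showed that layout; here is a sketch of it in text form. The folder and class names are placeholders, since the class sub-folder names simply need to match whatever your dataset uses:

```
lung_images/
├── train/
│   ├── lung_aca/   (adenocarcinoma images)
│   └── lung_scc/   (squamous cell carcinoma images)
├── val/
│   ├── lung_aca/
│   └── lung_scc/
└── test/
    ├── lung_aca/
    └── lung_scc/
```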

My original dataset folder contained two sub-folders with two classes of images (the lung squamous cell carcinoma class and the lung adenocarcinoma class). I used the split-folders Python package to divide my original dataset folder into training, validation, and testing dataset folders, with the same two classes in each folder, as sketched below. I used 80% of the images for the training dataset, 10% for the validation dataset, and 10% for the testing dataset. A good explanation of the differences between training, validation, and testing datasets can be found here. To summarize, I used the training dataset to train the model (find the optimal weights), the validation dataset to fine-tune the model (choose the number of hidden layers, epochs, dropout layers, etc.), and the testing dataset to assess the performance of the fully trained model.
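A minimal sketch of that split, assuming the current split-folders package API (pip install split-folders) and placeholder folder names:

```python
import splitfolders  # pip install split-folders

# Hypothetical folder names; ratio=(0.8, 0.1, 0.1) reproduces the
# 80/10/10 split described above.
splitfolders.ratio('original_dataset',    # folder with the two class sub-folders
                   output='lung_images',  # creates train/, val/, test/ inside
                   seed=42,
                   ratio=(0.8, 0.1, 0.1))
```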

3. Import libraries.
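The original import cell is not reproduced here; a minimal set covering the steps below would look something like this:

```python
import matplotlib.pyplot as plt
import tensorflow as tf  # TensorFlow 2.x
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)
from tensorflow.keras.preprocessing.image import ImageDataGenerator
```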

4. Specify paths to the training, validation, and testing dataset directories

I used my local PC for this project. If you plan to use Google Colab, you need to mount your Google Drive first, as shown in the sketch below.
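A sketch with placeholder paths; adjust them to wherever the folders from step 2 live. The drive.mount lines are only needed on Colab:

```python
# On Google Colab, mount Google Drive first:
# from google.colab import drive
# drive.mount('/content/drive')

# Placeholder paths to the directories created in step 2.
train_path = 'lung_images/train'
valid_path = 'lung_images/val'
test_path = 'lung_images/test'
```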

5. Normalize images and generate batches of tensor image data for training, validation, and testing
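A sketch of this step. The 224x224 target size is an assumption (the post only specifies the batch size of 40), and class_mode='binary' matches the sigmoid output used later:

```python
# Rescale pixel values from [0, 255] to [0, 1] and stream batches
# of labeled images straight from the directories.
datagen = ImageDataGenerator(rescale=1.0 / 255)

train_batches = datagen.flow_from_directory(
    train_path, target_size=(224, 224), class_mode='binary', batch_size=40)
valid_batches = datagen.flow_from_directory(
    valid_path, target_size=(224, 224), class_mode='binary', batch_size=40)
test_batches = datagen.flow_from_directory(
    test_path, target_size=(224, 224), class_mode='binary',
    batch_size=40, shuffle=False)  # keep order for evaluation
```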

6. Visualize samples of training images (optional)
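One way to do this, pulling a single batch from the training generator:

```python
# Display the first ten images of a training batch with their labels.
images, labels = next(train_batches)
idx_to_class = {v: k for k, v in train_batches.class_indices.items()}

plt.figure(figsize=(12, 5))
for i in range(10):
    plt.subplot(2, 5, i + 1)
    plt.imshow(images[i])
    plt.title(idx_to_class[int(labels[i])])
    plt.axis('off')
plt.show()
```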

The output should look something like this:

Lung squamous cell carcinoma and adenocarcinoma sample training images

7. Build the convolutional network model

A great instructional video on how to design a convolutional neural network can be found here. I built my model with three sets of convolutional/pooling layers for feature extraction and two dense layers for classification. I added a single dropout layer (20% dropout rate) to prevent the model from overfitting. I used a 3x3 kernel (filter) size for the convolutional layers, max-pooling for the pooling layers, the ReLU activation function for the hidden layers, and the sigmoid activation function for the output layer.
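A sketch of that architecture; the filter counts, dense-layer width, and input size are assumptions, so the exact parameter count will not match the original model:

```python
model = Sequential([
    # Three convolution/pooling blocks for feature extraction.
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # Classification head with a 20% dropout layer against overfitting.
    Flatten(),
    Dropout(0.2),
    Dense(128, activation='relu'),
    Dense(1, activation='sigmoid'),  # binary output
])
model.summary()
```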

Model summary:

Our model summary

As you can see, the model had almost fifteen million trainable parameters.

8. Compile and train the model

Since I was doing binary classification, I used binary_crossentropy as the loss function and Adam as the optimizer. A “Gentle Introduction to the Adam Optimization Algorithm” can be found here. The number of steps per epoch comes from dividing the number of images by the batch size: I had 8000 images in the training dataset and set the ImageDataGenerator training batch size to 40 images, so 8000 / 40 = 200 steps per epoch.
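Put together, the compile and fit calls would look something like this; validation_steps follows the same arithmetic (1000 validation images / 40 = 25):

```python
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# In TensorFlow 2, model.fit accepts generators directly.
history = model.fit(train_batches,
                    steps_per_epoch=200,   # 8000 images / batch size 40
                    validation_data=valid_batches,
                    validation_steps=25,   # 1000 images / batch size 40
                    epochs=20)
```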

9. Evaluate the model

After 20 epochs, the model displayed a training accuracy of 0.99 and a validation accuracy of 0.95.

Last three epochs

Code for plotting accuracy and loss:
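The original snippet is not reproduced here; a sketch using the History object returned by model.fit:

```python
# Plot training vs. validation accuracy and loss across epochs.
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(1, len(acc) + 1)

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs, acc, label='Training accuracy')
plt.plot(epochs, val_acc, label='Validation accuracy')
plt.legend()
plt.subplot(1, 2, 2)
plt.plot(epochs, loss, label='Training loss')
plt.plot(epochs, val_loss, label='Validation loss')
plt.legend()
plt.show()
```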

The output looked like this:

Training and validation accuracy and loss plots

10. Assess trained model performance on the testing dataset
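A sketch of the evaluation call, reusing the test generator from step 5 (1000 testing images / 40 = 25 steps):

```python
test_loss, test_acc = model.evaluate(test_batches, steps=25)
print(f'Testing accuracy: {test_acc:.2f}')
```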

Results of model performance on testing dataset

In summary, the trained model was able to classify previously unseen (testing dataset) non-small cell lung carcinoma images into squamous cell carcinoma and adenocarcinoma with 94% accuracy.

You can find a Jupyter Notebook with the entire code for this tutorial in my GitHub repository. I hope you find this tutorial helpful, and I wish you the best of luck in your machine learning endeavors.
