Lung Cancer Detection using Convolutional Neural Networks

Mar 15, 2023

You can't just go and earn a medical degree to classify a disease, but guess what: machine learning and computer vision let you do it on your own 🫁⚕️.

How common is lung cancer?

Lung cancer (both small cell and non-small cell) is the second most common cancer in both men and women in the United States (not counting skin cancer). In men, prostate cancer is more common, while in women breast cancer is more common.

The American Cancer Society’s estimates for lung cancer in the US for 2023 are:

About 238,340 new cases of lung cancer (117,550 in men and 120,790 in women)
About 127,070 deaths from lung cancer (67,160 in men and 59,910 in women)

Approach Followed

We trained our model with the Keras API.
We used 2D convolution layers, each followed by a MaxPooling layer that downsamples the feature maps; this keeps the model small and lets the deeper layers respond to larger regions of the image.
Because we are facing a two-class classification problem, i.e. a binary classification problem, we will end the network with a sigmoid activation. The output of the network will be a single scalar between 0 and 1, encoding the probability that the current image is class 1 (as opposed to class 0).
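A minimal pure-Python sketch of what that final sigmoid unit and a 0.5 decision threshold do. The names `CLASS_NAMES`, `sigmoid` and `decide` are illustrative, not part of Keras, and the class order is an assumption: it holds if the images are loaded with Keras' directory-based loaders, which number the class folders alphabetically ('lung_cancer' = 0, 'normal' = 1).

```python
import math

# Assumption: labels come from Keras' directory loaders, which number the
# class folders alphabetically, so 'lung_cancer' -> 0 and 'normal' -> 1.
CLASS_NAMES = ("lung_cancer", "normal")


def sigmoid(z: float) -> float:
    """Map a raw score (logit) to a probability in (0, 1), computed in a
    numerically stable way for large positive or negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decide(prob_class1: float, threshold: float = 0.5) -> str:
    """Pick class 1 when its predicted probability exceeds the threshold."""
    return CLASS_NAMES[1] if prob_class1 > threshold else CLASS_NAMES[0]


for logit in (-2.0, 0.0, 3.0):
    p = sigmoid(logit)
    print(f"logit={logit:+.1f} -> p={p:.3f} -> {decide(p)}")
```

A probability of exactly 0.5 falls to class 0 here, mirroring the `> 0.5` test used in the prediction code later in this post.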
Image Dataset Link
The code below stores the dataset path in a variable and lists the contents of the base, train and validation directories.

import os

base_dir = '/content/drive/MyDrive/ctscan_images'  # your path may be different

print("Contents of base directory:")
print(os.listdir(base_dir))

print("\nContents of train directory:")
print(os.listdir(f'{base_dir}/train'))

print("\nContents of validation directory:")
print(os.listdir(f'{base_dir}/validation'))


Output:

Contents of base directory:
['validation', 'train']

Contents of train directory:
['normal', 'lung_cancer']

Contents of validation directory:
['normal', 'lung_cancer']
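Before training, it is worth checking how many images each class folder holds, since a heavily imbalanced split can make plain accuracy look better than the model really is. A small standard-library sketch; `count_images` and the accepted file extensions are assumptions for illustration.

```python
import os

# Assumed set of image extensions; adjust to whatever the dataset contains
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def count_images(split_dir: str) -> dict:
    """Hypothetical helper: count image files in each class folder of a split
    (e.g. base_dir/train) and return {class_name: count}."""
    counts = {}
    for class_name in sorted(os.listdir(split_dir)):
        class_dir = os.path.join(split_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files such as .DS_Store
        counts[class_name] = sum(
            1 for f in os.listdir(class_dir)
            if os.path.splitext(f)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts


# Usage with the base_dir defined above:
# for split in ("train", "validation"):
#     print(split, count_images(os.path.join(base_dir, split)))
```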

Building a Small Model from Scratch to get to ~72% Accuracy

To train a neural network on these images, they all need to be the same size. We use 150x150 pixels, so every image is resized to that shape when it is loaded.
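The post does not show how `train_generator` and `validation_generator` are built. One possible way to produce them, assuming the `train/` and `validation/` folder layout listed above, is `tf.keras.utils.image_dataset_from_directory`. This is a substitute sketch, not necessarily what the original notebook used (it may have used `ImageDataGenerator`), and the batch size of 20, the shuffle seed and the helper name `make_datasets` are assumptions.

```python
import tensorflow as tf

IMAGE_SIZE = (150, 150)  # must match input_shape=(150, 150, 3) in the model


def make_datasets(base_dir: str, batch_size: int = 20):
    """Hypothetical helper: build training and validation datasets from
    base_dir/train and base_dir/validation. Labels are inferred from the
    sub-folder names in alphabetical order: 'lung_cancer' -> 0, 'normal' -> 1."""
    common = dict(
        labels="inferred",
        label_mode="binary",    # float labels of shape (batch, 1) for the sigmoid output
        image_size=IMAGE_SIZE,  # every image is resized to 150x150 while loading
        batch_size=batch_size,
    )
    train = tf.keras.utils.image_dataset_from_directory(
        f"{base_dir}/train", shuffle=True, seed=42, **common)
    val = tf.keras.utils.image_dataset_from_directory(
        f"{base_dir}/validation", shuffle=False, **common)

    # Scale pixel values from [0, 255] to [0, 1], matching the x /= 255 step
    # applied to uploaded images at prediction time
    def rescale(images, labels):
        return images / 255.0, labels

    return train.map(rescale), val.map(rescale)


# Usage with the base_dir defined earlier, named to match the model.fit call:
# train_generator, validation_generator = make_datasets(base_dir)
```

The returned objects are `tf.data.Dataset`s rather than Python generators, but `model.fit` accepts them in the same place.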

import tensorflow as tf

model = tf.keras.models.Sequential([
    # Note the input shape is the desired size of the image: 150x150 with 3 color channels (RGB)
    tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(150, 150, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2,2), 
    tf.keras.layers.Conv2D(64, (3,3), activation='relu'), 
    tf.keras.layers.MaxPooling2D(2,2),
    # Flatten the results to feed into a DNN
    tf.keras.layers.Flatten(), 
    # 512 neuron hidden layer
    tf.keras.layers.Dense(512, activation='relu'), 
    # Only 1 output neuron: a value from 0 to 1, where 0 is class 'lung_cancer' and 1 is class 'normal'
    tf.keras.layers.Dense(1, activation='sigmoid')  
])
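As a sanity check on this architecture, the spatial size and parameter count can be traced by hand: a 3x3 `Conv2D` with the default 'valid' padding reduces each side by 2 pixels, and a 2x2 `MaxPooling2D` halves it (rounding down). The helper functions below are hypothetical; `model.summary()` on the real model should report matching numbers.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Side length after a Conv2D with 'valid' padding and stride 1."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Side length after MaxPooling2D with pool size and stride `pool`."""
    return size // pool


def trace(input_size=150, in_channels=3, filters=(16, 32, 64), dense_units=512):
    """Print the feature-map shape after each conv/pool pair and return
    (flattened size, total trainable parameters) for the model above."""
    size, channels, params = input_size, in_channels, 0
    for f in filters:
        params += 3 * 3 * channels * f + f  # 3x3 kernels plus one bias per filter
        size = pool_out(conv_out(size))
        channels = f
        print(f"Conv2D({f}) + MaxPooling2D -> {size}x{size}x{channels}")
    flat = size * size * channels
    params += flat * dense_units + dense_units  # Dense(512)
    params += dense_units + 1                   # Dense(1) output neuron
    print(f"Flatten -> {flat} values; trainable parameters: {params:,}")
    return flat, params


trace()
```

This gives 17x17x64 = 18,496 flattened values and 9,494,561 trainable parameters, nearly all of which sit in the `Dense(512)` layer that follows `Flatten`.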

Model Compile

We train the model with the binary_crossentropy loss, because this is a binary classification problem and the final activation is a sigmoid. We use the RMSprop optimizer with a learning rate of 0.001, and we monitor classification accuracy during training.
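For reference, binary cross-entropy for a label y (0 or 1) and a predicted probability p is -(y * log(p) + (1 - y) * log(1 - p)), averaged over the batch. A plain-Python sketch; the function name is illustrative, and the clipping constant is an assumption modeled on the small epsilon Keras uses to keep log() away from 0.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and sigmoid probabilities.
    Probabilities are clipped to [eps, 1 - eps] so log() never receives 0."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


print(binary_crossentropy([1, 0], [0.9, 0.1]))  # confident and correct: small loss
print(binary_crossentropy([1, 0], [0.1, 0.9]))  # confident and wrong: large loss
```

A confident correct prediction (p = 0.9 for a true label of 1) costs about 0.105, while a confident wrong one (p = 0.1) costs about 2.30, so the loss strongly pushes the sigmoid output toward the correct end.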

from tensorflow.keras.optimizers import RMSprop

model.compile(optimizer=RMSprop(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])
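RMSprop divides each step by the root of a running average of recent squared gradients, so a weight with consistently large gradients does not take proportionally huge steps. A textbook one-weight sketch, assuming Keras' defaults of rho = 0.9 and epsilon = 1e-7 together with the learning rate of 0.001 used above; Keras' own implementation differs in details (for example, optional momentum and exactly where epsilon is added).

```python
import math


def rmsprop_step(w, grad, v, lr=0.001, rho=0.9, eps=1e-7):
    """One textbook RMSprop update for a single scalar weight.

    v is the running average of squared gradients; the step is the gradient
    divided by the root of that average, times the learning rate.
    """
    v = rho * v + (1 - rho) * grad ** 2
    w = w - lr * grad / (math.sqrt(v) + eps)
    return w, v


w, v = 1.0, 0.0
for step in range(1, 4):
    w, v = rmsprop_step(w, grad=2.0, v=v)  # constant gradient of 2.0
    print(f"step {step}: w = {w:.6f}, v = {v:.4f}")
```

With a constant gradient, the first step is lr / sqrt(1 - rho), about 0.00316, and the step shrinks toward lr as v approaches the squared gradient, whatever the gradient's magnitude.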

Fitting Data to the Model

# Keep the returned History object so the curves can be plotted afterwards
history = model.fit(
    train_generator,
    epochs=15,
    validation_data=validation_generator,
    verbose=2
)

Plotting the Training vs Validation (Accuracy and Loss)

import matplotlib.pyplot as plt

def plot_loss_acc(history):
  '''Plots the training and validation loss and accuracy from a history object'''
  acc = history.history['accuracy']
  val_acc = history.history['val_accuracy']
  loss = history.history['loss']
  val_loss = history.history['val_loss']

  epochs = range(len(acc))

  plt.plot(epochs, acc, 'r', label='Training accuracy')
  plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
  plt.title('Training and validation accuracy')
  plt.legend()  # show the red/blue labels on the accuracy plot too

  plt.figure()

  plt.plot(epochs, loss, 'r', label='Training Loss')
  plt.plot(epochs, val_loss, 'b', label='Validation Loss')
  plt.title('Training and validation loss')
  plt.legend()

  plt.show()
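The plots are easier to read alongside a couple of numbers pulled from the same dictionary. A small sketch; `summarize_history` is a hypothetical helper, and it assumes the model was compiled with `metrics=['accuracy']` so that the 'accuracy' and 'val_accuracy' keys exist.

```python
def summarize_history(hist: dict) -> dict:
    """Summarize a Keras History.history dict (the same dict plot_loss_acc reads).

    Returns the 1-based epoch with the highest validation accuracy and the gap
    between training and validation accuracy at the last epoch; a large, growing
    gap is a common sign of overfitting.
    """
    val_acc = hist["val_accuracy"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best + 1,  # Keras' verbose logs count epochs from 1
        "best_val_accuracy": val_acc[best],
        "final_gap": hist["accuracy"][-1] - val_acc[-1],
    }


# Usage, assuming the History object returned by model.fit was kept as `history`:
# print(summarize_history(history.history))
```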

Predicting on an image from the validation set or any other CT scan image

import numpy as np

from google.colab import files
from tensorflow.keras.utils import load_img, img_to_array

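uploaded = files.upload()

for fn in uploaded.keys():
  # Load the uploaded image and resize it to the model's 150x150 input
  path = '/content/' + fn
  img = load_img(path, target_size=(150, 150))

  # Scale pixel values to [0, 1]; this must match the rescaling used for the training images
  x = img_to_array(img)
  x /= 255
  # Add a batch dimension: (150, 150, 3) -> (1, 150, 150, 3)
  x = np.expand_dims(x, axis=0)
  images = np.vstack([x])

  classes = model.predict(images, batch_size=10)
  print(classes[0])

  # A sigmoid output above 0.5 means class 1 ('normal'), otherwise class 0 ('lung_cancer')
  if classes[0][0] > 0.5:
    print(fn + " is a normal case")
  else:
    print(fn + " is a lung cancer case")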
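`files.upload()` only works inside Google Colab. Outside Colab, a sketch like the following classifies every image in a local folder in one batch. `predict_folder` is a hypothetical helper, and it assumes the same 150x150 resizing, the 1/255 rescaling and the class order ('lung_cancer' = 0, 'normal' = 1) used in the single-image code.

```python
import os

import numpy as np
from tensorflow.keras.utils import load_img, img_to_array


def predict_folder(model, folder, threshold=0.5, target_size=(150, 150)):
    """Hypothetical helper: classify every PNG/JPEG file in `folder`.

    Returns {file name: predicted label}, assuming class 1 = 'normal' and
    class 0 = 'lung_cancer' as in the single-image code above.
    """
    names = sorted(f for f in os.listdir(folder)
                   if f.lower().endswith((".png", ".jpg", ".jpeg")))
    if not names:
        return {}
    # Resize to the model's input size and rescale to [0, 1], as for uploaded images
    batch = np.stack([
        img_to_array(load_img(os.path.join(folder, n), target_size=target_size)) / 255.0
        for n in names
    ])
    probs = model.predict(batch, verbose=0)[:, 0]  # one sigmoid output per image
    return {n: ("normal" if p > threshold else "lung_cancer")
            for n, p in zip(names, probs)}
```

For example, `predict_folder(model, f'{base_dir}/validation/normal')` would ideally label most files 'normal' if the model has learned the task.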

Exporting the trained Model into a “.h5” file.

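from tensorflow.keras.models import load_model
model.save("lungcancer_model_cnn.h5")  # architecture + weights + optimizer state (HDF5)
reloaded_model = load_model("lungcancer_model_cnn.h5")  # reload later for inference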

You can access the entire Colab notebook here to try it out yourself 🧑‍💻 [LINK]

Written by Vedant Kadam