Teaching Machines to Detect Skin Cancer

Leveraging artificial intelligence to classify medical images, such as photos of moles, CT scans, and MRI scans, as a diagnostic tool.

Joey Mach
Analytics Vidhya
11 min read · Oct 10, 2019


The diagnosis process for cancer patients looks something like this:

  1. A visit to your family physician to determine if further testing is required (~3 days to schedule an appointment)
  2. If further testing is required, a skin biopsy is typically performed (~3 weeks to schedule an appointment)
  3. If diagnosed with skin cancer, further testing may be recommended to provide additional details like the stage of cancer (~additional 2 weeks)

The entire diagnosis process takes approximately 1.5 months if you're lucky. I’ve heard horror stories of people waiting for hours in the ER to see a doctor, waiting several months to visit a specialist, and the list goes on.

An interesting thing I realized is that we don't fully recognize misdiagnosis as a critical issue. It's a hidden problem, and one that isn't discussed frequently. The misdiagnosis rate for cancer hovers around 20 percent. That's 1 in every 5 patients, which amounts to 3.4 million cases every year! According to one study, 28 percent of misdiagnosed cases are life-threatening.

What if I told you that artificial intelligence can detect skin cancer, and potentially any type of disease, with far better accuracy and in much less time than human beings? In the context of skin cancer, all you need to do is feed the machine a picture of the mole, and boom, the machine instantly gives you a diagnosis. And in fact, I coded an algorithm to do just that!

Understanding the dataset: MNIST HAM 10000

To train the machine learning model, I used the MNIST HAM 10000 dataset. It contains a total of 10,015 dermatoscopic images of skin lesions, each labeled with its respective lesion type.

The images in the dataset are separated into the following seven categories:

  • Actinic keratosis is considered to be a noncancerous (benign) skin lesion. However, if left untreated, it can develop into squamous cell carcinoma (which is cancerous).
  • Unlike actinic keratosis, basal cell carcinoma is a cancerous type of skin lesion that develops in the basal cell layer located in the lower part of the epidermis. It is the most common type of skin cancer accounting for 80% of all cases.
  • Benign keratosis is a noncancerous, slow-growing skin growth. It can typically be left untreated, as it is usually harmless.
  • Dermatofibromas are also noncancerous and usually harmless, so no treatment is required. They are commonly pinkish in color and appear as a small, round bump.
  • Melanoma is a malignant type of skin cancer that originates from melanocytes, the cells responsible for the pigment of your skin.
  • Melanocytic nevi are a benign type of melanocytic tumor. Patients with melanocytic nevi are considered to be at a higher risk of melanoma.
  • Vascular lesions encompass a wide range of skin lesions, including cherry angiomas, angiokeratomas, and pyogenic granulomas. They are typically red or purple in color and often appear as a raised bump.
Basal cells are located in the lower part of the epidermis layer of the skin. Melanocytes are located beneath basal cells.

For my model, I applied a convolutional neural network (CNN), a deep learning algorithm, to train on the data. CNNs are especially well established in the area of image classification.

But wait, what is Deep Learning?

Deep learning is a sub-class of machine learning that is inspired by the neural connectivity of the brain.

Similar to the architecture of the brain, deep learning connects the nodes/neurons (the circles in the picture) in each layer to the nodes in the next layer.

There are 3 main types of layers in deep learning:

  1. Input Layer: where the input data is fed into the model
  2. Hidden Layers: responsible for discovering the meaning of the data
  3. Output Layer: returns the predicted answer/label

Two kinds of parameters are involved in interpreting the data: weights and biases.

Weights are denoted by the symbol w, and biases are denoted by the symbol b.

In the scenario above, weight and height are the two inputs fed into the model. The deep neural network multiplies each input by its weight, adds the bias, and passes the result through an activation function to produce an output. In simple terms, an activation function computes the output of a node given its input.
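As a rough illustration, here is a minimal sketch in NumPy of what a single node computes; the input values, weights, and bias below are made up purely for demonstration.

import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

# Two hypothetical inputs, e.g. weight and height, rescaled to small numbers
x = np.array([0.7, 1.8])

# Made-up learnable parameters of a single node
w = np.array([0.5, -0.3])   # weights
b = 0.1                     # bias

# The node's output: activation(w · x + b)
output = sigmoid(np.dot(w, x) + b)
print(output)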

Training a deep neural network requires not only an algorithm; it also requires defining a loss function. The loss function lets the model quantify its prediction errors.

A simple loss function, the squared error, is denoted as: (actual output - predicted output) ** 2

We term the process of minimizing the error rate after every iteration through the model “learning”.

After every iteration, the weights and biases are updated in a way that reduces the error rate of the model. When training a model, the goal is essentially to minimize loss, which in turn increases accuracy as the model makes fewer and fewer mistakes. Wait, but how exactly are the weights and biases refined?

For each data point, the loss is computed by the loss function, and the average gradient (a vector of derivatives of the loss with respect to the parameters) is calculated using the backpropagation algorithm. The gradient tells us how much each parameter influences the output.

The gradient descent algorithm uses the gradients computed by backpropagation to refine the weights. Basically, gradient descent's goal is to find the minimum of the loss function: the point where the model's loss is closest to 0.

The cost function is just another name for the loss function. Gradient descent is used to find its minimum (labeled as the "winner" in the picture).
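To make gradient descent concrete, here is a minimal sketch (toy data, not the article's training code) that fits a single weight to the relationship y = 2x by repeatedly stepping opposite the gradient of the squared-error loss.

import numpy as np

# Toy data: the "true" relationship is y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0              # start with a bad guess for the weight
learning_rate = 0.01

for step in range(100):
    y_pred = w * x                         # model prediction
    loss = np.mean((y - y_pred) ** 2)      # squared-error loss
    grad = np.mean(-2 * x * (y - y_pred))  # derivative of the loss w.r.t. w
    w -= learning_rate * grad              # step opposite the gradient

print(w, loss)   # w approaches 2.0 as the loss shrinks toward 0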

When the model reaches the output layer, an activation function is used to normalize the values so they correspond to probabilities. The sigmoid function is a common choice for converting a value into a number between 0 and 1; for multi-class problems like this one, the softmax function plays the same role across all classes. The output value with the highest probability is the model's prediction.

To sum it up, deep learning is a subset of machine learning and belongs to a class of algorithms called neural networks. The architecture of neural networks mimics how neurons in our brains are connected to each other.

The Model: Leveraging Convolutional Neural Networks

Similar to neural networks, CNNs have input layers, output layers, and are composed of nodes. But wait, how are convolutional neural networks different?

Vanilla neural networks (aka multilayer perceptrons) take a vector as input. For an image of 100 pixels by 100 pixels, a regular multilayer perceptron has to learn 10,000 weights for each node in the second layer. If you have a 10-node layer, that number quickly balloons to 100,000 weights!

Leveraging the architecture of CNNs, the number of parameters needed can be greatly reduced. This is because CNNs use filters, small grids of weights, as their learnable parameters. This means that the size of the image doesn't necessarily affect the number of learnable parameters.

CNNs can take the multiple channels of an image as input. Images usually consist of 3 channels, corresponding to the three primary colors of light: red, green, and blue. Each pixel in each of the three channels holds a number ranging from 0 to 255 that represents the intensity of that color.
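To see the difference in parameter counts, here is a small hypothetical comparison in Keras: a dense layer connected to a flattened 100 × 100 × 3 image versus a convolutional layer with ten 3 × 3 filters on the same input. The layer sizes are illustrative only and are not the article's model.

from keras.layers import Input, Flatten, Dense, Conv2D
from keras.models import Model

inp = Input(shape=(100, 100, 3))

# Fully connected: every one of the 30,000 input values gets its own weight per node
dense = Dense(10)(Flatten()(inp))
print(Model(inp, dense).count_params())   # 10 * (100*100*3) + 10 biases = 300,010

# Convolutional: ten 3x3 filters whose weights are shared across the whole image
conv = Conv2D(10, (3, 3))(inp)
print(Model(inp, conv).count_params())    # 10 * (3*3*3) + 10 biases = 280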

Aside from the input and output layers, convolutional neural networks, just like other neural networks, include multiple hidden layers. There are three main types of hidden layers found in convolutional neural networks:

  1. Convolutional layers
  2. Pooling layers
  3. Fully connected layers

(Unlike fully connected networks such as the vanilla neural network, aka the multilayer perceptron, CNNs are not fully connected: each node is connected only to a small region of the previous layer.)

The convolutional layers are responsible for performing the dot product between the input and the filter; most of the computational work takes place in these layers. Filters are the model's learnable parameters, used to detect specific features like the edges of an object in the image; this process is also known as feature extraction. Just as the weights in a multilayer perceptron are refined after every iteration, in a CNN it is the filters, which contain the weights, that are refined after every training iteration. For CNNs, the most common activation function is the ReLU (Rectified Linear Unit), denoted max(0, x). This means that for any value less than 0 the output is 0, and for any value above 0 the output is x (the input). ReLU functions therefore output only non-negative values.

An example of a convolutional layer doing its job: a dot product is performed between the filter's weight matrix and a patch of the input image of the same shape, and the result is written into the output feature map.

The pooling layers shrink the data by reducing its dimensions. For my model, I used a specific pooling layer called max pooling. Max pooling slides a window over the feature map and keeps only the highest value in each window as the output, hence reducing the data's dimensions.

Max pooling function
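Here is a minimal NumPy sketch (with toy numbers, not taken from the article's model) of the three operations just described: sliding a 2 × 2 filter over a small image, applying ReLU, and then max pooling with a 2 × 2 window.

import numpy as np

image = np.array([[1., 2., 0., 1., 2.],
                  [3., 1., 1., 0., 1.],
                  [0., 2., 4., 1., 0.],
                  [1., 0., 2., 3., 1.],
                  [2., 1., 0., 1., 2.]])
filt = np.array([[1., -1.],
                 [-1., 1.]])   # a made-up 2x2 filter

# Convolution: dot product between the filter and every 2x2 patch (output is 4x4)
conv = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        conv[i, j] = np.sum(image[i:i+2, j:j+2] * filt)

# ReLU activation: max(0, x), so negative values become 0
relu = np.maximum(conv, 0)

# Max pooling with a 2x2 window: keep only the largest value in each window
pooled = relu.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)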

Fully connected layers are similar to the hidden layers seen in multilayer perceptrons: the nodes of one layer are fully connected to the nodes in the next layer. This layer is responsible for producing the final output.
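To see how these layer types fit together, here is a toy CNN in Keras. The architecture is purely illustrative; it is not the MobileNet-based model used later in this article.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

model = Sequential([
    # Convolutional layer: 32 filters of size 3x3 with ReLU activation
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    # Pooling layer: shrink each feature map with 2x2 max pooling
    MaxPool2D(pool_size=(2, 2)),
    # Flatten the feature maps into a vector for the fully connected layers
    Flatten(),
    # Fully connected layers: the last one outputs 7 class probabilities
    Dense(64, activation='relu'),
    Dense(7, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()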

The Nitty-Gritty (aka The Source Code)

Moving on to the fun stuff and of course, all the code!

Instead of downloading 3 GB of images and then uploading them to Google Colaboratory, which can be tedious, I used the Kaggle API instead.

#-------------------------Kaggle API Setup-------------------------
#Install the kaggle library
!pip install kaggle
#Make a directory called .kaggle (the leading dot makes it hidden)
!mkdir .kaggle
import json
token = {"username":"ENTER YOUR USERNAME","key":"ENTER YOUR KEY"}
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(token, file)

!cp /content/.kaggle/kaggle.json ~/.kaggle/kaggle.json
!kaggle config set -n path -v{/content}
!chmod 600 /root/.kaggle/kaggle.json

After setting up the Kaggle API, download the MNIST HAM 10000 dataset and unzip the files.

#---------------Downloading and unzipping the files---------------
#Data directory: where the files will be unzipped to (destination folder)
!mkdir data
!kaggle datasets download kmader/skin-cancer-mnist-ham10000 -p data
!apt install unzip
!mkdir HAM10000_images_part_1
!mkdir HAM10000_images_part_2
# Unzip the whole zipfile into /content/data
!unzip /content/data/skin-cancer-mnist-ham10000.zip -d /content/data
!unzip /content/data/HAM10000_images_part_1.zip -d HAM10000_images_part_1
!unzip /content/data/HAM10000_images_part_2.zip -d HAM10000_images_part_2
#Output how many files were unzipped
!echo files in /content/data: `ls data | wc -l`

Different directories are made for the training dataset and the validation dataset. Seven sub-folders for the seven different labels are created inside both the training and validation directory.

#-------------------Make directories for the data-------------------
import os
import errno

base_dir = 'base_dir'
image_class = ['nv','mel','bkl','bcc','akiec','vasc','df']
#3 folders are made: base_dir, train_dir and val_dir
try:
    os.mkdir(base_dir)
except OSError as exc:
    if exc.errno != errno.EEXIST:
        raise
    pass

train_dir = os.path.join(base_dir, 'train_dir')
try:
    os.mkdir(train_dir)
except OSError as exc:
    if exc.errno != errno.EEXIST:
        raise
    pass

val_dir = os.path.join(base_dir, 'val_dir')
try:
    os.mkdir(val_dir)
except OSError as exc:
    if exc.errno != errno.EEXIST:
        raise
    pass

#make sub-directories for the labels
for x in image_class:
    os.mkdir(train_dir+'/'+x)
for x in image_class:
    os.mkdir(val_dir+'/'+x)

To preprocess the data, it is split into training data and validation data at a 9:1 ratio. Each image is then copied into the folder that corresponds to its label.

#-----------------splitting data/transferring data-----------------
#import libraries
import pandas as pd
import shutil

df = pd.read_csv('/content/data/HAM10000_metadata.csv')
# Set y as the labels
y = df['dx']
#split data
from sklearn.model_selection import train_test_split
df_train, df_val = train_test_split(df, test_size=0.1, random_state=101, stratify=y)

# Transfer the images into folders; set the image id as the index
df.set_index('image_id', inplace=True)
# Get a list of images in each of the two folders
folder_1 = os.listdir('HAM10000_images_part_1')
folder_2 = os.listdir('HAM10000_images_part_2')
# Get a list of train and val images
train_list = list(df_train['image_id'])
val_list = list(df_val['image_id'])

# Transfer the training images
for image in train_list:
    fname = image + '.jpg'
    if fname in folder_1:
        #the source path
        src = os.path.join('HAM10000_images_part_1', fname)
        #the destination path (the sub-folder named after the image's label)
        dst = os.path.join(train_dir+'/'+df['dx'][image], fname)
        print(dst)
        shutil.copyfile(src, dst)
    if fname in folder_2:
        #the source path
        src = os.path.join('HAM10000_images_part_2', fname)
        #the destination path
        dst = os.path.join(train_dir+'/'+df['dx'][image], fname)
        shutil.copyfile(src, dst)

# Transfer the validation images
for image in val_list:
    fname = image + '.jpg'
    if fname in folder_1:
        #the source path
        src = os.path.join('HAM10000_images_part_1', fname)
        #the destination path
        dst = os.path.join(val_dir+'/'+df['dx'][image], fname)
        shutil.copyfile(src, dst)
    if fname in folder_2:
        #the source path
        src = os.path.join('HAM10000_images_part_2', fname)
        #the destination path
        dst = os.path.join(val_dir+'/'+df['dx'][image], fname)
        shutil.copyfile(src, dst)

# Check the label folders created in train_dir and val_dir (7 each)
print(len(os.listdir('base_dir/train_dir')))
print(len(os.listdir('base_dir/val_dir')))
# Check how many images are in each of the two source folders
print(len(os.listdir('HAM10000_images_part_1')))
print(len(os.listdir('HAM10000_images_part_2')))

I used an image generator to feed the images to the model; here it rescales the pixel values, and it can also apply random transformations. Additionally, a neat feature of the image generator is that it automatically resizes the images to the dimensions given in the target_size parameter.

#--------------image generator---------------
from keras.preprocessing.image import ImageDataGenerator
import keras

print(df.head())
image_class = ['nv','mel','bkl','bcc','akiec','vasc','df']
train_path = 'base_dir/train_dir/'
valid_path = 'base_dir/val_dir/'
print(os.listdir('base_dir/train_dir'))
print(len(os.listdir('base_dir/val_dir')))

image_shape = 224
# Rescale pixel values from 0-255 to 0-1
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
#declare data generators for the train and val batches
train_batches = train_datagen.flow_from_directory(train_path,
                                                  target_size=(image_shape, image_shape),
                                                  classes=image_class,
                                                  batch_size=64)
valid_batches = val_datagen.flow_from_directory(valid_path,
                                                target_size=(image_shape, image_shape),
                                                classes=image_class,
                                                batch_size=64)

We all assume the hard part is coding the model, but it’s actually everything above (aka preprocessing the data).

Instead of building a convolutional neural network from scratch, I leveraged an architecture called MobileNet. It's a pre-trained model that was trained on the ImageNet dataset, which has over 14 million images. For the purpose of detecting skin cancer, I constructed several layers on top of MobileNet and then trained it on the MNIST: HAM 10000 dataset.

The main reason I used MobileNet instead of a regular convolutional neural network is the minimal computational power needed: it reduces the number of learnable parameters and is designed to be "mobile" friendly.

#-------------------------------model------------------------------
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import Conv2D, MaxPool2D, Dropout, Flatten
from keras.callbacks import ReduceLROnPlateau
from keras.models import Model

# Load the pre-trained MobileNet and take the output of its 6th-last layer
mobile = keras.applications.mobilenet.MobileNet()
x = mobile.layers[-6].output
# Add a dropout and dense layer for predictions
x = Dropout(0.25)(x)
predictions = Dense(7, activation='softmax')(x)
print(mobile.input)
net = Model(inputs=mobile.input, outputs=predictions)
mobile.summary()

# Freeze all layers except the last 23
for layer in net.layers[:-23]:
    layer.trainable = False

net.compile(optimizer='adam',
            loss='categorical_crossentropy',
            metrics=['accuracy'])

# Reduce the learning rate when the validation accuracy plateaus
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', patience=3, verbose=1, factor=0.5, min_lr=0.00001)
history = net.fit_generator(train_batches, epochs=10,
                            validation_data=valid_batches,
                            callbacks=[learning_rate_reduction])
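To check the accuracy after training, the trained model can be evaluated on the held-out batches. Here is a minimal sketch, assuming the net and valid_batches objects defined above (this evaluation call is not part of the original code).

# Evaluate the trained model on the validation batches
val_loss, val_acc = net.evaluate_generator(valid_batches, steps=len(valid_batches))
print('validation loss:', val_loss)
print('validation accuracy:', val_acc)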

After training the model, roughly 70% accuracy was achieved on the validation batches. With a larger dataset, the accuracy could easily be improved. Artificial intelligence has so much potential to disrupt the healthcare industry. Imagine having artificial intelligence diagnose practically any disease better and faster than human beings; that's insane!

This isn’t science fiction, the possibilities of AI are endless! AI is already revolutionizing healthcare in China. A hospital in China introduced a program called AI-Force, which leverages AI-enabled machines that are able to detect 30 chronic diseases with 97% accuracy!

Key Takeaways

  • Deep learning is inspired by the neural connectivity of the brain in that every node in each layer is connected to the next layer
  • Convolutional neural networks have 3 main types of hidden layers: convolutional layer, pooling layer, and fully connected layer
  • A filter (for CNNs) is used to extract features from the data

Don’t forget to:

  • Clap for this article if you enjoyed it
  • Connect with me on LinkedIn
