Detection Of Melanoma Cancer Using Deep Learning — Part 3

In Part 1 and Part 2, we developed a shallow and a deeper CNN for detecting melanoma from images of skin lesions. We experimented with techniques such as Batch Normalization, Dropout, Local Response Normalization, simple data augmentation and image standardization, and were able to achieve an accuracy of 93.81%. This figure is not outstanding given the imbalance in our data, but it encouraged and motivated us to go a little deeper :)

Transfer Learning

Since our dataset is relatively small (12,000 images), we decided to investigate transfer learning next. Transfer learning is a technique in which we take a model that has already been fully trained on a huge amount of data and retrain or fine-tune it for our use case. There is an amazing TensorFlow tutorial on transfer learning. I would encourage readers to go through it, as it is really detailed and our approach and code are inspired by it. For brevity, we will not go into the details of transfer learning in this post.


In this experiment, we load the pretrained Inception model and use it as a feature extractor. We extract the output of the last convolution layer of the Inception model for all our images and store these outputs offline. They are called “bottlenecks” and are simply feature vectors of the images. Since the Inception model has been trained on a huge amount of data, it is able to extract relevant features such as curves and edges from the images.
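The "extract once, store offline" idea can be sketched as a small disk cache. This is a minimal illustration and not the code from our repository: `get_or_create_bottleneck`, `dummy_extract` and the JSON file layout are hypothetical names and choices made for this example, and `dummy_extract` merely stands in for a real forward pass through the pretrained network.

```python
import json
import os
import tempfile


def get_or_create_bottleneck(image_path, cache_dir, extract_fn):
    """Return the cached feature vector for image_path, computing it on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # One cache file per image, keyed by file name only, so names must be unique.
    cache_file = os.path.join(cache_dir, os.path.basename(image_path) + ".json")
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            return json.load(f)
    features = extract_fn(image_path)  # the expensive step, done once per image
    with open(cache_file, "w") as f:
        json.dump(features, f)
    return features


def dummy_extract(image_path):
    """Hypothetical stand-in for a forward pass through the pretrained network."""
    seed = sum(ord(c) for c in image_path)
    return [((seed + i) % 97) / 97.0 for i in range(2048)]


with tempfile.TemporaryDirectory() as cache_dir:
    first = get_or_create_bottleneck("lesion_0001.jpg", cache_dir, dummy_extract)   # computed
    second = get_or_create_bottleneck("lesion_0001.jpg", cache_dir, dummy_extract)  # read from disk
    print(len(first), first == second)
```

Because the pretrained layers never change, every later training epoch can read these vectors from disk instead of pushing the images through the large network again.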

Once we have these features, we create a small network of 3 fully connected layers with weight shapes 2048×512, 512×512 and 512×2. This small network is then trained to classify the images into our two categories.
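To make the shape of this small network concrete, here is a rough pure-Python sketch of its forward pass. Only the layer sizes come from the setup described above. The ReLU activations, the bias terms, the softmax output and the random initialisation are assumptions made for illustration, and all function names are hypothetical.

```python
import math
import random

LAYER_SIZES = [2048, 512, 512, 2]  # weight shapes 2048x512, 512x512, 512x2


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (the initialisation scheme is an assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Compute x @ weights + biases for a single input vector."""
    out = list(biases)
    for xi, row in zip(x, weights):
        if xi == 0.0:
            continue  # zero inputs contribute nothing
        for j, w in enumerate(row):
            out[j] += xi * w
    return out


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(bottleneck, layers):
    """Two hidden layers with ReLU (assumed), then a softmax over the two classes."""
    h = bottleneck
    for weights, biases in layers[:-1]:
        h = relu(dense(h, weights, biases))
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))


def count_parameters(sizes):
    """Weights plus biases for each fully connected layer."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


rng = random.Random(0)
layers = [init_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
probs = forward([rng.random() for _ in range(2048)], layers)
print(probs, count_parameters(LAYER_SIZES))
```

With biases included, this head has 1,312,770 trainable parameters, which is tiny compared with the pretrained network it sits on.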

Optimizer and Learning Rate

Following the tutorial, we use an exponentially decaying learning rate with RMSProp as the optimizer. We tried Adam and SGD too, but both performed worse than RMSProp.

The failure of Adam was a little surprising to us.
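For readers unfamiliar with exponentially decaying learning rates, the schedule can be sketched in a few lines. The formula has the same general form as TensorFlow's exponential decay schedule. The numbers in the usage lines (initial rate 0.1, decay rate 0.5, 1000 decay steps) are made-up illustration values and not the hyperparameters we trained with.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """Return initial_lr * decay_rate ** (step / decay_steps).

    With staircase=True the exponent is truncated to an integer, so the
    rate drops in discrete jumps every decay_steps steps.
    """
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


# Illustrative values only.
for step in (0, 500, 1000, 2000):
    print(step, exponential_decay(0.1, step, decay_steps=1000, decay_rate=0.5))
```

The idea is to take large steps early in training and progressively smaller ones as the classifier settles.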

Batch Size

Since the Inception model was used only as a feature extractor and not fine-tuned, the part of the model we actually train became very simple. Thus, we were able to use a batch size of 128, which sped up our training.
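Training on cached features in batches of 128 could look roughly like the sketch below. The function name `iterate_minibatches` and the toy data are hypothetical. Our actual input pipeline may differ.

```python
import random


def iterate_minibatches(features, labels, batch_size, rng):
    """Yield shuffled (features, labels) batches; the last batch may be smaller."""
    indices = list(range(len(features)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [features[i] for i in batch], [labels[i] for i in batch]


# Toy data: 300 fake feature vectors with alternating binary labels.
toy_features = [[float(i)] for i in range(300)]
toy_labels = [i % 2 for i in range(300)]
batch_sizes = [len(x) for x, _ in
               iterate_minibatches(toy_features, toy_labels, 128, random.Random(0))]
print(batch_sizes)
```

Each batch here holds only short feature vectors rather than full images, which is why a large batch size is cheap in this setup.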

Batch Normalization

Batch normalization still did not improve our results, so we turned it off. We are curious about why, though, and will investigate this further.


Dropout

Just like before, we used an aggressive dropout rate of 0.5.
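As a reminder of what dropout does during training, here is a minimal sketch of the common "inverted dropout" formulation. It is an illustration and not our training code. Note that at 0.5 the drop probability and the keep probability coincide, so the value reads the same under either convention.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and scale the
    survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 10
print(dropout(hidden, 0.5, rng))                  # a mix of 0.0 and 2.0
print(dropout(hidden, 0.5, rng, training=False))  # unchanged
```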

Local Response Normalization

We did not use local response normalization in this architecture because the convolution layers were not retrained.


While the accuracy gains in our previous experiments were modest, this time was a little different. With the above settings, we were able to get an accuracy of 95.2%, an improvement of about 1.4 percentage points over our previous 93.81%. This was awesome :)

Transfer Learning Worked!

We still have a lot of ideas on data augmentation and regularization that we are going to try out in our next post. Until then, happy deep diving!

[Code] I have shared the code here. Please check out the branch “version3” for this setup. Feel free to use and modify it. I would be happy to collaborate with interested folks.