Can we detect cancerous skin moles based on a picture only?

Teresia Savera · Published in Analytics Vidhya · Sep 8, 2020 · 7 min read

I was a final-year electrical engineering student when I got a final project assignment from my biomedical class. We were free to choose a project topic as long as it had the potential to bring benefits to the medical world.

I had gotten a little exposure to image processing from an image processing class, so I thought, "Why don't I make a system/program that can decide something based on an image only?" Once the algorithm was done, there would be the possibility of developing the project further into a mobile/website application that nonexperts could use without any special techniques or tools.

So, I chose Early Skin Cancer Detection as my project. The goal of this project is to detect which moles are cancerous and which are non-cancerous.

Why does it have to be skin cancer? Why not brain cancer, or any other cancer? Because skin is something we can easily look at without any special tools like MRI, X-rays, or other biomedical equipment. We can just take a picture of the skin, put the image into our program, and voila! We get the result!

But how did I start?

First, I needed to know the characteristics of cancerous and non-cancerous skin moles. I had no trouble retrieving a skin cancer dataset because the data are all over the internet. I chose a dataset from ISIC, and I downloaded it here.

Now, let’s take a look at the images!

Benign (non-cancerous)

Malignant (cancerous)

When I first saw these images, I got pretty confused because there is no major difference between cancerous and non-cancerous moles. So, I began to do some literature research and found that there were several past studies on detecting cancerous skin moles.

From past research, there are two main methods to detect cancerous skin: first, feature extraction, and second, convolutional neural networks. Convolutional neural networks are quite popular for classifying images because of their ability to learn features from input data using 2D convolutional layers. A convolutional neural network is like any other machine learning method: we can feed it a dataset and let the system learn by itself by updating the weights of its neurons (I will not explain the details of how machine learning works here).

Convolutional neural networks are actually cool, and I predicted this method would perform well given a big dataset. But I was more interested in analyzing the images in a more 'human' way; moreover, I wanted to learn more deeply how to extract important information from a bunch of data. Therefore, I chose the feature extraction method.

Feature Extraction

In every data analysis process, the first question that should come to mind is

“What kind of information do we need to extract from this data?”

The same goes for image analysis, because an image is basically data in two-dimensional form. So the first question that came into my head was

“What kind of feature do I need to extract?”

From past research, Asymmetry, Border Irregularity, Color, and Diameter are the parameters widely used by doctors to analyze cancerous skin moles (Jain & Pise, 2015). And just for the sake of simplicity (yeah, I had a tight deadline back then), I chose to analyze cancerous skin images by extracting only the asymmetry and border irregularity features.

And here comes the challenging part: translating those features from an image into a machine-readable format.

Extracting Asymmetry Feature

How do we know if an object is symmetrical? We can fold it in one direction and see if one side overlaps 100% with the other side.

Therefore, to obtain the symmetry value of the object, we can divide the mole picture into two halves, overlap the two halves, and see how completely they overlap.

But first, we need to separate the foreground (the mole) from the background. Here is what I did to separate the foreground, as sketched in the code after this list:

  1. Convert the image into grayscale mode
  2. Apply a median filter to remove noise coming from the background
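
A minimal MATLAB sketch of these two steps (the file name and the 5×5 filter window are my assumptions, not the project's exact parameters):

% Read the mole image and convert it to grayscale
img = imread('mole.jpg'); % hypothetical file name
grayImage = rgb2gray(img);
% Apply a median filter to suppress noise from the background
% (the 5x5 window size is an assumption)
filteredImage = medfilt2(grayImage, [5 5]);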

And here is the result:

Before (left) and after (right)

Now we can see a huge difference between the foreground and the background. If we look at the histogram of the image, each pixel has a value between 0 and 255 (0 is black, 255 is white, and anything in between is gray of varying intensity). This spread of values makes it harder to completely separate the foreground from the background, so I used Otsu's thresholding method to turn the value of each pixel into 0 or 1 only (this is also widely known as binarization). Pixels with a value higher than the threshold (140 in this case) are turned into 1; otherwise they are turned into 0.
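
In MATLAB, Otsu's method can be applied with graythresh, which computes the threshold automatically. A minimal sketch (the final inversion is my assumption, since the mole is darker than the surrounding skin):

% Otsu's method picks the threshold that best separates the two pixel
% populations; here it corresponded to about 140 on the 0-255 scale
level = graythresh(filteredImage); % threshold normalized to [0, 1]
binaryImage = imbinarize(filteredImage, level);
% The mole is darker than the skin, so invert the mask to make the
% foreground equal to 1 (an assumption about image polarity)
binaryImage = ~binaryImage;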

Now we can separate the foreground completely into one new image!

We can compute the asymmetry of the object by dividing the image (foreground only) into two new images, overlapping them, and calculating the asymmetry percentage using this formula:

Asymmetry (%) = (ΔT / T) × 100, where ΔT is the non-overlapping area and T is the total area of the object.

I used the following MATLAB code to find the centroid of the object, translate it to the center, fold the image, and calculate the degree of asymmetry:

%% Label the binary image and get its dimensions
labeledImage = bwlabel(binaryImage);
[rows, columns] = size(binaryImage);
%% Make the measurements (assuming a single labeled region, the mole)
props = regionprops(labeledImage, 'Centroid', 'Orientation');
xCentroid = props.Centroid(1);
yCentroid = props.Centroid(2);
%% Find the halfway point of the image
middlex = columns/2;
middley = rows/2;
%% Translate the object's centroid to the image center
deltax = middlex - xCentroid;
deltay = middley - yCentroid;
binaryImage = imtranslate(binaryImage, [deltax, deltay]);
%% Rotate the image so its major axis is horizontal
angle = -props.Orientation;
rotatedImage = imrotate(binaryImage, angle, 'crop');
%% Flip (fold) along the vertical axis to check asymmetry
flipped = fliplr(rotatedImage);

%% Measuring asymmetry
% Get the total area T of the object
props2 = regionprops(labeledImage, 'Area');
% Count the non-overlapping area (delta T)
nonOverlapped = xor(flipped, rotatedImage);
% Asymmetry percentage = delta T / T * 100
asymmetry = sum(nonOverlapped(:)) / props2.Area * 100;

Extracting Border Irregularity

After finding the centroid of the object and shifting it to the center of the picture, I used the famous Canny edge detection algorithm to find the edges of the mole. Here is the result:

Edge irregularity analysis is quite complex because of its subjective nature. In this project, I analyzed edge irregularity by calculating the distance from each edge pixel to the center of mass of the mole area and plotting those distances as a one-dimensional graph. If the graph tends to be straight or smoothly curved, without many local maxima or minima, the mole has clear, regular edges. The number of local maxima and minima is determined by taking the derivative of the graph and counting the zero crossings in the derivative: each zero crossing in the derivative corresponds to a local maximum or minimum of the original graph. The more zero crossings in the derivative of the edge-distance graph, the more irregular the edges of the mole.

Here is the code I used to compute the degree of edge irregularity:


%% Fill the object to make its border clearer
filled = regionprops(labeledImage, 'FilledImage');
filledimage = filled.FilledImage;
%% Find the bounding box and centroid of the filled object
edgeprops = regionprops(filledimage, 'BoundingBox', 'Centroid');
%% Edge detection with the Canny algorithm
findedge = edge(filledimage, 'canny');
%% Convert the bounding box and centroid to integers
% BoundingBox is [x, y, width, height]; Centroid is [x, y]
boundingboxint = uint32(edgeprops.BoundingBox);
centroidint = uint32(edgeprops.Centroid);
%% Scan the edge pixels and collect their distances to the centroid in a 1D array
plotdist = [];
for i = boundingboxint(2) : boundingboxint(2) + boundingboxint(4) - 1     % rows (y)
    for j = boundingboxint(1) : boundingboxint(1) + boundingboxint(3) - 1 % columns (x)
        if findedge(i, j) == 1
            % Euclidean distance; row i pairs with the centroid's y (index 2)
            % and column j with its x (index 1); cast before subtracting to
            % avoid unsigned-integer saturation
            dist = sqrt((double(i) - double(centroidint(2)))^2 + ...
                        (double(j) - double(centroidint(1)))^2);
            plotdist = vertcat(plotdist, dist);
        end
    end
end
%% Differentiate the distance array
firstorderdist = diff(plotdist);
%% Count the local maxima and minima of the border by counting zero crossings
r = length(firstorderdist);
count = 0;
for i = 1 : (r-1)
    if (firstorderdist(i) > 0 && firstorderdist(i+1) < 0) || ...
       (firstorderdist(i) < 0 && firstorderdist(i+1) > 0)
        count = count + 1;
    end
end
%% Append this image's irregularity count
% (borderarr is initialized as [] once, before looping over the dataset)
borderarr = vertcat(borderarr, count);

Yep, we are done extracting the features, hooray!

Classification

Before choosing a classification method, I put the asymmetry and irregularity values in a table and plotted the table as a graph so I could identify their distribution. Based on the distribution of the data, the k-Nearest Neighbor (k-NN) algorithm gave the most suitable decision boundary for the classification.

Because my computer performs poorly when processing large amounts of data, I could only process 80 benign training images, 80 malignant training images, 20 benign test images, and 20 malignant test images.
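
A minimal MATLAB sketch of this classification step (the variable names, the use of fitcknn from the Statistics and Machine Learning Toolbox, and k = 5 are my assumptions, not the project's exact setup):

% trainFeatures: 160x2 matrix, one [asymmetry, irregularity] row per image
% trainLabels: 160x1 categorical vector (benign / malignant)
mdl = fitcknn(trainFeatures, trainLabels, 'NumNeighbors', 5); % k = 5 is an assumption
% Classify the 40 test images and measure the accuracy
predictedLabels = predict(mdl, testFeatures);
accuracy = mean(predictedLabels == testLabels);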

Result

Detection of skin cancer by feature extraction and k-NN classification resulted in an accuracy of 75%, with a precision of 85% and a specificity of 72%. The accuracy is affected by the quality of the dataset: the images do not share a standardized brightness level, while the binarization used to separate the foreground from the background uses a fixed threshold. This can degrade the performance of the segmentation and the feature calculations.
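
For reference, these metrics follow the standard definitions: precision = TP / (TP + FP) and specificity = TN / (TN + FP), where malignant is presumably treated as the positive class (my assumption about which class counts as positive).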

Conclusion and Further Work

To conclude, we can't depend on images alone to detect cancerous skin moles, because the accuracy is still low.

However, we can possibly improve the accuracy by:

  1. Adding more features (color and diameter)
  2. Adding more training data
  3. Improving the preprocessing algorithm so that the images can be standardized better

So, that's it. I personally enjoyed this project because it satisfied my curiosity about data and image analysis. I love learning, so critiques are welcome!

You can also reach me through my email teresiasavera@gmail.com to learn more about the code, or just to discuss interesting topics related to image analysis or even machine learning!

Cheers!
