Week 6: Histopathological Cancer Detection

Anıl Aydıngün
Published in bbm406f19 · Jan 15, 2020

Hello everyone, we are Tugay Calyan, Anil Aydingun and Denizcan Bagdatlioglu. In this week’s blog post, we will share information about our model for the solution. To view our blog posts from previous weeks:

Camelyon Dataset

PCam consists of 327,680 color images (96x96 pixels) extracted from histopathological scans of lymph node sections. Each image is annotated with a binary label (1 indicates metastatic cancer tissue, 0 indicates no metastatic cancer tissue). Below are some images from the Camelyon16 dataset taken from histopathological scans; the images show cancer according to their labels.

Some images from the PCam dataset
The label distribution of the Camelyon16 training images is as follows.
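As a quick sanity check, the distribution can be computed with pandas. This is a minimal sketch; the file name train_labels.csv and its "id" and "label" columns are our assumptions based on the Kaggle-style layout of the dataset:

```python
import pandas as pd

# Assumed file/column names: train_labels.csv with "id" and "label" columns
labels = pd.read_csv("train_labels.csv")

# Count images per class (0 = no metastatic tissue, 1 = metastatic tissue)
print(labels["label"].value_counts())
print(labels["label"].value_counts(normalize=True))  # same counts as fractions
```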

We have split our dataset into train, validation, and test sets.

Then, we matched each image to its label according to the image ids in the CSV file, as sketched below.
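Here is a minimal sketch of both steps. The folder layout (a train/ directory with .tif files named by id) and the split ratios are our assumptions, not something fixed by our pipeline:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

labels = pd.read_csv("train_labels.csv")

# Map each image id to its file path (assumed layout: train/<id>.tif)
labels["path"] = "train/" + labels["id"] + ".tif"

# First carve out a test set, then split the rest into train/validation.
# stratify keeps the 0/1 label ratio the same in every split.
train_val, test = train_test_split(
    labels, test_size=0.1, stratify=labels["label"], random_state=42
)
train, val = train_test_split(
    train_val, test_size=0.1, stratify=train_val["label"], random_state=42
)

print(len(train), len(val), len(test))
```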

Model:

As we explained in our 2nd week Medium article, we normalize the data and make it ready for our model. We use VGG16, one of the pretrained models. After freezing its layers, we move the model to the GPU for training if one is available. As the loss function, we use the negative log likelihood loss, which is frequently used for classification problems in neural networks. As the optimizer, we use Adam. Finally, we set the batch size to 128 and the learning rate to 0.00001, and start training.
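A minimal sketch of this setup in PyTorch follows. The replaced classifier head is our assumption; since NLLLoss expects log-probabilities, the new head ends with LogSoftmax:

```python
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load pretrained VGG16 and freeze its convolutional layers
model = models.vgg16(pretrained=True)
for param in model.features.parameters():
    param.requires_grad = False

# Replace the final classifier layer for our 2-class problem.
# NLLLoss expects log-probabilities, so the head ends with LogSoftmax.
model.classifier[6] = nn.Sequential(
    nn.Linear(4096, 2),
    nn.LogSoftmax(dim=1),
)
model = model.to(device)

criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()), lr=0.00001
)
batch_size = 128
```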

As we explained in last week’s blog, training a model on data like this requires serious computing power, and we could not get through the training stage on the full dataset. When we reduced the data, our model instead ran into underfitting because the remaining dataset was too small. So we don’t have good results for now. Let’s see how we can solve the underfitting.

Underfitting:

Underfitting is the opposite situation of overfitting. In this case, the model cannot capture the important features of the dataset, which results in a failure to learn. It is easier to notice than overfitting because it does not produce deceptively good training performance. It can be solved by using more data or a more complex model.

The attempt to strike this balance between underfitting and overfitting is known as the bias-variance dilemma.

High bias corresponds to underfitting. Since our model is not complex enough for the data, its predictions are systematically inaccurate. The model does its best, but it simply does not have enough capacity for the dataset.

Solutions for Underfitting

There are two main remedies for underfitting. We can increase the number of neurons in the layers we use during learning, or add more layers. Another solution is to extend our training process a little longer. A sketch of both is shown below.
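This sketch applies both remedies to the VGG16 model from the earlier code; the layer sizes, dropout rate, and epoch count are illustrative assumptions, not tuned values:

```python
import torch.nn as nn

# Remedy 1: a wider, deeper classifier head (more neurons, more layers)
model.classifier[6] = nn.Sequential(
    nn.Linear(4096, 1024),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(1024, 2),
    nn.LogSoftmax(dim=1),
)
model = model.to(device)

# Remedy 2: simply train longer by raising the epoch count
num_epochs = 30  # instead of stopping after a few epochs
```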

As a result, we anticipate that our model will overcome these problems as soon as possible and run successfully with good accuracy.

Source: http://www.onurgoker.com/derin-ogrenmede-fazla-uyum-ve-yetersiz-uyum/
