Retinal OCT Images (optical coherence tomography)

Saurav Kumar
6 min read · May 12, 2019


In this article, I will try to solve a problem. Please go to the link for a better understanding of the problem statement.

The dataset is organized into 3 folders (train, test, val) and contains subfolders for each image category (NORMAL, CNV, DME, DRUSEN). There are 84,495 OCT images (JPEG) across the 4 categories.

Images are labeled as (disease)-(randomized patient ID)-(image number by this patient) and split into 4 directories: CNV, DME, DRUSEN, and NORMAL.

Our fundamental goal is to classify each image into one of the four categories, so it can be treated as a classification problem. It’s a simple and straightforward classification task, but the real questions are: how do we achieve good accuracy? Which deep learning model should we use? And how do we implement it when the dataset is so large that it’s impossible to fit everything into memory in one go?

What I Tried?

  1. First I tried VGG16 with the last two layers removed (transfer learning), but accuracy got stuck at around 80%.
  2. Then I trained the whole VGG16, but training was too slow because of the large number of parameters, and accuracy only reached about 40% (maybe due to too little data).
  3. Finally I tried ResNet101 with a transfer learning approach using ImageNet weights and got around 70–72% accuracy. Then I trained the whole network and, bingo, I hit the lottery: validation accuracy reached 96%.
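The third approach can be sketched roughly as follows, assuming tf.keras with 224×224 RGB inputs (a minimal sketch of the idea, not the exact code I used):

```python
import tensorflow as tf

NUM_CLASSES = 4  # NORMAL, CNV, DME, DRUSEN

def build_model(weights="imagenet", trainable_base=False):
    """ResNet101 backbone with a fresh classification head for the 4 OCT classes."""
    base = tf.keras.applications.ResNet101(
        weights=weights, include_top=False, input_shape=(224, 224, 3))
    # trainable_base=False -> transfer learning (frozen backbone);
    # trainable_base=True  -> fine-tune the whole network.
    base.trainable = trainable_base
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training first with `trainable_base=False` and then rebuilding with `trainable_base=True` (ideally at a lower learning rate) mirrors the two-stage process in step 3.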

Deep learning and machine learning can only be understood through lots of experiments and by reading papers. Unless you try things yourself, you will not learn; experience teaches us a lot.

Now I will give a little info about ResNet: what is ResNet, and why does it work in most cases? Then I will discuss the implementation of ResNet101 in Keras as well.

ResNet

ResNet is short for Residual Network. It means the ResNet model tries to learn a residual function.

Why ResNet? What problem does it solve?

As we go deeper, training a neural network becomes difficult: accuracy saturates and sometimes even degrades. Residual learning tries to solve both of these problems.

How is a deep learning model designed?

  1. A single block is repeated a number of times to build a deeper network.

You can see that a single layered block is repeated over and over to build a deep network.

What happens in experiments?

The training accuracy of a single block is better than the training accuracy of m blocks stacked on top of one another, but ideally training accuracy should stay the same or increase. This degradation is an optimization problem in very deep plain networks, related to (though not fully explained by) the vanishing/exploding gradient issue. See the paper for a better understanding: https://arxiv.org/pdf/1512.03385.pdf

Here you can see clearly that the training error is higher with 56 layers than with 20 layers, and the same holds for the test error. But ideally the error should decrease as we add more layers, since a deeper network can represent more.

How does ResNet overcome this problem?

ResNet short-circuits the network by making a connection from input to output that skips some layers, as shown below.

ResNet building block.

Instead of learning a direct mapping, ResNet tries to learn the function

H(x) = F(x) + x, where + is element-wise addition.

If the identity mapping is optimal, i.e., the best output is just x, we can easily push the residual to zero (F(x) = 0); it is very easy for the network to learn the solution F(x) = 0. Here F(x) is called the residual function. “The identity mapping is optimal” means that a specific block should be skipped, because that block may be decreasing accuracy or hurting the model. If the block needs to be skipped, backpropagation pushes F(x) towards 0, so that only x, the input, is passed to the next building block and that block is automatically bypassed.
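In Keras, an identity-shortcut block implementing H(x) = F(x) + x might look like this (an illustrative sketch, not the exact ResNet101 code):

```python
import tensorflow as tf
from tensorflow.keras import layers

def identity_block(x, filters):
    """Residual block: output H(x) = F(x) + x, so dimensions must match."""
    # F(x): two stacked 3x3 convolutions
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # Element-wise addition with the identity shortcut x
    y = layers.Add()([y, x])
    return layers.Activation("relu")(y)
```

If training pushes both convolutions towards zero output (F(x) = 0), the block reduces to the identity and is effectively skipped.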

This is a very confusing part; people usually don’t understand why the skipping is done. Please go back and re-read it carefully to understand the intuition behind the skip connection.

There are two kinds of residual connections:

  1. The identity shortcut (x) can be used directly when the input and output are of the same dimensions.
Residual block function when the input and output dimensions are the same.

2. When the dimensions are different:

i) We can zero-pad the input that is smaller in dimension; the shortcut still performs an identity mapping.

ii) We can use a projection (a 1×1 convolution) to match the dimensions, using the formula below.

Residual block function when the input and output dimensions are not the same.
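Option (ii) can be sketched with a 1×1 convolution (the projection W_s) on the shortcut path, so H(x) = F(x) + W_s·x (again an illustrative sketch, assuming tf.keras):

```python
import tensorflow as tf
from tensorflow.keras import layers

def projection_block(x, filters, stride=2):
    """Residual block where F(x) changes the spatial size / channel count."""
    # F(x): the first convolution is strided, shrinking the spatial dimensions
    y = layers.Conv2D(filters, 3, strides=stride, padding="same",
                      activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # 1x1 projection (W_s) matches the shortcut to the new dimensions
    shortcut = layers.Conv2D(filters, 1, strides=stride)(x)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)
```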

A ResNet block is either 2 layers deep (used in smaller networks like ResNet-18 and ResNet-34) or 3 layers deep (ResNet-50, 101, 152).

ResNet 2-layer and 3-layer deep blocks.

Let’s solve the above problem, i.e., Retinal OCT (optical coherence tomography) image classification, using ResNet101.

First, download the dataset from https://www.kaggle.com/paultimothymooney/kermany2018/home

After downloading, let’s do some EDA on each class.

Bar Graph for each Class Label.

Since we have imbalanced data, we can use upsampling or downsampling to balance the dataset. Upsampling can be done with data augmentation, i.e., generating new images by rotating, scaling, etc. Instead of resampling, I handled the imbalance by passing balanced class weights via the class_weight parameter in the Keras fit call (this re-weights the loss rather than physically downsampling the data).
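Keras accepts a class_weight dict rather than the string "balanced" (that string is scikit-learn's convention), but the equivalent inverse-frequency weights are easy to compute by hand. The counts below are hypothetical, just to show the shape of the computation:

```python
import numpy as np

def balanced_class_weights(counts):
    """scikit-learn-style 'balanced' weights: w_c = n_samples / (n_classes * n_c)."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

# Hypothetical per-class training counts for NORMAL, CNV, DME, DRUSEN
counts = [26000, 37000, 11000, 8600]
class_weight = dict(enumerate(balanced_class_weights(counts)))
# Pass class_weight to the Keras fit call so rare classes get larger loss weights.
```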

How do we load this much data into main memory?

Since we have lots of images to train on, we cannot load them all into main memory at once; instead, we can load the images in chunks. We can do this using Keras’s fit_generator function, which takes a data_generator: a function that generates data in the desired format and feeds it to the Keras model. I have written my own data_generator function, as shown below.
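A data_generator along these lines could look like the sketch below (the load_image callback and integer label encoding are assumptions; a real version would read the JPEGs from disk):

```python
import numpy as np

def data_generator(paths, labels, batch_size, load_image, num_classes=4):
    """Yield (X, y) batches forever, keeping only batch_size images in memory."""
    n = len(paths)
    while True:  # Keras generators are expected to loop indefinitely
        order = np.random.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            X = np.stack([load_image(paths[i]) for i in batch])
            y = np.eye(num_classes)[[labels[i] for i in batch]]  # one-hot labels
            yield X, y
```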

After that, import ResNet101 from Keras, and please don’t forget to save the model from time to time. Since there is a huge amount of data, the model will take a long time to train, and saving the model weights periodically is a good habit.

Loading the .hdf5 file which contains the saved model weights.
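Periodic saving can be done with a ModelCheckpoint callback, and the saved .hdf5 weights restored with load_weights (the file name here is a hypothetical, not the one from my notebook):

```python
import tensorflow as tf

# Save the best weights to an .hdf5 file whenever validation accuracy improves
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "resnet101_oct.hdf5",        # hypothetical file name
    monitor="val_accuracy",
    save_best_only=True,
    save_weights_only=True)

# After a restart, rebuild the same architecture and restore the weights:
# model.load_weights("resnet101_oct.hdf5")
```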

Now, use the fit_generator function and pass it the data_generator instance as shown in the image. Don’t forget to set the epochs and batch_size parameters according to the hardware capacity of your PC.

Calling fit_generator function to fit the data.
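The call pattern looks like the toy end-to-end example below (tiny random data and a tiny model so it runs anywhere; note that in current tf.keras, model.fit accepts generators directly and fit_generator is deprecated):

```python
import numpy as np
import tensorflow as tf

def toy_gen(batch_size=4):
    """Stand-in generator: random 8x8 'images' with one-hot labels."""
    while True:
        X = np.random.rand(batch_size, 8, 8, 3).astype("float32")
        y = np.eye(4)[np.random.randint(0, 4, batch_size)]
        yield X, y

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# steps_per_epoch tells Keras how many batches make one epoch,
# since the generator itself never terminates.
history = model.fit(toy_gen(), steps_per_epoch=2, epochs=1, verbose=0)
```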

After running the block, training will start. Always monitor the training and validation accuracy, as they indicate whether the model is overfitting or underfitting.

Progress of training the ResNet model.

Confusion Matrix.

The images below show the confusion matrices for the training data and the test data. As you can see, all the diagonal values are high.

Confusion matrix on training data.

Confusion matrix on test data.
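A confusion matrix like the ones above can be computed directly from the true and predicted labels (a NumPy sketch; sklearn.metrics.confusion_matrix gives the same result):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=4):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical labels: high diagonal values mean mostly correct predictions
cm = confusion_matrix([0, 1, 2, 3, 0], [0, 1, 2, 3, 1])
accuracy = np.trace(cm) / cm.sum()  # 0.8 here
```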

You can see I got:

Training accuracy = 0.9760
Test accuracy = 0.99

I have attached all the screenshots from my Jupyter notebook. I encourage you to solve this case study on your own, as practice is the fundamental key to success.

I hope you now understand what ResNet is and its basic mathematics. If you liked the article, please share it with friends and don’t forget to hit the clap button.
