Diabetic retinopathy detection using CNN architectures and augmented data sets

Vishal Garg
May 15, 2023


Retinal fundus photographs are used to detect and document abnormalities caused by disease processes affecting the eye, and to follow the progress of conditions such as diabetes, age-related macular degeneration (AMD), glaucoma and multiple sclerosis. This article presents a machine learning (ML) based methodology to detect eye conditions from fundus camera images. The ML techniques are applied not just to individual data sets but also to combined data sets, in order to examine the effect of augmenting with images from different data sources. The images used in this article come from three different sources, and a number of experiment runs were carried out and tracked so the results could be evaluated afterwards. First, image pre-processing was applied to the three image sets (crop/zoom, horizontal flips, contrast enhancement, etc.). Then several CNN variants, namely a simple CNN architecture, ResNet-18 and Xception, were trained on these image sets.

This work aims to use fundus image classification to automatically diagnose DR, classifying fundus images into ‘normal’ and ‘DR’ categories. It also aims to place diseases other than DR, such as glaucoma and AMD, into an ‘other’ category. In addition, an approach has been developed to train the models not only on the individual image sets but also on a combination of all three data sets, mainly to explore image augmentation in an organic way.

Refer to the above infographic for a high-level overview of the data sets used in this work.

Image pre-processing and augmentation:

Given that all three data sets come from primary research and were collected directly from patients, image pre-processing is an important step.

The pre-processing steps included resizing the images and then cropping them. To augment the images, random horizontal flips, random contrast enhancement and random zoom were mainly used. Beyond these, many other pre-processing steps were tried across the various experiment runs.

The following figure presents the different image pre-processing and augmentation methods used. For all these tasks, the tf.keras.layers API has been used.

For example, refer to the following code snippet used for center crop.

# Create an image pre-processing layer with center crop
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers

pre_processing = keras.Sequential(
    [
        layers.CenterCrop(180, 180),
    ]
)

# train_ds is the tf.data.Dataset of (images, labels) batches loaded earlier
plt.figure(figsize=(20, 20))
for images, _ in train_ds.take(1):
    preprocessed_images = pre_processing(images)
    plt.subplot(2, 2, 1)
    plt.title("Original")
    plt.imshow(images[0].numpy().astype("uint8"))
    plt.axis("off")
    plt.subplot(2, 2, 2)
    plt.title("Cropped")
    plt.imshow(preprocessed_images[0].numpy().astype("uint8"))
    plt.axis("off")
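
The random augmentations mentioned above (horizontal flip, contrast enhancement and zoom) can be stacked into a similar pipeline. The following is a minimal sketch using the keras and layers imports from the snippet above; the 0.2 factor values are illustrative defaults, not necessarily the exact ones used in the experiments.

# Minimal augmentation pipeline sketch; the 0.2 factors are illustrative defaults
data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),  # random horizontal flip
        layers.RandomContrast(0.2),       # random contrast enhancement
        layers.RandomZoom(0.2),           # random zoom
    ]
)

# Apply augmentation only to the training pipeline
augmented_train_ds = train_ds.map(
    lambda images, labels: (data_augmentation(images, training=True), labels)
)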

Models:

In this study, Convolutional Neural Networks (CNN) are used to classify the fundus camera images. A CNN is a type of deep learning neural network that is widely used for analysing and processing image and video data. CNNs are especially effective at recognizing patterns and features in visual data, such as edges, textures, and shapes.

The key characteristic of CNNs is that they use convolutional layers to scan an input image or video frame and extract features that are relevant for a given task, such as object recognition or image classification. These convolutional layers typically use small, learnable filters or kernels that slide over the input data and perform element-wise multiplication and addition operations to compute a set of output values.

In addition to convolutional layers, CNNs may also include pooling layers, which downsample the output of the convolutional layers by taking the maximum or average value of small regions of the input. This helps to reduce the spatial size of the feature maps and make the network more computationally efficient.

CNNs may also include fully connected layers that take the output of the convolutional and pooling layers and map it to a set of output classes or predictions. The parameters of the network, including the filters in the convolutional layers and the weights in the fully connected layers, are learned through a process of backpropagation and gradient descent, in which the network is trained on a set of labelled training data. In this work, the following three CNN architectures are used to classify the images into three categories (DR, normal and other).
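
To make this concrete, the following is a minimal sketch of a simple CNN classifier in Keras for the three classes (DR, normal, other), reusing the keras and layers imports from the pre-processing snippet. The layer sizes here are illustrative and do not reflect the exact architectures used in the experiments.

# Minimal sketch of a simple CNN for 3-class classification; layer sizes are illustrative
num_classes = 3

simple_cnn = keras.Sequential(
    [
        keras.Input(shape=(180, 180, 3)),         # cropped RGB fundus image
        layers.Rescaling(1.0 / 255),              # scale pixel values to [0, 1]
        layers.Conv2D(32, 3, activation="relu"),  # convolution: extract local features
        layers.MaxPooling2D(),                    # pooling: downsample feature maps
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),          # collapse spatial dimensions
        layers.Dropout(0.2),
        layers.Dense(num_classes, activation="softmax"),  # fully connected output layer
    ]
)

simple_cnn.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)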

Experimentation

  • For each of the three data sets, multiple experiment runs were carried out to examine the effect of augmentation, hyperparameter optimization and other settings.
  • Eventually, experiments were also carried out on a combined data set.
  • All the experiments were tracked using MLflow Tracking; a minimal logging sketch is shown after this list.
  • Refer to the flow diagram for a simple illustration of the overall training process.
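
As a rough sketch, a run can be logged as follows. The parameter names and values are illustrative, and model, train_ds and val_ds are assumed to be defined as in the earlier snippets.

# Minimal MLflow Tracking sketch; parameter names and values are illustrative
import mlflow

with mlflow.start_run(run_name="xception_combined_ds"):
    # log the configurable attributes of this experiment run
    mlflow.log_param("architecture", "Xception")
    mlflow.log_param("dropout", 0.45)
    mlflow.log_param("epochs", 20)

    history = model.fit(train_ds, validation_data=val_ds, epochs=20)

    # log final metrics for post-run evaluation
    mlflow.log_metric("train_accuracy", history.history["accuracy"][-1])
    mlflow.log_metric("val_accuracy", history.history["val_accuracy"][-1])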

For each set of experiments, the following were the key observations:

Experiments on combined data set

Hypothesis:

“A combination of retinal fundus camera images from different sources can be used as an optimal way to get an augmented data set which can be used by deep learning-based neural networks and yet get satisfactory results.”

Methodology:

  • A subset of each of the three data sets mentioned in section [4] is taken, and the subsets are combined to form a data set with images from all three sources (see the sketch after this list).
  • The distribution of samples across the three classes is maintained while combining the data sets.
  • Subsets were used to reduce the overall number of images, for lower compute cost and faster experimentation.
  • More than 15 experiment runs were then carried out using permutations and combinations of different settings.
  • Each experiment run is logged and tracked using MLflow Tracking. Post-hoc analysis was carried out to further evaluate the results.
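
A minimal sketch of how the subsets can be combined in tf.data is shown below. The directory names, image size and subset sizes are illustrative placeholders, and it is assumed that each source directory uses the same class sub-folders (DR / normal / other).

# Illustrative sketch: combine subsets of the three sources into one data set
import tensorflow as tf

def load_subset(directory, num_batches):
    # assumes each directory contains the same class sub-folders (DR / normal / other)
    ds = tf.keras.utils.image_dataset_from_directory(
        directory, image_size=(256, 256), batch_size=32
    )
    return ds.take(num_batches)

sustech_ds = load_subset("data/sustech", 50)
eyepacs_ds = load_subset("data/eyepacs", 50)
odir_ds = load_subset("data/odir", 50)

# Concatenate the subsets and reshuffle so that batches mix all three sources
combined_ds = sustech_ds.concatenate(eyepacs_ds).concatenate(odir_ds)
combined_ds = combined_ds.shuffle(buffer_size=150)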

Refer to the following table for the range of values used for the configurable attributes in the experiments on the combined data set.

Refer to the following infographics for some of the key findings from the experiments on the combined data set:

Experiments on individual data sets:

Hypothesis:

“Diabetic retinopathy can be detected and classified from retinal fundus camera images using deep learning-based neural networks.”

The following tables present some of the experiment runs and results for the different data sets:

Observations & Summary

From the experiments carried out, the following are the key observations:

  • Deep learning-based CNN architectures produced satisfactory results when classifying the retinal fundus camera images into DR and normal categories. This is evident across all the experiments.
  • Of the three data sets used in this study, the CNN models performed worst on the ODIR data set and were equally good on SuSTech and EyePACS, with accuracy above ~95% in each case.
  • In the combined-data-set experiments too, a combination of SuSTech and EyePACS produced much better results (train accuracy: 97%, validation accuracy: 95%, test accuracy: 92%) than a combination of all three data sets (excluding the ‘Other’ class), which gave train accuracy: 83%, validation accuracy: 81% and test accuracy: 80.4%.
  • Of the three CNN architectures examined in this report, Xception came out as the best performer, with a train accuracy of ~83% and a test accuracy of ~80% across all three classes on the combined data set.
  • With Xception, the model initially overfitted in most cases; however, increasing dropout from 0.2 to 0.45 resolved the overfitting, as is evident in the experiments (see the sketch after this list).
  • The experiments on the combined data sets show that images from different data sources can act as an organic way to augment the images while still achieving satisfactory results.
  • Interestingly, even though the best model, Xception, did not perform well on the ODIR data set alone (train accuracy ~72%, validation accuracy 70%), when ODIR is augmented with the other two data sets the same model’s performance improves to a train accuracy of 83.6%, validation accuracy of 81% and test accuracy of ~80%.
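
For illustration, an Xception-based classifier with the larger dropout rate might look like the following minimal sketch (keras and layers as imported earlier). The input size, the use of ImageNet weights and the head configuration are assumptions for illustration, not the exact training setup.

# Illustrative sketch: Xception backbone with a dropout-regularized head.
# weights="imagenet" is an assumption; the models may have been trained differently.
base = keras.applications.Xception(
    include_top=False, weights="imagenet", input_shape=(180, 180, 3), pooling="avg"
)

xception_model = keras.Sequential(
    [
        base,
        layers.Dropout(0.45),  # raising dropout from 0.2 to 0.45 reduced overfitting
        layers.Dense(3, activation="softmax"),  # DR / normal / other
    ]
)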

From this study, we can conclude that deep learning-based CNN architectures can detect eye impairments from retinal fundus camera images with satisfactory results. This can substantially aid doctors and ophthalmologists in the early detection of diseases such as diabetic retinopathy. We also observed the effect of combining data sets from three different sources: although all three provided retinal fundus camera images, they differed in quality, camera type, capture environment and originating institution. A combination of these data sets nevertheless proved to be an effective way to build a model that is even more effective at identifying DR and other diseases using CNN capabilities.

References

  1. https://www.who.int/news-room/fact-sheets/detail/blindness-and-visual-impairment
  2. Saad M Khan, Xiaoxuan Liu, Siddharth Nath et al. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. Lancet Digit Health 2021;3:e51–66.
  3. Li Lin, Meng Li, Yijin Huang et al. The SUSTech-SYSU dataset for automated exudate detection and diabetic retinopathy grading. Scientific Data (2020) 7:409.
  4. Peking University International Competition on Ocular Disease Intelligent Recognition (ODIR-2019) (https://odir2019.grand-challenge.org/dataset/)
  5. Asia, A.-O.; Zhu, C.-Z.; Althubiti, S.A.; Al-Alimi, D.; Xiao, Y.-L.; Ouyang, P.-B.; Al-Qaness, M.A.A. Detection of Diabetic Retinopathy in Retinal Fundus Images Using CNN Classification Models. Electronics 2022, 11, 2740.
  6. Grzegorz Meller. Ocular Disease Recognition Using Convolutional Neural Networks. Towards Data Science.
  7. Roc Reguant, Søren Brunak & Sajib Saha. Understanding inherent image features in CNN-based assessment of diabetic retinopathy. Scientific Reports.
  8. https://www.kaggle.com/c/diabetic-retinopathy-detection/data
  9. https://mlflow.org/docs/latest/tracking.html
  10. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. arXiv.
  11. François Chollet (Google Inc.). Xception: Deep Learning with Depthwise Separable Convolutions. arXiv.
