[Paper] VGGNet for COVID-19 Detection (Biomedical Image Classification)

Detecting COVID-19 from Multimodal Imaging Data: Ultrasound, X-Ray & CT Scan

Sik-Ho Tsang
Published in The Startup on Nov 1, 2020


Left: X-Ray, Middle: CT Scan, Right: Ultrasound

In this story, COVID-19 Detection Through Transfer Learning Using Multimodal Imaging Data (VGGNet for COVID-19) is briefly presented. In this paper:

  • Because bronchovascular thickening in the lesion and traction bronchiectasis are visible during the absorption stage, automatic image-based diagnosis is possible.
  • Various classification models are tested, such as VGG16/VGG19, ResNet-50, Inception-v3, Xception, Inception-ResNet-v2, DenseNet, and NASNet-Large, for COVID-19 detection.
  • It is found that VGGNet has the most stable performance across different multimodal datasets including Ultrasound, X-ray and CT scan.
  • The aim is to give over-stressed medical professionals a second pair of eyes through intelligent deep learning image classification models, providing clinicians with an automated “second reading” that assists in diagnosis and criticality assessment.

This is a paper in 2020 IEEE Access, an open-access journal with an impact factor of 3.745. (Sik-Ho Tsang @ Medium)


  1. Data Sourcing, Pre-Processing & Augmentation
  2. Various Classification Models
  3. Experimental Results Using VGGNet

1. Data Sourcing, Pre-Processing & Augmentation

1.1. Datasets

  • Four datasets are used:
  1. COVID-19 chest X-Rays were obtained from the publicly accessible COVID-19 Image Data Collection [16].
  2. National Institutes of Health (NIH) Chest X-Ray [67] dataset for Normal and Pneumonia X-Rays.
  3. CT scans for COVID-19 and non-COVID-19 were obtained from the publicly accessible COVID-CT Dataset [66].
  4. Ultrasound images for COVID-19, Pneumonia and Normal conditions were obtained from the publicly accessible POCOVID-Net data set [9].
  • (The paper is open access, so please feel free to check its references for the details of each dataset.)
Different variations observed in the COVID-19 datasets.
  • All of them suffer from variable image size and quality.
  • Image contrast levels, brightness and subject positioning are all highly variable.

This is because the data were sourced from an unknown number and variety of X-Ray, CT and Ultrasound machines, each with different exposure parameters and operator behaviors.

Data preprocessing is therefore needed before training.

1.2. Data Preprocessing

Results of enhancement preprocessing on original samples for COVID-19, Pneumonia and Normal images.
  • First, the datasets are lightly curated, removing images that were mislabeled projections or were dominated by intrusive medical devices.
  • Histogram equalization is then applied to the images using N-CLAHE.
  • The method both normalizes images and enhances small details, textures and local contrast by first globally normalizing the image histogram followed by application of Contrast Limited Adaptive Histogram Equalization (CLAHE).
  • This was implemented using the OpenCV equalizeHist and createCLAHE functions.

After this enhancement, the authors report that, subjectively, they can no longer easily tell which dataset an image was drawn from.

1.3. Data Augmentation

Sampled dataset for experiments.
  • Images are then resized to the classifier's default input size, for example 224×224 pixels for VGG16/19 and 299×299 pixels for Inception-v3.
  • Data augmentations are applied including horizontal flip, rotation, width shift, and height shift. Vertical flip was not applied since X-Ray images are not vertically symmetrical.

Finally, the number of training images is increased from hundreds to thousands, as shown above.
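The resizing and augmentation steps above can be sketched in plain NumPy. This is a minimal sketch, assuming grayscale 2-D arrays, nearest-neighbour resizing, and zero fill for pixels that move in from outside the frame; the rotation range and shift fraction are assumed values, and the helper names are hypothetical. Only the 224×224 and 299×299 sizes come from the text; real pipelines usually delegate this work to OpenCV or a deep learning framework.

```python
import numpy as np

# Default classifier input sizes quoted in the text, as (height, width).
INPUT_SIZE = {"vgg16": (224, 224), "vgg19": (224, 224), "inception_v3": (299, 299)}


def resize_nearest(img: np.ndarray, size: tuple) -> np.ndarray:
    """Nearest-neighbour resize of a 2-D image to (height, width)."""
    h, w = img.shape
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols[None, :]]


def augment(img: np.ndarray, rng: np.random.Generator,
            max_rot_deg: float = 10.0, max_shift_frac: float = 0.1) -> np.ndarray:
    """Random horizontal flip, small rotation, and width/height shift.

    No vertical flip, since chest images are not top-bottom symmetric.
    max_rot_deg and max_shift_frac are assumed ranges, not the paper's values.
    """
    h, w = img.shape
    out = img[:, ::-1] if rng.random() < 0.5 else img  # horizontal flip

    # Rotation about the image centre (nearest neighbour, zero fill):
    # each output pixel looks up the source pixel it came from.
    angle = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg)) if max_rot_deg > 0 else 0.0
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    src_x = np.rint(cos_a * (xs - cx) + sin_a * (ys - cy) + cx).astype(int)
    src_y = np.rint(-sin_a * (xs - cx) + cos_a * (ys - cy) + cy).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    rotated = np.zeros_like(out)
    rotated[inside] = out[src_y[inside], src_x[inside]]

    # Width and height shift by up to max_shift_frac of each side (zero fill).
    max_dx, max_dy = int(max_shift_frac * w), int(max_shift_frac * h)
    dx = int(rng.integers(-max_dx, max_dx + 1))
    dy = int(rng.integers(-max_dy, max_dy + 1))
    shifted = np.zeros_like(rotated)
    shifted[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        rotated[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return shifted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scan = rng.integers(0, 256, size=(512, 480), dtype=np.uint8)  # stand-in image
    x = resize_nearest(scan, INPUT_SIZE["vgg16"])
    # Several augmented copies per original image enlarge the training set.
    copies = [augment(x, rng) for _ in range(5)]
    print(x.shape, len(copies))
```

Generating several random copies per original image is what turns hundreds of images into thousands; the sizes of the ranges control how far the copies drift from the original.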

2. Various Classification Models

Experiment pipeline
  • After pre-processing, resizing and augmentation, the models are trained as shown above.
  • The primary aim is not to perform an exhaustive performance evaluation of all available models.
  • Rather, the aim is to show the general applicability of popular model families to this challenging, limited and time-critical dataset.
  • Models tested: VGG16/VGG19, ResNet-50, Inception-v3, Xception, Inception-ResNet-v2, DenseNet, and NASNet-Large.
  • An 80:20 train/test split is used.
  • For more details about the model structures, please feel free to read the paper.
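An 80:20 split like the one above can be sketched as follows. This is a minimal sketch, assuming a plain shuffled split with a fixed seed; the text does not say whether the split was stratified by class, and `split_80_20` is a hypothetical helper name.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_80_20(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut at 80% (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = split_80_20([f"img_{i:03d}.png" for i in range(50)], seed=7)
    print(len(train), len(test))
```

Splitting the original images first, and augmenting only the training portion, keeps augmented copies of a test image out of the training set.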

2.1. Head Architecture

Head Architecture
  • The task is not 1000-class image classification but a binary classification problem: detecting from the input image whether the patient has COVID-19.
  • Thus, the head of every model is modified so that it ends with only 2 neurons, which output the probabilities of non-COVID-19 and COVID-19.
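The head replacement can be illustrated with a small NumPy forward pass. This is a minimal sketch, assuming the backbone's final convolutional feature maps are already computed (for VGG16 with a 224×224 input these are 7×7×512) and assuming a global-average-pooling plus single-dense-layer head; the paper's own head layers are shown in its figure and may differ. The class name and weight initialization are hypothetical, and training is omitted.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class BinaryHead:
    """Hypothetical 2-neuron head placed on top of a pretrained backbone."""

    def __init__(self, in_channels: int, rng: np.random.Generator):
        # Replaces the original 1000-way classifier with 2 output neurons.
        self.w = rng.normal(0.0, 0.01, size=(in_channels, 2))
        self.b = np.zeros(2)

    def __call__(self, features: np.ndarray) -> np.ndarray:
        # features: (batch, height, width, channels) backbone feature maps.
        pooled = features.mean(axis=(1, 2))  # global average pooling
        logits = pooled @ self.w + self.b    # 2 neurons
        return softmax(logits)               # [:, 0] non-COVID-19, [:, 1] COVID-19


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    head = BinaryHead(in_channels=512, rng=rng)
    fake_vgg16_features = rng.normal(size=(2, 7, 7, 512))  # stand-in backbone output
    print(head(fake_vgg16_features))
```

In Keras, for example, `VGG16(include_top=False, weights="imagenet")` loads the backbone without its 1000-class classifier, and a small head like this one is attached on top for transfer learning.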

2.2. F1-Measure

Model performance summary.
  • Without sophisticated hyperparameter tuning, the simpler VGGNet classifiers were easier to train on all three image modes and gave more consistent results across them.
  • The more complex models tended either to overfit in early epochs (<10) or to fail to converge at all, and their trainability depended heavily on the initial choice of hyperparameters.
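Since the models are compared by F1-measure, it helps to see how F1 combines the two metrics reported in Section 3: precision (positive predictive value) and recall (sensitivity). This is a minimal sketch computing them from confusion-matrix counts; the counts in the example are illustrative only.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision (PPV), recall (sensitivity) and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative counts, not results from the paper.
    print(precision_recall_f1(tp=9, fp=1, fn=3))
```

Because it is a harmonic mean, F1 stays high only when both precision and recall are high: a classifier that labels almost every scan COVID-19 gets high recall but low precision, and hence a modest F1.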

3. Experimental Results Using VGGNet

Datasets used for experiments.
  • Five sets of experiments are conducted, as shown above.
Experiment results for three image modes.
  • For experiments classifying COVID-19 and Pneumonia vs Normal (1A and 2A), it is found that the Ultrasound mode provided the best results with a sensitivity of 97% and positive predictive value of 99% compared to X-Ray with 83% and 85% respectively.
  • For experiments classifying COVID-19 vs Pneumonia (1B and 2B), it is again found that the Ultrasound mode provided the best results with a sensitivity of 100% and a positive predictive value of 100% compared to X-Ray with sensitivity of 86% and positive predictive value of 86%.
  • The CT imaging mode was found to have a sensitivity of 83% and a positive predictive value of 79% in classifying COVID-19 vs non-COVID-19 scans.
  • All experiments resulted in F1 scores above 80%, which is a good result given the relatively small, variable-quality data corpus available.
  • Since false negatives still occur, the authors tune the classification threshold from 0.5 to 0.75 to reduce the number of false negatives at the price of more false positives. For details, please feel free to read the paper.
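The threshold trade-off can be illustrated with a few made-up scores. This sketch rests on an assumption: the text does not say which output the 0.75 threshold is applied to, so here a scan is called non-COVID-19 only when its non-COVID-19 probability reaches the threshold, which is one reading under which raising the threshold from 0.5 to 0.75 lowers false negatives and raises false positives. The function name, scores and labels are hypothetical.

```python
from typing import List, Tuple


def confusion_at_threshold(p_non_covid: List[float], labels: List[int],
                           threshold: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) with COVID-19 = 1 as the positive class.

    Assumption: a scan is called non-COVID-19 only if p_non_covid >= threshold;
    otherwise it is flagged as COVID-19.
    """
    tp = fp = fn = tn = 0
    for p, y in zip(p_non_covid, labels):
        pred = 0 if p >= threshold else 1
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


if __name__ == "__main__":
    # Hypothetical non-COVID-19 probabilities and true labels (1 = COVID-19).
    p_non = [0.1, 0.6, 0.7, 0.9]
    y = [1, 1, 0, 0]
    for t in (0.5, 0.75):
        tp, fp, fn, tn = confusion_at_threshold(p_non, y, t)
        sensitivity = tp / (tp + fn)
        ppv = tp / (tp + fp)
        print(f"threshold={t}: FN={fn} FP={fp} sensitivity={sensitivity:.2f} PPV={ppv:.2f}")
    # threshold=0.5: FN=1 FP=0 sensitivity=0.50 PPV=1.00
    # threshold=0.75: FN=0 FP=1 sensitivity=1.00 PPV=0.67
```

In a screening setting a missed COVID-19 case is usually costlier than a false alarm, which is why trading some positive predictive value for sensitivity can be worthwhile.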


