Chest X-Ray COVID-19 Detection using Monk AI

Vedant Khairnar
Jun 24, 2020



COVID-19, an unforgettable pandemic in human history.

Coronavirus disease 2019 (COVID-19) is caused by the virus known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Because lives are at stake, developing reliable tests for the virus is essential to slowing its spread.

Most people who get COVID-19 have mild or moderate symptoms like coughing, a fever, and shortness of breath. But some who catch the new coronavirus get severe pneumonia in both lungs. COVID-19 pneumonia is a serious illness that can be deadly.

What Is Pneumonia?

Pneumonia is a lung infection that causes inflammation in the tiny air sacs inside your lungs. They may fill up with so much fluid and pus that it’s hard to breathe.

Pneumonia shows up on chest X-rays, so these images can be used to help determine whether someone is infected or not. Let's see how!

You can refer to the Colab notebook here.

Terms you should know before following the procedure:

Convolutional Neural Network (ConvNet/CNN)
A ConvNet is a deep learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other.
The pre-processing required by a ConvNet is much lower than for other classification algorithms. While in primitive methods filters are hand-engineered, with enough training ConvNets can learn these filters/characteristics themselves.
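To make "filter" concrete, here is a minimal NumPy sketch that is only an illustration and not part of the Monk workflow. It slides a hand-engineered vertical-edge filter (Sobel-style) over a tiny toy image. The function name `conv2d_valid` and the toy image are hypothetical. A ConvNet layer performs the same sliding-window operation, but it learns the kernel values during training instead of having them written by hand.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    Like most deep learning libraries, this computes cross-correlation
    (the kernel is not flipped), which is what CNN layers actually do.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hand-engineered vertical-edge filter; a CNN would learn its own kernel values instead.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

# Toy 5x6 "image": dark left half, bright right half.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

edges = conv2d_valid(image, sobel_x)
print(edges)  # strong responses only in the columns where dark meets bright
```

The filter responds only where the intensity changes, which is the kind of low-level feature that early CNN layers tend to learn on their own.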

Transfer Learning
Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. For example, knowledge gained while learning to recognize cars could apply when trying to recognize trucks.
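A common way to apply transfer learning to images is to keep a pretrained feature extractor frozen and train only a new classification head on the target data. The NumPy sketch below shows only those mechanics and makes several assumptions. The "backbone" weights are fixed random numbers standing in for a genuinely pretrained network. The toy dataset is synthetic. The names `backbone`, `W_backbone` and `w_head` are hypothetical and do not come from Monk AI.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a backbone pretrained on a source task. In real transfer learning
# these weights come from prior training; here they are fixed random numbers.
W_backbone = rng.normal(scale=0.5, size=(4, 8))

def backbone(x):
    """Frozen feature extractor: its weights are never updated below."""
    return np.maximum(x @ W_backbone, 0.0)  # ReLU features

# Small labelled dataset for the target task (synthetic).
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Features are computed once, because the backbone is frozen.
features = backbone(X)
w_head = np.zeros(features.shape[1])  # new, task-specific head (logistic regression)
b_head = 0.0

def log_loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

W_before = W_backbone.copy()
initial_loss = log_loss(w_head, b_head)

lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(features @ w_head + b_head)))
    grad = p - y                                # d(loss)/d(logit) for log loss
    w_head -= lr * features.T @ grad / len(y)   # only the head is trained
    b_head -= lr * grad.mean()

final_loss = log_loss(w_head, b_head)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
print("backbone unchanged:", np.array_equal(W_before, W_backbone))
```

Freezing the backbone means the only trainable parameters are in the small head. This makes training fast and works with a modest dataset, which is the appeal of transfer learning for a task like chest X-ray classification.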

Monk AI
Monk AI is an open-source deep learning and computer vision library for building prototypes and creating multiple experiments with less code, without losing the underlying framework's advanced capabilities and features.

What is ResNet?

ResNet is a short name for Residual Network. As the name of the network indicates, the new terminology that this network introduces is residual learning.

What is the need for Residual Learning?

Deep convolutional neural networks have led to a series of breakthroughs in image classification, and many other visual recognition tasks have also benefited greatly from very deep models. So, over the years, there has been a trend to go deeper, both to solve more complex tasks and to improve classification/recognition accuracy. But as networks get deeper, they become harder to train, and their accuracy starts to saturate and then degrade. Residual learning tries to solve both of these problems.

What is Residual Learning?

In general, in a deep convolutional neural network, several layers are stacked and trained for the task at hand, and the network learns low-, mid- and high-level features across its layers. In residual learning, instead of trying to learn the features directly, the layers learn a residual. If H(x) is the mapping that a few stacked layers should learn and x is the input to those layers, the layers are trained to learn the residual F(x) = H(x) - x, and the block outputs F(x) + x. ResNet implements this with shortcut connections, which connect the input of the nth layer directly to some (n+k)th layer. It has been shown that networks of this form are easier to train than plain deep convolutional neural networks, and that they largely avoid the drop in accuracy seen when plain networks are made deeper.
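The residual idea can be sketched in a few lines of NumPy. This is a simplified, vector-only illustration under stated assumptions. Real ResNet blocks use convolutions and batch normalization, and the resnet50_v2 model used later is a pre-activation variant. The function `residual_block` and its weight matrices are hypothetical. The sketch shows that the weight layers compute only F(x), that the shortcut adds x back, and that a block whose weights learn F(x) = 0 passes its input through unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Simplified residual block: out = ReLU(F(x) + x), with F(x) = W2 @ ReLU(W1 @ x).

    The two weight layers only have to learn the residual F(x) = H(x) - x;
    the shortcut connection adds the input x back unchanged.
    """
    residual = W2 @ relu(W1 @ x)  # F(x): the path through the weight layers
    return relu(residual + x)     # shortcut connection: add the input back

x = np.array([0.5, 1.0, 2.0])

# If the weight layers learn F(x) = 0, the block acts as the identity
# (x is non-negative here, so the final ReLU leaves it untouched).
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))

# With non-zero weights, the block outputs x plus a learned correction.
W1 = np.eye(3) * 0.1
W2 = np.eye(3)
print(residual_block(x, W1, W2))
```

Learning "do nothing" only requires driving the weights toward zero. This is easier for the optimizer than making a stack of plain layers reproduce the identity mapping.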

So, ResNet-50 is a deep residual network. The “50” refers to the number of weight layers it has. It is a type of convolutional neural network, and ResNets are most popularly used for image classification.

The main innovation of ResNet is the skip connection. Without adjustments, deep networks often suffer from vanishing gradients: as the error is backpropagated through many layers, the gradient gets smaller and smaller. Tiny gradients can make learning intractable.
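A one-dimensional toy calculation shows why skip connections help. By the chain rule, the gradient reaching the first layer is the product of every layer's local derivative. If a layer instead computes y = x + F(x), its local derivative becomes 1 + F'(x). This is a scalar sketch under an assumed per-layer derivative of 0.01 for the weight path. Real networks multiply Jacobian matrices rather than scalars, and the function names are hypothetical.

```python
def plain_chain_grad(depth, d):
    """Gradient reaching the input of `depth` stacked layers, each with local derivative d."""
    g = 1.0
    for _ in range(depth):
        g *= d  # chain rule: multiply the local derivatives
    return g

def residual_chain_grad(depth, d):
    """Same chain, but each layer is a residual block y = x + F(x), so dy/dx = 1 + d."""
    g = 1.0
    for _ in range(depth):
        g *= 1.0 + d  # the identity shortcut contributes the "1"
    return g

# Assumed small derivative of each layer's weight path, as in a network whose gradients vanish.
d = 0.01
for depth in (5, 10, 20):
    print(f"depth {depth:2d}: plain {plain_chain_grad(depth, d):.1e}, "
          f"residual {residual_chain_grad(depth, d):.3f}")
```

In the plain chain, the gradient collapses toward zero as depth grows. In the residual chain, it stays close to 1 because the identity path carries it through even when the weight layers contribute almost nothing.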

The skip connection in the diagram below is labeled “identity.” It allows the network to learn the identity function, which lets the input pass through the block without going through the other weight layers!
[Figure: residual block with the identity skip connection]

This allows you to stack additional layers and build a deeper network while offsetting the vanishing gradient problem: a block whose weight layers contribute little can learn a residual close to zero and effectively let its input skip past them.

Similarly, we have other networks such as ResNet-34 and ResNet-18.
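The number in each name counts the weight layers (convolutional plus fully connected). The arithmetic below uses the block configurations from the original ResNet paper. ResNet-18 and ResNet-34 use "basic" blocks with two 3x3 convolutions, and ResNet-50 uses "bottleneck" blocks with 1x1, 3x3 and 1x1 convolutions. By convention, the projection convolutions on some shortcuts are not counted. The dictionary and function names are hypothetical.

```python
# Blocks per stage (conv2_x .. conv5_x) and weight layers per block, per the original ResNet paper.
RESNETS = {
    "ResNet-18": ([2, 2, 2, 2], 2),  # basic blocks: two 3x3 convolutions
    "ResNet-34": ([3, 4, 6, 3], 2),
    "ResNet-50": ([3, 4, 6, 3], 3),  # bottleneck blocks: 1x1, 3x3, 1x1 convolutions
}

def weight_layers(blocks_per_stage, layers_per_block):
    stem = 1  # initial 7x7 convolution
    head = 1  # final fully connected layer
    return stem + sum(blocks_per_stage) * layers_per_block + head

for name, (blocks, per_block) in RESNETS.items():
    print(name, weight_layers(blocks, per_block))
```

ResNet-34 and ResNet-50 have the same number of blocks per stage. ResNet-50 gets its extra depth by swapping each two-layer basic block for a three-layer bottleneck block.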

What we are building here is a binary classifier: each chest X-ray is labeled either NORMAL or PNEUMONIA.
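A two-class classifier usually ends with one raw score per class. Softmax converts these scores into probabilities, and the predicted label is the class with the higher probability. This is a generic sketch of that final step, not Monk AI's internal code. The raw scores are made up, and the helper names `softmax` and `classify` are hypothetical.

```python
import math

CLASSES = ["NORMAL", "PNEUMONIA"]  # the two labels used in this article

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits):
    """Map two raw scores (one per class) to a label and its probability."""
    probs = softmax(logits)
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]

print(classify([2.0, -1.0]))  # NORMAL wins with a probability of about 0.95
```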

The whole procedure can be divided into the following steps:

  1. Many datasets are available on Kaggle.
    You can find the one I used here.
    To use it in Colab, you need to follow a procedure. Refer to this for details.
  2. Monk AI can be installed by cloning its Git repository and then installing the dependencies and other libraries.
  3. Every AI framework has its own costs and benefits. Monk AI supports PyTorch, MXNet and Keras; here we use MXNet.
  4. With the MXNet Gluon backend selected, we create a project and an experiment:
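import sys
sys.path.append("monk_v1/monk/")  # assumes the monk_v1 repository was cloned into the working directory
from gluon_prototype import prototype  # Monk's MXNet Gluon backend
# Create Project and Experiment
gtf = prototype(verbose=1);
gtf.Prototype("Chest-X-Ray-COVID-19", "Gluon-resnet50_v2_3e");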

5. Then we need to set the hyperparameters.

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned.


Monk AI then prints a summary of:

a) Dataset details
b) Dataset Params
c) Pre-composed Train transforms
d) Pre-composed Val Transforms
e) Dataset Numbers
f) Model Params
g) Model Details
h) Optimiser
i) Learning Rate Scheduler
j) Loss
k) Training Params
l) Display Params

To learn how to change the hyperparameters, refer to these docs and these examples.
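The difference between a hyperparameter and a learned parameter is easiest to see in a toy setting. The pure-Python sketch below is independent of Monk AI. It fits y = w * x by gradient descent, where the learning rate and number of epochs are hyperparameters chosen by us and w is learned from the data. A small grid search then stands in for hyperparameter tuning. The data and the helper names `train` and `mse` are made up for illustration.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error.

    learning_rate and epochs are hyperparameters (chosen by us);
    w is a parameter (learned from the data).
    """
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data generated from y = 3x (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

# Hyperparameter tuning: try a small grid and keep the setting with the lowest loss.
best = None
for lr in (0.001, 0.01, 0.1):
    for epochs in (10, 100):
        w = train(xs, ys, lr, epochs)
        loss = mse(w, xs, ys)
        if best is None or loss < best[0]:
            best = (loss, lr, epochs, w)

print("best loss %.2e with lr=%s, epochs=%d -> learned w=%.4f" % best)
```

In practice, the candidates should be compared on validation loss rather than training loss. The training data is reused here only to keep the sketch short.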

6. Now comes the training

gtf.Train(); # Yes that's it

The following is an excerpt of the summary generated after training:

curr_lr - 0.009604
[Epoch 3] Train-acc: 0.903, Train-loss: 0.254 | Val-acc: 0.777778, Val-loss: 0.487, | time: 6.5 sec
Training completed in: 0m 19s
Best val Acc: 0.933333
Training End
Training Outputs
Model Dir: /content/workspace/Chest-X-Ray-COVID-19/Gluon-resnet50_v2_3e/output/models/
Log Dir: /content/workspace/Chest-X-Ray-COVID-19/Gluon-resnet50_v2_3e/output/logs/
Final model: final
Best model: best_model
Log 1 - Validation accuracy history log: val_acc_history.npy
Log 2 - Validation loss history log: val_loss_history.npy
Log 3 - Training accuracy history log: train_acc_history.npy
Log 4 - Training loss history log: train_loss_history.npy
Log 5 - Training curve: train_loss_history.npy
Log 6 - Validation curve: train_loss_history.npy

7. Once the training is done, we proceed with testing the generated model.

To evaluate the trained model, we reload the experiment with inference mode on:

gtf = prototype(verbose=1);
gtf.Prototype("Chest-X-Ray-COVID-19", "Gluon-resnet50_v2_3e", eval_infer=True);

Then we load the validation dataset and run the evaluation:

accuracy, class_based_accuracy = gtf.Evaluate();

Once evaluation is done, Monk AI itself reports the overall accuracy and the class-based accuracy.
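Overall accuracy and class-based accuracy answer different questions. The pure-Python sketch below shows what the two numbers conventionally mean, using made-up labels for four validation images. It is not Monk AI's implementation, and Monk's own return format may differ. The function name `evaluate` is hypothetical. For the PNEUMONIA class, per-class accuracy is the sensitivity (recall): the share of pneumonia cases the model actually catches.

```python
from collections import Counter

def evaluate(true_labels, predicted_labels):
    """Overall accuracy plus per-class accuracy (fraction of each true class predicted correctly)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = Counter()
    total = Counter()
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        if t == p:
            correct[t] += 1
    overall = sum(correct.values()) / len(true_labels)
    per_class = {cls: correct[cls] / total[cls] for cls in total}
    return overall, per_class

# Made-up labels for four validation images.
y_true = ["NORMAL", "NORMAL", "PNEUMONIA", "PNEUMONIA"]
y_pred = ["NORMAL", "PNEUMONIA", "PNEUMONIA", "PNEUMONIA"]
overall, per_class = evaluate(y_true, y_pred)
print(overall, per_class)
```

On a small or imbalanced medical dataset, a single overall accuracy can hide a class that the model handles poorly, so the per-class numbers are worth checking.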

To test a single X-ray after loading the prototype in inference mode, run the following:

from IPython.display import Image

img_name = "test_image.jpeg"
predictions = gtf.Infer(img_name=img_name);
# The model predicts NORMAL or PNEUMONIA; this proof of concept treats PNEUMONIA as COVID-19 positive
if predictions['predicted_class'] == 'PNEUMONIA':
    print("COVID-19 : Positive")
else:
    print("COVID-19 : Negative")
Image(filename=img_name)  # display the X-ray in the notebook

Output:

Image name: test_image.jpeg
Predicted class: NORMAL
Predicted score: 5.458780765533447
COVID-19 : Negative

Accuracy after trying the same procedure using different models is as follows:

So, we can see that high accuracy can be achieved through transfer learning with minimal code using Monk AI.

Stay Safe and Happy Reading!!!

Note: I am not from a medical or biological background, and these experiments were done as a proof of concept.



Vedant Khairnar

Developer Advocate@Juspay, Organiser@GDGCN, DevScript Founder