Our response to COVID-19 — la luz vendrá pronto

Chirag Agarwal
6 min read · Apr 23, 2020


It seemed like a usual day of my Ph.D.-surviving life: subduing my caffeine monster, revising papers, reading about the pandemic slowly but surely conquering the world, and, in disgust, finally delving into my world of the black-and-green bash terminal. Then one of my professors came to the lab and asked what we, as researchers, could do to contribute to society during the ongoing COVID-19 pandemic.

Like many researchers in the community, I started grappling with my thoughts about how I could best deploy my skills to do my part for the community. Unlike the health professionals (whom I consider the brave soldiers of the current crisis), I could not go to hospitals and tend to the patients affected by the pandemic. I do not have the credentials. Unfortunately, I am not the doctor they need right now. So I sat down and started doing what I have learned in my doctoral journey: creating solutions to problems using AI. However, we did not want to add yet another COVID-19 paper to the plethora of pre-prints submitted every day to arXiv and medRxiv.

Image Source: https://en.wikipedia.org/wiki/Coronavirus_disease_2019#/media/File:Total-confirmed-cases-of-covid-19-per-million-people.png

Based on my previous experience with medical imaging, I started looking for publicly available datasets comprising Chest X-ray or CT scans of COVID-19 cases. One of the major challenges for researchers is the lack of COVID-19 datasets. This molds the problem of identifying COVID-19 cases from radiological images into a classical imbalanced-dataset problem. There are publicly available datasets on Kaggle for pneumonia detection using Chest X-ray images, as well as the RSNA Pneumonia Detection Challenge. However, these datasets do not contain COVID-19-type pneumonia cases. Hence, many researchers have scraped COVID-19 images from multiple pre-prints and made them publicly available through their GitHub repositories. I would like to take this moment to personally thank Linda Wang and the whole team at Darwin AI for sharing their research with the community. However, the problem still remained: we had an imbalanced dataset at our disposal.
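One common way to soften such imbalance is to weight the training loss by inverse class frequency, so that errors on the rare COVID-19 class cost more. A minimal sketch (the label counts below are illustrative placeholders, not the actual dataset statistics):

```python
import numpy as np

# Hypothetical label counts for the three classes in the pooled dataset
# (illustrative numbers only): 0 = healthy, 1 = non-COVID pneumonia, 2 = COVID-19.
labels = np.array([0] * 1000 + [1] * 800 + [2] * 60)

# Inverse-frequency class weights: a rare class receives a larger weight,
# so the loss penalizes its misclassification more heavily.
counts = np.bincount(labels)
weights = counts.sum() / (len(counts) * counts)
print(weights.round(3))  # the COVID-19 class gets by far the largest weight
```

These weights can then be passed to the loss function of most deep learning frameworks (e.g., as per-class weights in a weighted cross-entropy).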

We started brainstorming and looking at the problem from different directions. Problem-specific models play a pivotal role in the design of the complex information systems that are common in our age. Conversely, off-the-shelf data-driven approaches need no explicit mathematical model and have wider applicability, at the cost of interpretability. Although data-driven approaches can handle large and complex datasets, they are ignorant of the underlying problem-level reasoning. It is therefore imperative to develop a hybrid, data-driven yet domain-knowledge-aware framework to enhance the accuracy and efficiency of deep learning-based COVID-19 diagnosis from CT/X-ray images. Most existing approaches for classifying COVID-19 cases depend primarily on pre-trained classifier models. One of the main issues with these approaches is that they do not account for the limited size of the COVID-19 data. In addition, these off-the-shelf models are prone to overfitting in the limited-data regime, which is exactly the situation when detecting COVID-19 from the existing (limited) lung CT/X-ray images. This issue arises in deep learning-based models when the network capacity is much larger than the amount of information at hand. Moreover, it should be taken into account that the differences between COVID-19 pneumonia and other pneumonia cases are subtle.

Enter a semi-supervised, task-based, probabilistic, model-aware deep learning architecture for Chest X-ray image analysis that distinguishes between the healthy, non-COVID infection, and COVID-19 classes, to assist the radiologist in the triage, analysis, and assessment of cases associated with the disease. A key aspect of our proposed framework is the use of AutoEncoders to learn the latent distributions specific to each class, as opposed to using off-the-shelf deep neural networks that may have been trained on unrelated datasets. The model-inspired, problem-specific nature of the networks is expected to increase identification performance, specifically when data is scarce.
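To make the per-class reconstruction principle concrete, here is a toy numpy-only sketch that uses a closed-form linear autoencoder (equivalent to PCA) as a stand-in for the trained deep AutoEncoders: one autoencoder is fit per class, and a test sample is assigned to the class whose autoencoder reconstructs it with the smallest residual. The data and models here are purely illustrative, not the actual CoroNet networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear_ae(X, k):
    """Closed-form linear autoencoder (PCA): encoder/decoder share the top-k components."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                  # class mean and k x d projection

def residual(x, mu, W):
    """Reconstruction error of sample x under one class-specific autoencoder."""
    z = (x - mu) @ W.T                 # encode into the latent space
    x_hat = mu + z @ W                 # decode back to input space
    return np.linalg.norm(x - x_hat)

# Toy stand-in data: two "classes" living on different low-dimensional subspaces.
A = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 16))
B = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 16)) + 3.0

aes = [fit_linear_ae(A, 2), fit_linear_ae(B, 2)]

x = A[0]                               # a sample from class 0
pred = int(np.argmin([residual(x, mu, W) for mu, W in aes]))
print(pred)  # 0: the class-0 autoencoder reconstructs its own sample best
```

In CoroNet the autoencoders are deep and convolutional, and the residuals carry spatial information rather than a single scalar, but the intuition is the same.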

The block diagram for CoroNet, our proposed deep learning architecture for semi-supervised task-based identification of COVID-19 from Chest X-ray images. Kindly refer to the paper for detailed descriptions.

Unlike most existing works, we propose CoroNet, a novel semi-supervised deep architecture that can distinguish between the three cases of Healthy, non-COVID Pneumonia, and COVID-19 infection based on the Chest X-ray manifestations of these classes. The proposed methodology comprises two modules: 1) the Task-Based Feature Extraction Network (TFEN), and 2) the COVID-19 Identification Network (CIN). The TFEN module is trained with a semi-supervised methodology and makes use of two AutoEncoders that allow for automatic segmentation of the infected regions from a latent-representation viewpoint. Next, a combination of the residual images computed from the output of the TFEN module is fed into a classifier to perform the classification task on the underlying data. Overall, the proposed methodology is a task-specific deep model and can be seen as an amalgamation of semi-supervised feature extraction, transfer learning, and supervised classification techniques. Our numerical investigations demonstrate that the proposed model achieves superior performance compared to the state of the art in terms of accuracy, sensitivity, and positive predictive value for classifying COVID-19 from chest X-ray images when training data points are scarce. The proposed methodology obtained an overall class-average accuracy of 93.5%. The high average precision, recall, and F1-score show that our model produced fewer false positives and false negatives across the whole dataset.
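Schematically, the flow from TFEN residual images into the CIN classifier input can be sketched as below. The blur-based `mock_autoencoder`, the image size, and the channel stacking are placeholder assumptions for illustration, not the actual convolutional networks from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

H = W = 32                             # assumed image size for illustration

def mock_autoencoder(x):
    """Placeholder for a trained AE: here simply a 3x3 box-blurred copy of the input."""
    out = np.zeros_like(x)
    padded = np.pad(x, 1, mode="edge")
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

x = rng.random((H, W))                 # stand-in for a chest X-ray image

# TFEN stage: residual images from the two class-specific autoencoders.
r_healthy = np.abs(x - mock_autoencoder(x))
r_pneumonia = np.abs(x - mock_autoencoder(x))

# CIN stage: the residuals are combined (stacked as channels) and fed to the classifier.
cin_input = np.stack([r_healthy, r_pneumonia], axis=0)
print(cin_input.shape)  # (2, 32, 32)
```

The key design choice is that the classifier never sees the raw image alone: it sees how poorly each class-specific autoencoder explains the image, which is exactly where the class-discriminative signal lives.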

Do these numbers mean anything?

With the current advancement of AI models, it is increasingly important for these models to make decisions that are interpretable to both researchers and end-users. In particular, it is essential to diagnose the failure cases of AI models, as they are used in many life-critical decisions. For example, recent self-driving car crashes have highlighted the problem of naively trusting decisions from AI models. Using Explainable AI (XAI), we can not only identify these failure cases but also counteract them.

Many explanation algorithms have been proposed for visually explaining an image classifier’s decisions using an attribution map, i.e., a heatmap that highlights the input pixels serving as evidence for or against the classification outputs. In the figure below, we show randomly chosen Chest X-ray images from each category of the test set, along with the respective attribution maps generated for our classifier model.

Attribution maps of Healthy, non-COVID Pneumonia, and COVID-19 patients
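As an illustration of how attribution maps are computed in general (not the specific explanation algorithm used in our paper), a finite-difference saliency map for a stand-in linear classifier can be sketched as follows; `score`, the image size, and the weights are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def score(x, w):
    """Stand-in classifier score for the predicted class (linear, for simplicity)."""
    return float(x.ravel() @ w)

def saliency(x, w, eps=1e-4):
    """Finite-difference attribution map: |d score / d pixel| for every input pixel."""
    base = score(x, w)
    grad = np.zeros(x.size)
    flat = x.ravel().copy()
    for i in range(flat.size):
        flat[i] += eps
        grad[i] = (score(flat.reshape(x.shape), w) - base) / eps
        flat[i] -= eps
    return np.abs(grad).reshape(x.shape)

x = rng.random((8, 8))                 # stand-in for an input image
w = rng.normal(size=64)                # stand-in classifier weights
heat = saliency(x, w)

# Sanity check: for a linear score the saliency equals |w|, pixel by pixel.
print(np.allclose(heat, np.abs(w).reshape(8, 8), atol=1e-3))  # True
```

In practice, gradient-based methods obtain the same quantity in a single backward pass through the network instead of one forward pass per pixel.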

Interested readers can refer to our pre-print paper for a detailed description and evaluation results. We have released all our code to the research and healthcare communities for reproducibility.

Thank you again for reading about my journey through the ongoing pandemic. I would encourage everyone to help the community in their own way, even if that means staying home. Hopefully, we will see the light soon!

Being a huge comic fan, I would like to end the article with this comic that says a lot.

“If Superman wants to stay home, you know things are serious”


Chirag Agarwal

Making AI learn what humans couldn’t | Tepidophobic | PhD Survivor