Pandemics don’t wait: How we built COVID-Net in under 7 days
“Imagination is the highest form of Research” — Albert Einstein
On March 22nd our team announced the availability of COVID-Net (latest updates here), a neural network for COVID-19 detection using chest radiography (X-Rays). Our decision to open source the model, dataset and project source code attracted an overwhelming response from researchers around the world and a significant amount of media coverage (here, here and here).
From the outset, we were clear that COVID-Net was intended as a complementary tool to assist clinicians in rapidly screening for the virus and not a replacement for the definitive but slower and more complex PCR test. One of the largest bottlenecks in triage and diagnosis is the need for experts to interpret radiography images.
Computer-aided diagnostic systems accordingly have the potential to save lives and more efficiently direct scarce medical resources. (To that end, we are co-authors of a just-released study of COVID-Net neural networks for COVID-19 lung disease severity assessment; we are deeply indebted to the good folks at NVIDIA for facilitating access to a powerful DGX-2 that accelerated this analysis by an order of magnitude.)
In this context, COVID-Net has been leveraged by researchers in Italy, Canada, Spain, Malaysia, the U.S. and India, and a number of groups have extended our initial effort in impressive ways.
Yet beyond standard inquiries of how individuals could contribute to the project and leverage it responsibly, one question surfaced more than the others: how did we build such a high-performing and purpose-specific network so quickly?
As deep learning practitioners will attest, constructing a working neural network often requires — per Einstein’s declaration — imagination and creative intuition. It’s a laborious undertaking in which instinct is coupled with long trial-and-error cycles before something workable emerges. Yet as a wise writer said, creativity takes time and ideas simmer before they boil.
How, then, did we accelerate the development process behind the creation of COVID-Net?
Pandemics don’t wait
Given the urgency of the present predicament, we endeavored to build COVID-Net at an accelerated pace — an order of magnitude faster than typical deep learning initiatives that can take several months to implement even for large enterprises.
Such rapidity is especially important for fluid and fast-changing scenarios such as COVID-19, where data based on new cases emerges continuously and DNN architectures must be tailored to an ever-evolving knowledge base. As such, we set a goal of ‘less than 7 days’ to develop and release COVID-Net to the global research community in hopes of quickly developing an important tool in combating the pandemic.
While similar efforts were underway in organizations throughout the world, they appeared to be private initiatives involving significantly larger teams with access to greater resources. This ZDNet article, for example, highlights a project at the Institute for Interdisciplinary Information Sciences at Tsinghua University in Beijing that involves a group of over 30 researchers.
By contrast, our team initially consisted of two individuals: Professor Alexander Wong, our Chief Scientist, and Linda Wang, one of our research students, who leveraged our platform to develop COVID-Net from scratch in under a week. An interesting byproduct of the endeavor is that it illustrated what could be achieved through breakthrough Explainable Artificial Intelligence (XAI) technology. Specifically, the team demonstrated that explainability:
- Accelerates scalable development beyond what humans and AI can accomplish independently
- Provides greater transparency into model design and performance — for more trustworthy deep learning development and regulatory compliance
In pursuit of these objectives, the pair employed a human-machine collaborative design strategy in which they combined human-driven network design prototyping with machine-driven design exploration over five steps:
- Data collection
- Principled network design prototyping
- Machine-driven design exploration
- Validation via explainability
- Explaining and understanding the architecture
Rather than simply treating AI as a tool to be leveraged, this approach reimagines AI as a collaborator that learns from a developer’s needs and subsequently proposes multiple design approaches with different trade-offs in order to enable a rapid and iterative approach to model building.
In the following sections we outline the tasks our team undertook in the context of this strategy.
Step 1: Data collection
To begin the process we constructed a dataset, termed COVIDx, using publicly available sources and images from our collaborators. As of this writing, the dataset consists of 13,975 CXR images across 13,870 patient cases and continues to grow each day. (The generation scripts for constructing COVIDx can be found at our GitHub repository).
The COVIDx dataset consists of thousands of CXR images from public sources
While COVIDx has grown significantly since inception, it nevertheless exhibits data imbalances owing to the rarity of positive COVID-19 cases relative to other respiratory afflictions.
This problem is not uncommon, so model designers must remain vigilant and take steps to reduce the impact of such biases.
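One common way to reduce the impact of class imbalance is to weight each class inversely to its frequency during training. The article does not say which mitigation the team actually used, so the following is only a minimal sketch of that general technique. The function name and the label counts are hypothetical placeholders, not the real COVIDx distribution.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class, proportional to 1 / class frequency.

    Weights are scaled so that a perfectly balanced dataset gives every
    class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label counts for illustration only (not the real COVIDx split).
labels = ["normal"] * 800 + ["non_covid_infection"] * 550 + ["covid19"] * 50
weights = inverse_frequency_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.2f}")
```

With weights like these, each rare COVID-19 example would contribute more to a weighted training loss than each example from the more common classes.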
Step 2: Principled network design prototyping
The first stage of the human-machine collaborative design strategy is principled network design prototyping, in which we constructed a prototype based on human-driven design principles and best practices. Essentially, the prototype provides the initial scaffolding of the model while leaving final macro-architecture and micro-architecture decisions to the machine-driven aspect of the process.
As our starting point, we leveraged residual architecture design principles as they enable reliable neural network architectures which are easy to train to high performance and which enable the construction of deeper architectures.
To help clinicians better triage as well as decide on the treatment strategies to employ, we designed our prototype to make one of three predictions (three-class softmax output):
- No infection (normal)
- Non-COVID-19 infection (e.g., non-COVID19 viral, bacterial, etc.)
- COVID-19 viral infection
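To make the three-class softmax output concrete, here is a minimal sketch of how raw network scores (logits) can be turned into a probability for each of the three predictions listed above. The logits are made-up numbers and the helper names are hypothetical; this illustrates the softmax function itself, not COVID-Net's actual output code.

```python
import math

CLASSES = ("normal", "non-COVID-19 infection", "COVID-19 viral infection")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable class and the full probability vector."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical logits for a single image, for illustration only.
label, probs = predict([0.3, 1.1, 2.4])
print(label, [round(p, 3) for p in probs])
```

Because the output is a full probability vector rather than a single label, a downstream user can see how strongly the other two classes were considered.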
From this point, the initial prototype needed to be converted into full-fledged deep neural networks.
Step 3: Machine-driven design exploration with Generative Synthesis
The second stage of our design strategy is machine-driven design exploration using our GenSynth platform (see here for an overview and case study of GenSynth and here for a comparison with Neural Architecture Search).
Rather than relying on a brute-force approach and exploring ad-hoc permutations, we instead combine GenSynth with our understanding of the domain requirements to promote a systematic and intelligent approach to design exploration.
Specifically, given data and human-defined requirements, the platform guides a design exploration that learns and identifies the optimal macro- and micro-architectures with which the final model can be constructed. Such a machine-driven approach enables greater flexibility than its human-driven counterpart while also ensuring the resulting network satisfies operational constraints.
DarwinAI’s GenSynth platform makes it easy for designers to explore and generate custom deep learning models tailored for data and task at hand
For COVID-Net, our operational requirements included greater than 80 percent sensitivity (the proportion of actual COVID-19 cases that are detected) and positive predictive value (the probability that a positive prediction is in fact correct). These parameters were chosen to enable the platform to strike an appropriate balance between accuracy and speed, as a key consideration was designing a network that could run on different platforms, be it in the cloud or on an edge device (perhaps even the actual imaging device itself).
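For readers less familiar with these two metrics, the sketch below shows how sensitivity and positive predictive value (PPV) are computed from true positive, false positive and false negative counts, and how they can be checked against an 80 percent target. The counts are invented for illustration and the function names are hypothetical; how GenSynth consumes such requirements is not described here.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positive cases that the model detects."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of positive predictions that are actually positive."""
    return true_pos / (true_pos + false_pos)


def meets_requirements(true_pos, false_pos, false_neg, threshold=0.80):
    """Check both metrics against a 'greater than threshold' target."""
    return (
        sensitivity(true_pos, false_neg) > threshold
        and positive_predictive_value(true_pos, false_pos) > threshold
    )


# Hypothetical counts for the COVID-19 class, for illustration only.
tp, fp, fn = 90, 10, 15
print(f"sensitivity = {sensitivity(tp, fn):.3f}")
print(f"PPV         = {positive_predictive_value(tp, fp):.3f}")
print("meets target:", meets_requirements(tp, fp, fn))
```

The two metrics pull in different directions: predicting COVID-19 more readily tends to raise sensitivity but lower PPV, which is why both were specified together.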
Using this information, GenSynth generated a number of different ‘ready-to-go’ models that met our design requirements subject to different characteristics and trade-offs.
From there, we were able to develop a detailed understanding of the design choices made by the platform, which in turn helped guide us as we explored and refined our model. Specifically, GenSynth not only helped us design new models but also identify key performance bottlenecks, giving us greater transparency into the overall make-up and performance of the network itself. Such human-machine collaboration enables the creation of unique, tailored designs with different and, most importantly, understood trade-offs.
The GenSynth platform enabled informed design choices within generated models; this image shows different layers within different generated models, with red highlighting performance bottlenecks
Step 4: Validation via Explainability
When building a precise and robust neural network, it is important to recognize that output alone is insufficient to communicate the model’s strengths and weaknesses and, most notably, whether it is producing the right results for the right reasons. The opaque nature of deep learning, which is being increasingly scrutinized as AI becomes pervasive, is akin to trying to debug a classical computer program without the source code (our XAI primer provides a thorough overview of the topic). In healthcare, this level of opaqueness makes it difficult not only to design networks, but also to gain widespread adoption with clinicians.
The ‘black box’ problem that results from this limitation is a key reason why design audits are frequently omitted from DL workflows, as the alternatives to XAI-based assessments are cumbersome, time-consuming, and often involve scripts, interpretations and considerable manual effort. Moreover, they aren’t especially effective, most notably for unusual and non-intuitive cases, such as the example we’ll illustrate in a moment.
However, investing the time to audit your model can dramatically accelerate and simplify development: identifying the gaps in the design and the underlying factors behind them greatly increases your ability to design effectively and can avoid considerable pain and debugging down the road. Moreover, the insights gained through explainability can not only be used to generate better networks, but can also illustrate why they reach particular conclusions.
In the case of COVID-Net, using GenSynth to audit our design allowed us to:
- Identify error scenarios
- Understand the reliability of our model
- Gain valuable insights into how to improve our model
All of these lessons contributed to building trust with users and ensuring long-term design efficacy.
A confusion matrix for COVID-Net generated by GenSynth
Starting from the confusion matrix above, the platform automatically groups different error scenarios so as to provide a quick high-level picture of how the network is performing. GenSynth further allows us to drill into the different error scenarios to pinpoint specific biases and gaps, as well as understand the critical factors behind model decisions as illustrated in the images below.
Example CXR images of COVID-19 cases from several different patients and their associated critical factors (highlighted in red) as identified by GenSynth
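The idea of starting from a confusion matrix and grouping error scenarios can be sketched in a few lines. This is a generic illustration, not GenSynth's implementation: the labels are invented, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def group_errors(y_true, y_pred):
    """Map each (true, predicted) error pair to the indices of its samples."""
    groups = defaultdict(list)
    for idx, (truth, pred) in enumerate(zip(y_true, y_pred)):
        if truth != pred:
            groups[(truth, pred)].append(idx)
    return dict(groups)


# Hypothetical labels for six images, for illustration only.
y_true = ["covid19", "covid19", "normal", "non_covid", "normal", "covid19"]
y_pred = ["covid19", "non_covid", "normal", "non_covid", "covid19", "non_covid"]

matrix = confusion_counts(y_true, y_pred)
errors = group_errors(y_true, y_pred)

for (truth, pred), indices in sorted(errors.items()):
    print(f"true={truth}, predicted={pred}: samples {indices}")
```

Grouping misclassified samples by error type makes it easier to inspect, for example, every COVID-19 case that was mistaken for a non-COVID-19 infection in one pass.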
In addition to employing XAI for responsible and transparent design, the ability to interpret and obtain insight into how COVID-Net identifies viral infections is important for:
- Increasing trust: By identifying the critical factors in the decision-making process, the predictions made by COVID-Net are made more transparent and trustworthy to clinicians, which can assist them in making faster yet more accurate assessments.
- Revealing new insights: Highlighting critical factors can help clinicians unearth new insights into the key visual indicators behind COVID-19 viral infection, which they can incorporate into their own diagnostic processes to further improve screening accuracy.
Example: When models make the right decision for the wrong reasons
Ensuring that a neural network is making the right decisions for the right reasons is an essential part of designing robust models for real world applications. Our XAI technology provides unparalleled insights to this end, illuminating the critical factors in the decision-making process so designers can identify and remove false cues from the model.
Thus far, we’ve examined COVID-Net’s analysis against X-rays, but our team has been hard at work on COVID-Net-CT, which performs similar diagnostics using CT scans. In the course of auditing one of our earlier model designs we encountered an issue that would have been very difficult to identify in the absence of XAI capabilities.
The figure below depicts 15 CT scans in which the model correctly detected COVID-19 infections. By highlighting the critical factors that led to the detections, GenSynth revealed a startling phenomenon: the positive readings in this case weren’t based on anomalies in the patients’ lungs, but rather on the appearance of the bed of the CT scanner itself.
Identifying this false cue was crucial in improving the model; in this case, it compelled us to revisit how we created and processed our data and how we trained our models.
In these examples, GenSynth revealed that the critical decision factor (lighter gray areas) responsible for correct COVID-19 diagnoses based upon CT scans was the appearance of the bed of the CT scanner
Given the data-driven nature of deep learning, ‘right decision for the wrong reason’ scenarios are not uncommon, and can be extremely difficult to track and identify without an XAI-driven auditing strategy. As such, the value of explainability in improving the reliability of deep neural networks for clinical application cannot be overstated.
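One simple way to flag this kind of false cue programmatically is to measure how much of a model's attribution falls inside the anatomically relevant region. The sketch below assumes you already have a per-pixel attribution map from some explainability method and a binary lung mask. Neither the method nor the threshold comes from GenSynth, and every name and value here is hypothetical.

```python
def attribution_inside_mask(attribution, mask):
    """Fraction of total absolute attribution that falls where mask is 1."""
    inside = 0.0
    total = 0.0
    for attr_row, mask_row in zip(attribution, mask):
        for value, in_region in zip(attr_row, mask_row):
            total += abs(value)
            if in_region:
                inside += abs(value)
    return inside / total if total else 0.0


# Toy 4x4 example: the bottom row stands in for the scanner bed.
lung_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
attribution = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.9, 0.9, 0.9, 0.9],
]

fraction = attribution_inside_mask(attribution, lung_mask)
suspicious = fraction < 0.5  # hypothetical threshold, chosen for illustration
print(f"attribution inside lungs: {fraction:.1%}, suspicious: {suspicious}")
```

A correct prediction with a low in-lung fraction, as in this toy case, is a candidate 'right decision for the wrong reason' and merits visual review.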
Step 5: Explaining and understanding the COVID-Net architecture
By fusing human domain knowledge with the capabilities of GenSynth, our team produced a unique and effective model in under seven days. Compared to the popular ResNet-50, COVID-Net had less than half the computational complexity while having 8% higher COVID-19 sensitivity for the task of COVID-19 detection.
COVID-Net exhibits an efficient micro-architecture design composed largely of 1x1 convolutional layers and depth-wise convolution layers. The heavy use of a projection-expansion-projection (PEPX) design pattern facilitates formidable performance efficiencies, while still maintaining strong COVID-19 sensitivity and PPV.
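To give a rough sense of why a design built from 1x1 convolutions and depth-wise convolutions is efficient, the sketch below counts weights (ignoring biases) for a standard 3x3 convolution and for one plausible projection-expansion-projection arrangement. The layer ordering, the halving of channels in the projection steps and the 256-channel example are all assumptions made for illustration, not COVID-Net's actual configuration.

```python
def conv_weights(kernel, c_in, c_out):
    """Weight count of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_weights(kernel, channels):
    """Weight count of a depth-wise convolution: one filter per channel."""
    return kernel * kernel * channels


def pepx_like_weights(c_in, c_out):
    """Weight count of an assumed projection-expansion-projection block."""
    narrow = c_in // 2  # assumed width after each projection
    wide = c_in         # assumed width after expansion
    return (
        conv_weights(1, c_in, narrow)     # first 1x1 projection
        + conv_weights(1, narrow, wide)   # 1x1 expansion
        + depthwise_weights(3, wide)      # 3x3 depth-wise convolution
        + conv_weights(1, wide, narrow)   # second 1x1 projection
        + conv_weights(1, narrow, c_out)  # final 1x1 extension to c_out
    )


standard = conv_weights(3, 256, 256)
pepx_like = pepx_like_weights(256, 256)
print(f"standard 3x3 conv: {standard:,} weights")
print(f"PEPX-like block:   {pepx_like:,} weights")
print(f"reduction factor:  {standard / pepx_like:.2f}x")
```

Under these assumptions the block needs several times fewer weights than a single standard 3x3 convolution with the same input and output widths, because the only spatial filtering is done depth-wise.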
The model also includes selective long-range connectivity, which is fairly unusual as residual networks usually exhibit short-range connectivity. This feature is a result of a fundamental trade-off between performance and memory footprint. In the case of COVID-Net, being selective about employing long-range connectivity only where necessary minimizes the overall footprint in accordance with our human-directed operational parameters.
COVID-Net, in all its glory: the design employs a diverse collection of architectural traits that result in a high-performance model purpose-built for making accurate COVID-19 detections based upon chest X-ray images
As the construction of COVID-Net illustrates, explainable AI technologies are of tremendous benefit when designing neural networks. Specifically, they:
- Accelerate scalable development beyond what humans and AI can accomplish independently
- Provide greater transparency into model design and performance for more trustworthy deep learning development and regulatory compliance
Since releasing COVID-Net and delivering a webinar on the subject, we’ve benefitted from the support of the wider medical and AI community, including a significant increase in the number of X-ray images comprising the COVIDx dataset.
Going forward, we will continue to iterate on model improvements so as to:
- Build upon COVID-Net to improve it as more data arrives
- Design new neural network architectures for better sensitivity and PPV
- Design new strategies for training COVID-Net to higher performance
As mentioned, our team is also working on COVID-Net-CT, a neural network tailored for COVID-19 detection via CT scans (we are likewise working on augmenting the COVIDx dataset with CT instances). For context, CTs provide superior levels of detail and clarity when compared to conventional X-Rays, and several studies suggest higher levels of sensitivity than viral testing.
Finally, our team is also hard at work on COVID-Net-Risk, a neural network tailored for COVID-19 risk stratification. The goal with COVID-Net-Risk is to provide greater insights into risk level and severity to assist caregivers with triage and treatment plans.
This article only scratches the surface of what is possible using the GenSynth platform.
Since the beginning, the goal at DarwinAI has been to unlock the potential of AI by accelerating deep learning through our unique technology. The construction of COVID-Net illustrates the promise of this approach.
Is there a deep learning project your team is looking to accelerate? If so, contact us and learn how we can help.
Likewise, questions about COVID-Net, contributing to the project, or GenSynth more generally can be directed here.
Stay tuned: the best is yet to come.