GaNDLF: A PyTorch Framework for Rapid & Reproducible Research in Healthcare

Published in PyTorch · Jul 15, 2021

Authors: Sarthak Pati, Siddhesh P. Thakur, Spyridon Bakas

Center for Biomedical Image Computing and Analytics (CBICA), University of Pennsylvania

Schematic representation capturing the categorization of various software packages in the open-source community

Deep Learning (DL) has highlighted the potential impact of optimized machine learning in both the scientific and clinical communities, and the advent of open-source DL libraries further advances the promise of democratizing computational analytics in healthcare. However, developing DL algorithms requires an increasingly technical and specialized background, and variability in implementation details hinders their reproducibility.

Taking this into consideration, we introduce the Generally Nuanced Deep Learning Framework (GaNDLF), which lowers the barrier to entry and makes the mechanism for DL development, training, and inference more stable, reproducible, and scalable, without requiring an extensive technical background. With built-in support for k-fold cross-validation, data augmentation, multiple modalities and output classes, multi-GPU training, and both radiographic and histologic imaging, GaNDLF aims to provide an end-to-end solution for DL-related tasks in medical imaging and a robust application framework for deployment in clinical workflows. To ensure the greatest applicability, the target audience for GaNDLF is two-fold: i) the researcher who would like to generate baseline results without writing any code, via a rich, highly adaptable, and easy-to-use interface, and ii) the researcher who prefers more involved development, for whom GaNDLF provides a fully-formed DL pipeline that can be wrapped around any computational technique they develop (a new network topology, loss function, optimizer, and so on).

With our end goal being a publicly available open-source framework with an easy-to-use (i.e., user-focused) interface and end-to-end reproducibility, GaNDLF makes DL-based analytics more accessible to researchers who are not deeply familiar with writing DL pipelines. This democratizes the use of advanced DL-based computational workloads in the field of computational medicine. To ensure that GaNDLF has maximal flexibility across diverse types of datasets, we have focused on medical imaging data, with built-in support for radiologic and histologic datasets and the architectural flexibility to expand into non-imaging datasets (such as genomics and electronic health records (EHR)).

Overall Workflow and Salient Features

Flowchart depicting GaNDLF’s overall training/inference processes.

The complete training/inference workflow of a computational workload using GaNDLF is illustrated in the figure above, which shows two major points of entry for the user: i) a list of datasets that will be used for training, and ii) the configuration of the training/inference pipeline. The configuration contains all the options that the user wants to enable while performing the training, and once a model has been trained, the same configuration, combined with the stored model weights, can be used for inference. The major components of GaNDLF’s architecture are described in the following sections.
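
As a rough illustration of the second entry point, the sketch below loads a small YAML configuration in Python. The key names (model, loss_function, nested_training, etc.) are hypothetical placeholders chosen for illustration, not GaNDLF's actual schema; the authoritative list of options lives in GaNDLF's documentation.

```python
# A minimal sketch of a pipeline configuration. All key names below are
# hypothetical placeholders, not GaNDLF's actual schema -- consult the
# GaNDLF documentation for the real options.
import yaml  # provided by the PyYAML package

config_text = """
model:
  architecture: unet     # hypothetical: which network topology to train
  dimension: 3           # hypothetical: 2D vs. 3D data
loss_function: dice      # hypothetical: loss used for training
optimizer: adam          # hypothetical: optimizer used for training
nested_training:
  validation: 5          # hypothetical: k for k-fold cross-validation
"""

config = yaml.safe_load(config_text)
print(config["model"]["architecture"])  # -> "unet"
```

Because the same configuration is reused at inference time together with the stored model weights, the file doubles as a record of how a given model was produced, which supports the framework's reproducibility goal.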

Pre-processing

Providing robust pre-processing techniques, widely applicable to medical imaging data, is critical for such a general-purpose framework to succeed. GaNDLF offers many of the pre-processing techniques reported in the literature, leveraging basic standardized pre-processing routines from the Insight Toolkit (ITK) and advanced pre-processing functionality from the Cancer Imaging Phenomics Toolkit (CaPTk). The pre-processing steps for data curation (including harmonization and normalization) are as follows (a minimal sketch of both steps follows the list):

  1. Data harmonization: This ensures harmonization of either the voxel/pixel spacing (which defines the physical resolution of the data) or the image resolution (which defines the overall extent of the data in the image space).
  2. Intensity normalization: This ensures that the intensities of the dataset are always defined in the same space. For example, magnetic resonance (MR) images can have different intensities based on the scanning parameters, and appropriate intensity harmonization (e.g., histogram normalization, or WhiteStripe) is required before performing any computational analyses.
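
To make both steps concrete, here is a minimal sketch, using SimpleITK (ITK's Python wrapper) and NumPy, of resampling to a common voxel spacing followed by z-score intensity normalization. It illustrates the concepts only; GaNDLF's own routines (and alternatives such as histogram normalization or WhiteStripe) may differ in detail.

```python
import SimpleITK as sitk
import numpy as np

def resample_to_spacing(image, new_spacing=(1.0, 1.0, 1.0)):
    """Harmonize voxel spacing: resample the image onto a common grid."""
    original_spacing = image.GetSpacing()
    original_size = image.GetSize()
    # Keep the physical extent constant while changing the voxel spacing.
    new_size = [int(round(osz * ospc / nspc))
                for osz, ospc, nspc in zip(original_size, original_spacing, new_spacing)]
    resampler = sitk.ResampleImageFilter()
    resampler.SetOutputSpacing(new_spacing)
    resampler.SetSize(new_size)
    resampler.SetOutputOrigin(image.GetOrigin())
    resampler.SetOutputDirection(image.GetDirection())
    resampler.SetInterpolator(sitk.sitkLinear)
    return resampler.Execute(image)

def zscore_normalize(image):
    """Normalize intensities to zero mean and unit variance."""
    arr = sitk.GetArrayFromImage(image).astype(np.float32)
    arr = (arr - arr.mean()) / (arr.std() + 1e-8)
    out = sitk.GetImageFromArray(arr)
    out.CopyInformation(image)  # preserve spacing/origin/direction metadata
    return out
```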

Data Augmentation

It is widely accepted that DL-based methods are extremely data hungry, while medical data remain scarce because of various technical, privacy, cultural, and ownership concerns, as well as data protection regulatory requirements, such as those set by the Health Insurance Portability and Accountability Act (HIPAA) of the United States and the European General Data Protection Regulation (GDPR). To tackle this, GaNDLF leverages existing robust data augmentation packages, namely TorchIO and Albumentations. These augmentations not only increase the size and variability of the training data, but also enable more generalizable/robust model training.
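
As a concrete (and purely illustrative) example, a TorchIO pipeline for a volumetric image could look like the following; the specific transforms and parameters are arbitrary choices for the sketch, not GaNDLF's defaults.

```python
import torchio as tio

# An illustrative TorchIO augmentation pipeline (not GaNDLF's defaults):
# transforms are applied on the fly, so every epoch sees a slightly
# different version of each training image.
augment = tio.Compose([
    tio.RandomAffine(scales=(0.9, 1.1), degrees=10),  # small rotations/scalings
    tio.RandomNoise(std=0.05),                        # additive Gaussian noise
    tio.RandomFlip(axes=(0, 1, 2)),                   # random axis flips
])

subject = tio.Subject(image=tio.ScalarImage("t1.nii.gz"))  # hypothetical file
augmented = augment(subject)
```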

Cross-validation

k-fold cross-validation is a useful technique in ML that yields less biased performance estimates and helps capture information from an entire given dataset, by training k different models on corresponding folds of the complete training data. By enabling this method, GaNDLF helps ensure that trained models do not overfit to a specific data cohort and that the resultant evaluation metrics are representative of the entire dataset.
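
Conceptually, the split looks like the scikit-learn sketch below (the subject IDs are made up); how each fold is wired into the training loop is handled internally by GaNDLF.

```python
from sklearn.model_selection import KFold

subject_ids = [f"subject_{i:03d}" for i in range(100)]  # hypothetical cohort

# 5-fold split: each subject appears in exactly one validation fold, so the
# k per-fold models jointly cover the whole dataset.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(subject_ids)):
    train_subjects = [subject_ids[i] for i in train_idx]
    val_subjects = [subject_ids[i] for i in val_idx]
    # Train one model per fold here; report metrics aggregated over folds.
    print(f"fold {fold}: {len(train_subjects)} train / {len(val_subjects)} val")
```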

Software Stack

Illustration of GaNDLF’s software stack.

The software stack of GaNDLF, as illustrated in the figure above, connects the lower-level libraries with the more abstract functionality exposed to the user via the command-line interface. This ensures that a researcher can perform DL training and/or inference without having to write a single line of code. Furthermore, the flexibility of the stack is demonstrated by the ease with which a new component (e.g., a pre-/post-processing step, network architecture, loss function, or scheduler) can be incorporated into the framework and subsequently applied to new types of data/applications with minimal effort. Finally, this flexibility also allows developers to use existing infrastructure components of GaNDLF in their own applications, as shown in the Federated Tumor Segmentation (FeTS) framework.
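
One common way to achieve this kind of pluggability is a name-to-callable registry that the configuration can reference by name; the sketch below illustrates that generic pattern and is not GaNDLF's actual internal code.

```python
from typing import Callable, Dict
import torch

# A generic registry pattern for pluggable components (illustrative only;
# not GaNDLF's internals). A new loss becomes available to a config-driven
# pipeline simply by registering it under a name.
LOSSES: Dict[str, Callable[..., torch.Tensor]] = {}

def register_loss(name: str):
    def decorator(fn):
        LOSSES[name] = fn
        return fn
    return decorator

@register_loss("dice")
def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """Soft Dice loss for segmentation masks in [0, 1]."""
    intersection = (pred * target).sum()
    return 1.0 - (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# The pipeline can then look the component up by the name given in the config:
loss_fn = LOSSES["dice"]
```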

Results and Looking Ahead

Results for semantic segmentation across several anatomies. Organ denotes the organ system of the data, Application the use case for the trained model(s), Dimensions the dimensionality of each input modality, Input Modalities the total number of input modalities the model trains on, Output Classes the number of classes the model predicts, Architecture the network topology, and Dice the overall testing Dice similarity coefficient for the specific model.

GaNDLF was applied completely out-of-the-box (i.e., without any parameterization or tuning) to various anatomies, applications, imaging modalities, and output classes (i.e., multi-class vs. binary segmentations) to generate a holistic set of results across real clinical datasets. Some of these results, utilizing several of the numerous DL network topologies incorporated in GaNDLF, are showcased in the table above. They indicate the promise of GaNDLF as a self-contained DL framework, with various abstraction layers that enable researchers to produce and contribute robust DL models without any DL or coding experience.

Currently, GaNDLF is being applied to multiple regression, classification, and synthesis tasks, to further solidify its status as a generalizable framework applicable to various AI workloads. Although GaNDLF offers a mechanism for end-to-end DL training and inference, the rapid developments in this domain yield multiple considerations for future directions. A mechanism for cascading models (i.e., training/inferring different models of the same or different architectures sequentially) or aggregating them (i.e., training/inferring models of different architectures concurrently) is not yet present, although such approaches have generally been shown to produce superior results. Techniques spanning AutoML and neural architecture search (NAS) have shown a lot of promise for creating robust models, but are currently not supported in GaNDLF. Finally, the application of GaNDLF to data types beyond imaging, such as genomics or EHR, has not been explored yet, but is current work in progress.

In conclusion, GaNDLF is being developed by an international partnership across academia and industry, and pledges to always welcome additional collaborations and individual contributions towards open science. The ultimate goal for GaNDLF as a DL framework is transparent reporting of results and reproducible research studies, in both the radiology and histopathology domains, paving the way towards big data analyses in medicine and enabling us to optimally contribute to the space of personalized diagnostics.
