Addressing Data Shortages in the Medical Sector: Data Synthesis and Heterogeneous Transfer Learning

Adam Mehdi · Published in The Startup
6 min read · Jun 27, 2020


Some brain data.

Introduction

One of the biggest problems in the field of medical image segmentation is acquiring an adequate amount of data on a variety of tasks necessary to deploy effective deep learning models. Labeled data requires a cumbersome manual annotation process that generally may only be performed by persons of considerable expertise. Indeed, the subtlety of medical imaging makes this a job for expert radiologists.

However, new techniques have recently been developed to circumvent these issues, namely data synthesis and heterogeneous transfer learning.

Data Synthesis

The technique of synthesizing data is a form of data augmentation, albeit a remarkably sophisticated one. The process synthesizes a labeled dataset from just a single labeled training example and many unlabeled training examples.

The performance of data synthesis is impressive: it greatly exceeds that of one-shot methods. However, it still falls considerably short of manually labeled, “ground-truth” training sets. Even with its inferior performance to ground-truth data, data synthesis and related procedures nevertheless have their place.

The Problem

Supervised deep learning models remain the state-of-the-art for semantic segmentation problems. However, such models require a great many manually labeled images in their datasets. Further, the problem of acquiring labeled datasets is only exacerbated by differences in image intensity and other noise arising from the different machines used to gather the images.

As a result, many clinical image datasets remain too small for supervised models. Thus, the technique of synthesizing labeled datasets was developed to address this issue. It helps overcome data shortages in the medical fields, particularly when professionals face time and labor constraints.

Other Attempts at Data Augmentation

The differences between medical images are subtle, with the main point of difference lying in the varying degrees of image intensity and tissue appearance.

Data augmentation techniques have attempted to imitate such differences, most often by means of hand-tuned and random augmentation functions (e.g., flipping or warping). However, such crude techniques have proven largely ineffectual in simulating the differences between medical images. Thus arose the need for a more sophisticated data augmentation procedure.
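As a concrete illustration, a crude random-augmentation function of the kind described above might look like the following sketch. The function name, parameters, and warp settings are illustrative choices, not taken from any of the cited papers:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def random_augment(image, flip_prob=0.5, warp_strength=2.0, seed=None):
    """Apply a random horizontal flip and a smooth random warp to a 2D image."""
    rng = np.random.default_rng(seed)
    out = image.copy()
    # Random horizontal flip
    if rng.random() < flip_prob:
        out = out[:, ::-1]
    # Smooth random displacement field (an elastic-style warp)
    h, w = out.shape
    dx = gaussian_filter(rng.standard_normal((h, w)), sigma=8) * warp_strength
    dy = gaussian_filter(rng.standard_normal((h, w)), sigma=8) * warp_strength
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = map_coordinates(out, [ys + dy, xs + dx], order=1, mode="reflect")
    return out
```

Such hand-tuned transforms capture geometric variety but not the intensity and tissue-appearance shifts that dominate real scanner-to-scanner differences, which is exactly the gap the learned-transform approach below tries to close.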

The Procedure of Data Synthesis via Learned Transformations

The data synthesis technique uses a learned spatial transform model, a learned appearance transform model, at least one atlas, and many unlabeled images to generate new labeled training and validation data that captures the anatomical and imaging diversity in the unlabeled images.

Let’s untangle that:

  • A spatial transform model captures the anatomical differences among subjects. It uses the U-Net architecture.
  • An appearance transform model accounts for intensity variations among the images. It also uses the U-Net architecture.
  • An atlas, also known as a labeled reference volume, is the image (x-value) of a labeled training example.
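Both transform models share the same U-Net-style backbone. The following is a minimal, illustrative PyTorch sketch of such an encoder-decoder with one skip connection; the layer sizes are assumptions and far smaller than a real U-Net:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style encoder-decoder with a single skip connection.
    `out_ch` selects the output: e.g. 2 channels for a 2D displacement
    field (spatial transform model) or 1 channel for a per-pixel
    intensity change (appearance transform model)."""
    def __init__(self, in_ch=2, out_ch=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec = nn.Sequential(nn.Conv2d(32 + 16, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, out_ch, 3, padding=1))

    def forward(self, atlas, target):
        x = torch.cat([atlas, target], dim=1)     # condition on both inputs
        e = self.enc(x)                            # encoder features
        d = self.up(self.down(e))                  # bottleneck + upsample
        return self.dec(torch.cat([d, e], dim=1))  # skip connection, then decode
```

The skip connection is the defining U-Net feature: it lets the decoder reuse fine spatial detail from the encoder when predicting the transformation.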

The procedure is illustrated as follows:

Procedure of data synthesis via learned transforms, as outlined below. [2]
  1. To train the spatial transform model, feed an unlabeled image and an atlas (the x-value of a labeled example) into the spatial transform model, which outputs a warped image. The output is compared to the unlabeled input with an image similarity loss, and the network is optimized.
  2. Similarly, to train the appearance transform model, input the atlas and a warped unlabeled image. Then overlay the output appearance transformation (representing image intensity) onto the atlas. Optimize with a loss combining an image similarity term between the final output and the warped unlabeled image with a regularization term that penalizes dramatic intensity changes.
  3. Apply the trained models from (1) and (2) to sample a spatial transformation and an appearance transformation from unlabeled images. The models perform best when the spatial and appearance targets come from different images (i.e., mixing and matching).
  4. Apply those sampled transformations to the atlas and its label to synthesize a labeled training example. The appearance transform is applied only to the atlas, so that the new image may contain differences in intensity, but not to the new label, since the label is just a segmentation indicating location (intensity is not applicable).

Iterating this process produces the dataset, which is then used to train and evaluate a supervised deep learning model that outputs a robust segmentation of an input image.
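Step 4 above, warping the atlas and its label with the sampled transformations, can be sketched in NumPy/SciPy as follows. The function and argument names are illustrative, not from the paper's code:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def synthesize_example(atlas, atlas_label, spatial_field, appearance_delta):
    """Synthesize one labeled training example from the atlas.
    spatial_field: sampled displacement field, shape (2, H, W).
    appearance_delta: sampled per-pixel intensity change, shape (H, W).
    The appearance change is applied to the image only; the label is
    warped with nearest-neighbour interpolation so it stays a valid mask."""
    h, w = atlas.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = [ys + spatial_field[0], xs + spatial_field[1]]
    # Appearance transform on the image, then the spatial warp on both
    new_image = map_coordinates(atlas + appearance_delta, coords, order=1)
    new_label = map_coordinates(atlas_label, coords, order=0)  # nearest-neighbour
    return new_image, new_label
```

Note the asymmetry from step 4: `appearance_delta` never touches the label, and nearest-neighbour interpolation (`order=0`) keeps the warped label a hard segmentation rather than blending class values.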

Results

Useful figure from the data synthesis paper. Rand-aug and SAS are alternative data augmentation techniques. [2]

A plain supervised model trained on data synthesized by this novel approach outperforms one-shot segmentation methods as well as supervised models using other data augmentation techniques. In addition, pairing the data synthesis method with hand-tuned random data augmentation (rand-aug) improves performance further.

However, as the table below illustrates, even though this method is a significant improvement over existing data augmentation techniques on segmentation tasks, it is no replacement for a fully labeled dataset (the upper bound).

The performance of the data synthesis method (ours-indep and ours-coupled) relative to other data augmentation techniques. [2]

Heterogeneous Transfer Learning

The focus thus shifts to techniques for reducing the size of the dataset needed. The best of these is perhaps transfer learning, that ever-important lifeblood of any decentralized deep learning project.

However, transfer learning still requires a great deal of data to train the base models. This is especially the case in the medical sector, in particular the domain of 3D medical segmentation, due to the laboriousness of manual annotation. As a result, there are few substantial datasets on a given task.

The predominant approaches to 3D segmentation were therefore to train from scratch or to transfer-learn from the unrelated Kinetics dataset (basically ImageNet for video footage).

Medical imaging proves to be more subtle than ordinary computer vision recognition tasks, though. For instance, slight differences in tissue appearance (e.g., a slightly irregular border) can spell the difference between a mild and an aggressive tumor.

Thus, lacking a single substantial source of data, the authors of the Med3D paper compiled datasets from diverse organs and pathologies to create a heterogeneous 3D model, utilizing special normalization techniques. As a result, networks fine-tuned from this Med3D model perform decidedly better than the alternatives.
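The fine-tuning pattern Med3D enables can be sketched in PyTorch. Note that `pretrained_encoder` below is a tiny stand-in for the actual Med3D 3D-ResNet backbone, and the layer sizes and class count are assumptions; the point is the pattern of reusing transferred weights and training only a fresh task head:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained Med3D-style 3D encoder (in practice, load
# the released backbone weights here instead of random initialization).
pretrained_encoder = nn.Sequential(
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
)
seg_head = nn.Conv3d(32, 3, 1)  # fresh head for 3 hypothetical target structures

# Freeze the transferred encoder so only the new head is trained
for p in pretrained_encoder.parameters():
    p.requires_grad = False

model = nn.Sequential(pretrained_encoder, seg_head)
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

With the encoder frozen, even a small labeled dataset on the new task suffices to train the head; unfreezing the later encoder layers with a small learning rate is the usual next step when slightly more data is available.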

Performance in segmentation and classification tasks of a model pre-trained on Med3D versus training from scratch (TFS) and pre-training on the Kinetics dataset (Kin). [3]

Conclusion

This post outlined the problem of data shortages in the medical fields and the prominent techniques attempting to side-step or reduce that need to accumulate massive datasets.

Data synthesis via learned transforms plays an important role when data is lacking and time constraints and labor shortages are key issues. It therefore remains an alternative to labeled, ground-truth datasets, whether used directly or via transfer learning.

However, for those without the resources to accumulate a sizable dataset for directly training a supervised model, transfer learning plays an indispensable role. It is best if the base model used for transfer learning is directly applicable to the task at hand, but even a heterogeneous model such as Med3D can play a key role in reducing the amount of data needed and improving performance.

It would be interesting to see the results of combining these two disparate techniques — a model designed for transfer learning that trains on a synthesized dataset — but until then, data synthesis and transfer learning techniques are the key means to address problems in acquiring data in the biomedical as well as other fields.

Research

  1. U-Net — Ronneberger et al., “U-Net: Convolutional Networks for Biomedical Image Segmentation” (2015)
  2. Data Synthesis — Zhao et al., “Data Augmentation Using Learned Transformations for One-Shot Medical Image Segmentation” (CVPR 2019)
  3. Med3D — Chen et al., “Med3D: Transfer Learning for 3D Medical Image Analysis” (2019)
