Paper Review - AsynDGAN: Train Deep Learning Without Sharing Medical Image Data

yw_nam
Published in Analytics Vidhya
5 min read · Jul 28, 2020

All figures and tables are from the paper, unless marked as coming from another paper or website.

Content

  1. Abstract
  2. Method
  3. Result and Experiments
  4. My Opinion

1. Abstract

Fig 1. Comparison of brain tumor images synthesized by AsynDGAN with real images
Fig 2. Comparison of nuclei images synthesized by AsynDGAN with real images

This paper was accepted at CVPR 2020. Since medical data involves patient privacy, it often cannot be shared with others. Because publicly available medical datasets are therefore small, the authors argue that data-hungry models such as deep learning are difficult to apply to medical data.

The authors argue that training a central generator G against distributed discriminators, and then using the images created by G, enables learning without data sharing between hospitals and thus addresses the privacy problem.

They also state that the proposed model has the following properties:

  • Our experiments show that our approach can learn the real images’ distribution from multiple datasets without sharing the patients’ raw data
  • Our experiments show that our approach is more efficient and requires lower bandwidth than other distributed deep learning methods
  • Our experiments show that our approach achieves higher performance compared to a model trained on one real dataset, and almost the same performance compared to a model trained on all real datasets
  • Our experiments show that our approach has provable guarantees that the generator can learn the distributed distribution

The code is available here

2. Method

Fig 3. Entire model structure.

As shown in Fig. 3, the central generator G receives task-specific input (segmentation masks in this paper). G creates a synthetic image to fool the local discriminators (D_1, D_2, …, D_N). Each D_n must discriminate between synthetic data G(x) and its own real data y_n. Between G and the discriminators, only gradients and synthetic images are transferred. Therefore, the authors argue that data privacy is not violated, because only each local medical entity accesses its own real data y_n.

Objective of AsynDGAN

Eq 1. The objective of a classical conditional GAN
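The equation image did not survive extraction; for reference, the classical conditional GAN objective referred to here is:

```latex
\min_G \max_D \; \mathcal{L}_{\mathrm{cGAN}}(G, D)
= \mathbb{E}_{(x,y)\sim p_{\mathrm{data}}}\left[\log D(x, y)\right]
+ \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log\left(1 - D(x, G(x))\right)\right]
```

Here x is the conditioning input (the segmentation mask) and y the corresponding real image.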

In AsynDGAN, G is supervised by N different discriminators, each associated with one subset of the dataset. Therefore, the distribution s(x) of the auxiliary (conditioning) variable x can be expressed as a mixture over the subsets, as follows.

Eq 2. mixture distribution on auxiliary variable x
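Reconstructed from the surrounding text (the equation image is missing), the mixture takes the form:

```latex
s(x) = \sum_{j=1}^{N} \pi_j \, p_j(x), \qquad \sum_{j=1}^{N} \pi_j = 1
```

where p_j is the distribution of subset j and π_j is its mixture weight (e.g., proportional to the subset size).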

Therefore, the loss function can be written as follows.

Eq 3. Full Loss function
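The full loss image is also missing; reconstructed to be consistent with the conditional GAN objective and the mixture above, it reads:

```latex
\min_G \max_{D_1,\dots,D_N} \;
\sum_{j=1}^{N} \pi_j \Big(
\mathbb{E}_{(x,y)\sim p_j}\left[\log D_j(x, y)\right]
+ \mathbb{E}_{x \sim p_j}\left[\log\left(1 - D_j(x, G(x))\right)\right]
\Big)
```

Each D_j only ever evaluates its own subset's real pairs (x, y) and the synthetic images G(x) sent to it.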

Optimization process

Fig 4. The optimization process of AsynDGAN.

In Fig 4, the solid arrows show the forward pass, and the dotted arrows show gradient flow during the backward pass of the iterative update procedure. A solid block indicates that it is being updated, while dotted blocks are frozen during that update step. Red and blue rectangles are the source mask and the target real image, respectively.

The model update follows this process:

  1. D-update: Compute the adversarial loss for the j-th discriminator D_j and update D_j, for j = 1, 2, …, N.
  2. G-update: After all discriminators are updated, update G using the adversarial loss below.
Eq 4. adversarial loss
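Reconstructed from the full loss above (with all D_j frozen), the generator's adversarial loss is the π-weighted sum of the generator-facing terms:

```latex
\mathcal{L}_{\mathrm{adv}}(G) =
\sum_{j=1}^{N} \pi_j \,
\mathbb{E}_{x \sim p_j}\left[\log\left(1 - D_j(x, G(x))\right)\right]
```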

These two steps alternate until training converges.
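The communication pattern of the G-update can be sketched in pure Python. This is a toy illustration only, not the paper's networks: the "discriminators" below are stand-in quadratic critics, and names such as `make_hospital` and `train_generator` are invented for this sketch. What it does show is the key property: each hospital keeps its data locally, receives only a synthetic sample, and sends back only a gradient.

```python
# Toy sketch of AsynDGAN's communication pattern (NOT the paper's code).
# The central "generator" is a single scalar theta; each hospital holds a
# private statistic (real_mean) that never leaves its site.

def make_hospital(real_mean, weight):
    """Local discriminator surrogate for one medical entity.

    real_mean stays private; only the scalar gradient it computes
    is ever transmitted back to the central generator."""
    def local_gradient(synthetic_sample):
        # d/dtheta of weight * 0.5 * (synthetic - real_mean)^2,
        # computed locally at the hospital.
        return weight * (synthetic_sample - real_mean)
    return local_gradient

def train_generator(hospitals, theta=0.0, lr=0.1, steps=200):
    """Central G-update loop (the D-update side is omitted here)."""
    for _ in range(steps):
        grad = 0.0
        for local_gradient in hospitals:
            synthetic = theta                  # G -> D_j: synthetic sample only
            grad += local_gradient(synthetic)  # D_j -> G: gradient only
        theta -= lr * grad                     # G aggregates the weighted gradients
    return theta

# Three hospitals with private means and mixture weights pi_j.
hospitals = [make_hospital(-3.0, 0.3),
             make_hospital(1.0, 0.4),
             make_hospital(3.0, 0.3)]
theta = train_generator(hospitals)
# theta converges to the pi-weighted mean: 0.3*(-3) + 0.4*1 + 0.3*3 = 0.4
```

The design point mirrored here is that the aggregation is a π_j-weighted sum of per-site gradients, so the central node never needs the raw data that produced them.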

3. Result and Experiments

Data.

The authors used a synthetic dataset, BraTS2018, and the Multi-Organ nuclei dataset.
The synthetic dataset is created by mixing three one-dimensional Gaussians.
That is, y = ∑_j y_j · 1(x = j), where y_1 ~ N(−3, 2), y_2 ~ N(1, 1), y_3 ~ N(3, 0.5), and j ∈ {1, 2, 3}.
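Generating this toy dataset is straightforward. A minimal sketch, assuming the second parameter of each N(·, ·) is the standard deviation (the paper's convention could also be variance), with `sample_subset` being a name invented here:

```python
import random

# Three 1-D Gaussian components, one per local subset j in {1, 2, 3}.
# (mean, standard deviation) -- the std interpretation is an assumption.
COMPONENTS = {1: (-3.0, 2.0), 2: (1.0, 1.0), 3: (3.0, 0.5)}

def sample_subset(j, n, seed=None):
    """Draw n samples y_j ~ N(mu_j, sigma_j) for subset j."""
    rng = random.Random(seed)
    mu, sigma = COMPONENTS[j]
    return [rng.gauss(mu, sigma) for _ in range(n)]

# Each local discriminator D_j would see only its own subset's samples.
data = {j: sample_subset(j, 1000, seed=j) for j in (1, 2, 3)}
```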

Experiment on synthetic dataset

setting

  • Syn-All: Training a regular GAN using all samples in the dataset.
  • Syn-Subset-n: Training a regular GAN using only samples in local subset n, where n ∈ {1, 2, 3}.
  • AsynDGAN: Training AsynDGAN using samples in all subsets in a distributed fashion.

Result

Fig 5.Generated distributions of different methods

In Fig. 5, taking (a) as the baseline, the distribution in (c) matches it better than the one in (b).

Experiment on Brain tumor segmentation, Nuclei segmentation

Setting

  • Real-All: Training using real images from the whole train set
  • Real-Subset-n: Training using real images from the n-th subset, where n = 1, 2, · · · , 10 for brain tumor segmentation and
    n ∈ {breast, liver, kidney, prostate} for nuclei segmentation
  • Syn-All: Training using synthetic images generated from a regular GAN. The GAN is trained directly using all real images
  • AsynDGAN: Training using synthetic images from proposed AsynDGAN

Result of Brain tumor segmentation

Fig 6. Typical brain tumor segmentation results.
Table 1. Brain tumor segmentation results.

Result of Nuclei segmentation

Fig 7. Typical nuclei segmentation results
Table 2. Nuclei segmentation results

Result

Of course, Real-All shows the best performance. However, because of privacy issues many datasets cannot be accessed, so in reality a model trained at a single site only reaches Real-Subset performance.

AsynDGAN, however, sidesteps the privacy issue and performs better than Real-Subset-n. It also shows results similar to Syn-All, which synthesizes images after training a GAN on all the real data pooled together.

4. My Opinion

The idea of this paper gives us important insight into applying deep learning to medical data.

However, actually implementing the process of updating G from the distributed discriminators looks challenging:

Algorithm 2. Update G from D

In practice, synthetic images must be sent from G to each medical entity, and gradients must be sent back from each entity's discriminator to G. Engineering this exchange reliably across several institutions seems like a very challenging task.
