Paper review - AsynDGAN: Train deep learning Without Sharing Medical Image Data
All figures and tables come from the paper. (They are marked if they come from another paper or website.)
Content
- Abstract
- Method
- Result and Experiments
- My Opinion
1. Abstract
This paper was accepted at CVPR 2020. Medical data is tied to patient privacy, so it usually cannot be shared with others. As a result, publicly available medical datasets are small, and the authors argue that data-hungry models such as deep learning are difficult to apply to medical data.
The authors argue that training a central generator G against distributed discriminators, and then training on the images synthesized by G, enables learning without sharing data between hospitals and so solves the privacy problem.
They also claim the following for the proposed model:
- Our experiments show that our approach could learn the real image’s distribution from multiple datasets without sharing the patient’s raw data
- Our experiments show that our approach is more efficient and requires lower bandwidth than other distributed deep learning methods
- Our experiments show that our approach achieves higher performance compared to the model trained by one real dataset, and almost the same performance compared to the model trained by all real datasets
- Our experiments show that our approach has provable guarantees that the generator could learn the distributed distribution
The code is available here
2. Method
As shown in Fig. 2, the central generator G receives a task-specific input x (a segmentation mask in this paper). G creates a synthetic image G(x) to fool the local discriminators (D_1, D_2, …, D_n). Each local discriminator D_j needs to discriminate between the synthetic image G(x) and its own real image y. Between G and the discriminators, only gradients and synthetic images are transferred. Therefore, the authors argue that data privacy is not violated, because each local medical entity is the only one that accesses its own real data y.
Objective of AsynDGAN
In AsynDGAN, G is supervised by N different discriminators, each associated with one subset of the dataset. Therefore, the distribution s(x) of the generator input x can be expressed as follows.
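As a reconstruction in the paper's mixture notation (the symbols π_j, the weight of the j-th subset, and s_j(x), the distribution of inputs x held at site j, are assumed here and should be checked against the paper), the expression reads roughly:

```latex
s(x) = \sum_{j=1}^{N} \pi_j \, s_j(x),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{N} \pi_j = 1
```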
The loss function can then be written as follows.
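A hedged reconstruction of this objective, assuming the usual conditional-GAN loss summed over the N discriminators: π_j is the assumed weight of subset j, s_j(x) the assumed input distribution at site j, y a real image, and ŷ a synthetic image from G. The exact form should be checked against the paper.

```latex
\min_{G} \max_{D_1, \dots, D_N} V(D_{1:N}, G)
= \sum_{j=1}^{N} \pi_j \,
  \mathbb{E}_{x \sim s_j(x)} \Big[
    \mathbb{E}_{y \sim p_{\mathrm{data}}(y \mid x)} \big[ \log D_j(y \mid x) \big]
    + \mathbb{E}_{\hat{y} \sim p_{\hat{y}}(\hat{y} \mid x)} \big[ \log \big( 1 - D_j(\hat{y} \mid x) \big) \big]
  \Big]
```

Each term in the sum only involves the real data of one site, which is why every D_j can be trained locally.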
Optimization process
In Fig. 3, the solid arrows show the forward pass, and the dotted arrows show the gradient flow during the backward pass of the iterative update procedure. Solid blocks are being updated, while dotted blocks are frozen during that update step. Red and blue rectangles are the source mask and the target real image, respectively.
The model is updated as follows.
- D-update: Calculate the adversarial loss for the j-th discriminator D_j and update D_j, where j = 1, 2, · · · , N.
- G-update: After updating all discriminators, update G using the adversarial loss as follows.
This can be described as follows.
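The two steps can be sketched on a toy one-dimensional problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic discriminator, the offset-only generator, the uniform weighting of sites, and all class and function names (`LocalDiscriminator`, `CentralGenerator`, `train`) are hypothetical. What it does preserve is the communication pattern: synthetic samples go from G to each site, and only gradients come back.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


class LocalDiscriminator:
    """Toy site-side discriminator D_j(y) = sigmoid(w * y + b) (hypothetical model)."""

    def __init__(self, real_data, lr=0.05):
        self.real = list(real_data)  # private data: only read inside this class
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def score(self, y):
        return sigmoid(self.w * y + self.b)

    def d_update(self, fake_batch, rng):
        """Gradient ascent on mean log D(real) + mean log(1 - D(fake))."""
        n = len(fake_batch)
        real_batch = rng.choices(self.real, k=n)
        grad_w = grad_b = 0.0
        for y in real_batch:  # d/ds log sigmoid(s) = 1 - sigmoid(s)
            p = self.score(y)
            grad_w += (1.0 - p) * y
            grad_b += 1.0 - p
        for y in fake_batch:  # d/ds log(1 - sigmoid(s)) = -sigmoid(s)
            p = self.score(y)
            grad_w -= p * y
            grad_b -= p
        self.w += self.lr * grad_w / n
        self.b += self.lr * grad_b / n

    def grad_wrt_fake(self, fake_batch):
        """d/dy log(1 - D_j(y)) per synthetic sample; the only thing sent back to G."""
        return [-self.score(y) * self.w for y in fake_batch]


class CentralGenerator:
    """Toy conditional generator G(j, z) = mu[j] + z, one offset per site (hypothetical)."""

    def __init__(self, n_sites, lr=0.05):
        self.mu = [0.0] * n_sites
        self.lr = lr

    def generate(self, j, n, rng):
        return [self.mu[j] + rng.gauss(0.0, 1.0) for _ in range(n)]

    def g_update(self, grads_per_site):
        """Gradient descent on the summed loss; dG/dmu[j] = 1, so sample gradients pass through."""
        for j, grads in enumerate(grads_per_site):
            self.mu[j] -= self.lr * sum(grads) / len(grads)


def train(sites, steps=200, batch=16, seed=0):
    rng = random.Random(seed)
    gen = CentralGenerator(len(sites))
    for _ in range(steps):
        fakes = [gen.generate(j, batch, rng) for j in range(len(sites))]
        for disc, fake in zip(sites, fakes):  # D-update at every site
            disc.d_update(fake, rng)
        grads = [disc.grad_wrt_fake(fake) for disc, fake in zip(sites, fakes)]
        gen.g_update(grads)  # G-update after all D_j are updated
    return gen
```

Note that `CentralGenerator` never holds a reference to any site's `real` list; it only sees the gradient lists returned by `grad_wrt_fake`.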
3. Result and Experiments
Data.
The authors used three datasets: a synthetic dataset, BraTS2018, and Multi-Organ.
The synthetic dataset is created by combining three one-dimensional Gaussians.
That is, y = ∑_j y_j · 1[x = j], where 1[x = j] is the indicator that the condition x equals j, j ∈ {1, 2, 3}, and y_1 ~ N(−3, 2), y_2 ~ N(1, 1), y_3 ~ N(3, 0.5).
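A minimal sketch of how such a dataset could be sampled. Two points are assumptions rather than facts from the text: the condition x is drawn uniformly from {1, 2, 3}, and the second parameter of each N(·, ·) is treated as a standard deviation. The function names are illustrative.

```python
import random

# (mean, spread) of the Gaussian used for each condition x.
# Assumption: the second number is a standard deviation, not a variance.
COMPONENTS = {1: (-3.0, 2.0), 2: (1.0, 1.0), 3: (3.0, 0.5)}


def sample_pair(rng):
    """Draw a condition x, then draw y from the Gaussian selected by x."""
    x = rng.choice([1, 2, 3])  # assumption: equal probabilities
    mean, std = COMPONENTS[x]
    return x, rng.gauss(mean, std)


def make_dataset(n, seed=0):
    rng = random.Random(seed)
    return [sample_pair(rng) for _ in range(n)]


def split_by_condition(data):
    """Group the y values by their condition x."""
    subsets = {1: [], 2: [], 3: []}
    for x, y in data:
        subsets[x].append(y)
    return subsets
```

Calling `split_by_condition(make_dataset(3000))` gives three lists whose sample means should sit near −3, 1, and 3.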
Experiment on synthetic dataset
Setting
- Syn-All: Training a regular GAN using all samples in the dataset.
- Syn-Subset-n: Training a regular GAN using only samples in local subset n, where n ∈ {1, 2, 3}.
- AsynDGAN: Training AsynDGAN using samples in all subsets in a distributed fashion.
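The difference between the three settings is which samples each trainer is allowed to see. The sketch below makes that explicit on a handful of hand-written (x, y) pairs. It assumes that local subset n holds exactly the samples with condition x = n; the function name and the dictionary layout are illustrative only.

```python
def build_settings(pairs, sites=(1, 2, 3)):
    """Map each setting name to the data visible to each trainer (hypothetical layout)."""
    by_site = {n: [y for x, y in pairs if x == n] for n in sites}
    # Syn-All: one regular GAN trained on the pooled data.
    settings = {"Syn-All": {"central": [y for _, y in pairs]}}
    # Syn-Subset-n: one regular GAN trained on a single local subset.
    for n in sites:
        settings[f"Syn-Subset-{n}"] = {"central": by_site[n]}
    # AsynDGAN: every subset stays at its own site; nothing is pooled centrally.
    settings["AsynDGAN"] = {f"site-{n}": by_site[n] for n in sites}
    return settings


# Tiny illustrative input, not real data.
pairs = [(1, -2.7), (2, 0.9), (3, 3.1), (1, -3.4), (3, 2.8)]
settings = build_settings(pairs)
```

In this layout AsynDGAN has no "central" entry at all, which is the point of the method: the pooled dataset never exists in one place.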
Result
In Fig. 4, taking (a) as the baseline, the result in (c) looks closer to it than the result in (b).
Experiment on Brain tumor segmentation, Nuclei segmentation
Setting
- Real-All: Training using real images from the whole train set
- Real-Subset-n: Training using real images from the n-th subset, where n = 1, 2, · · · , 10 for brain tumor segmentation and n ∈ {breast, liver, kidney, prostate} for nuclei segmentation
- Syn-All: Training using synthetic images generated from a regular GAN. The GAN is trained directly using all real images
- AsynDGAN: Training using synthetic images from proposed AsynDGAN
Result of Brain tumor segmentation
Result of Nuclei segmentation
Result
As expected, Real-All shows the best performance. However, privacy constraints mean that the full set of datasets usually cannot be accessed, so in practice a single institution can only reach the performance of Real-Subset-n.
AsynDGAN avoids the privacy issue and still outperforms Real-Subset-n. It also gives results similar to Syn-All, whose GAN is trained with direct access to all real data.
4. My Opinion
The idea of this paper gives important insight into applying deep learning to medical data.
However, an actual implementation has to update G from the distributed discriminators. In practice, this means synthetic images must be sent from G to the discriminators at several medical entities, and gradients must be sent back from each D to G. Implementing this looks like a very challenging task.