Paper Review - DeshuffleGAN: A Self-Supervised GAN to Improve Structure Learning (ICIP 2020)

yw_nam
Analytics Vidhya
5 min read · Jul 13, 2020

Contents

  1. Abstract
  2. Method
  3. Results and Experiments

1. Abstract

This paper was accepted at ICIP 2020.

The authors argue that one of the crucial points for improving GAN performance, in terms of realism and similarity to the original data distribution, is to give the model the capability to learn the spatial structure in the data.

The idea of solving jigsaw puzzles to learn spatial representations already exists ([1], [2]), but this paper proposes to use that idea to enhance a GAN's learning capability.

The authors implemented this model in PyTorch, but as far as I know, the code has not been released yet.

2. Method

Fig 1. Structure of the model

The authors introduce the Shuffler, which scrambles input images like a jigsaw puzzle to improve spatial representation learning.

The Shuffler divides the input image into 9 tiles, each with height and width equal to one third of the input size, and then shuffles the tiles in a random order. The number of possible permutations is 9!, from which 30 permutations are selected according to their Hamming distance. The same procedure is applied to every sample.
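To make the tile shuffling concrete, here is a minimal PyTorch sketch of what such a Shuffler could look like. The function name, the row-major tile ordering, and the permutation representation are my assumptions for illustration, not the authors' released code.

```python
import torch

def shuffle_tiles(x, permutation):
    """Split a batch of square images into a 3x3 grid of 9 tiles and
    reassemble them in the order given by `permutation` (9 indices).

    x: tensor of shape (B, C, H, W), with H and W divisible by 3.
    """
    B, C, H, W = x.shape
    th, tw = H // 3, W // 3
    # Collect the 9 tiles in row-major order.
    tiles = [x[:, :, i * th:(i + 1) * th, j * tw:(j + 1) * tw]
             for i in range(3) for j in range(3)]
    # Place tile permutation[k] at grid position k, stitching rows back together.
    rows = [torch.cat([tiles[permutation[r * 3 + c]] for c in range(3)], dim=3)
            for r in range(3)]
    return torch.cat(rows, dim=2)
```

For example, with a 96×96 input each tile is 32×32, and calling shuffle_tiles(x, perm) with perm drawn from the selected 30-permutation set produces the shuffled batch S.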

In DeshuffleGAN, the discriminator D not only distinguishes X_real from X_fake as in a standard GAN, but also predicts the permutation order of the shuffled images S_real and S_fake. D shares its weights between the two tasks, except for the output layers.

The deshuffling task, then, is to solve the jigsaw puzzles of S_real and S_fake with D. If the images generated by the generator G are of poor quality, the puzzle tiles will not relate to each other; in this case, D gives negative feedback to G, pushing it to improve generation quality.

After taking a look at [1] and [2]

The authors say the permutation order is chosen according to the Hamming distance, which looked ambiguous to me. So I took a look at [1] and [2]; there is a table and an algorithm there that explain how the permutation set is chosen.

Table 1. Ablation study on the impact of the permutation set (from [1])
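My reading of the algorithm in [1] is a greedy scheme that repeatedly adds the permutation farthest, in total Hamming distance, from those already selected. Below is a sketch of that scheme; the function name and the use of total rather than minimum distance are my assumptions.

```python
import itertools
import numpy as np

def select_permutations(num_tiles=9, num_select=30, seed=0):
    """Greedily build a permutation set with large pairwise Hamming distance
    (my reading of the selection algorithm described in [1])."""
    rng = np.random.default_rng(seed)
    candidates = np.array(list(itertools.permutations(range(num_tiles))))
    # Start from one randomly chosen permutation.
    first = rng.integers(len(candidates))
    selected = candidates[first:first + 1]
    candidates = np.delete(candidates, first, axis=0)
    # Running total Hamming distance of each candidate to the selected set.
    dist_sum = np.zeros(len(candidates), dtype=np.int64)
    for _ in range(num_select - 1):
        dist_sum += (candidates != selected[-1]).sum(axis=1)
        best = int(dist_sum.argmax())
        selected = np.vstack([selected, candidates[best]])
        candidates = np.delete(candidates, best, axis=0)
        dist_sum = np.delete(dist_sum, best)
    return selected  # shape (30, 9)
```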

Still, I don't know why they select exactly 30 permutations. Is there any reason?

Update, after a comment from the author:

The answer to the question above is that their goal is not to solve a maximally challenging deshuffling problem; they just wanted a small but efficient number of permutations.

2–1. Adversarial Loss

Fig 2. Classical GAN training losses

Note that P is the real data distribution, Q is the generated data distribution, C(x) is a measure of the realness of x, and L_D and L_G denote the loss functions of D and G.

The authors refer to RaGAN's theory [3] as follows:

classical GAN training leads to a problem in training because G pushes D to output 1 for both the real and fake data whereas in fact the discriminator should converge to 0.5 to realize JS-divergence between input and generated data distributions

The authors also say:

the aim of the training should be not only to increase the probability that the fake data is real, but also to decrease the probability that the real data is real

Therefore, RaGAN proposes a new, relativistic objective in which the discriminator estimates the probability that the input data is more realistic than the generated data.

This paper uses the RaGAN loss together with DCGAN architectures. The authors say they added only one conv layer to the output of D for the permutation prediction.
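As a rough illustration of a shared-trunk discriminator with a realness head and a 30-way permutation head, here is a DCGAN-style sketch. The class name, layer sizes, and pooling are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoHeadDiscriminator(nn.Module):
    """DCGAN-style discriminator with a shared trunk and two heads:
    a realness score C(x) and a 30-way permutation classifier.
    Layer sizes here are illustrative, not the paper's exact architecture."""

    def __init__(self, num_perms=30):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Realness head: one score per image (C(x) in the RaGAN notation).
        self.realness = nn.Conv2d(256, 1, 4, 1, 0)
        # Deshuffling head: a single extra conv layer producing 30 logits.
        self.deshuffle = nn.Conv2d(256, num_perms, 4, 1, 0)

    def forward(self, x):
        h = self.trunk(x)                              # shared features
        c = self.realness(h).flatten(1).mean(dim=1)    # (B,) realness scores
        p = self.deshuffle(h).flatten(2).mean(dim=2)   # (B, 30) permutation logits
        return c, p
```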

Fig 3. Losses for real/fake prediction

D predicts the probability of r/f both for X_real and X_fake since the shuffled data don’t affect the adversarial objective
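Since Fig 3 is an image, here are the relativistic average (RaSGAN) losses from [3] written out for reference, as I recall them; σ denotes the sigmoid function:

```latex
L_D^{adv} = -\,\mathbb{E}_{x_r \sim P}\!\left[\log \sigma\!\big(C(x_r) - \mathbb{E}_{x_f \sim Q} C(x_f)\big)\right]
            -\,\mathbb{E}_{x_f \sim Q}\!\left[\log\!\big(1 - \sigma\big(C(x_f) - \mathbb{E}_{x_r \sim P} C(x_r)\big)\big)\right]

L_G^{adv} = -\,\mathbb{E}_{x_f \sim Q}\!\left[\log \sigma\!\big(C(x_f) - \mathbb{E}_{x_r \sim P} C(x_r)\big)\right]
            -\,\mathbb{E}_{x_r \sim P}\!\left[\log\!\big(1 - \sigma\big(C(x_r) - \mathbb{E}_{x_f \sim Q} C(x_f)\big)\big)\right]
```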

2–2. Deshuffling Loss

For the discriminator, the objective is to minimize the error between the true shuffling order and the predicted shuffling order of S_real. The authors argue that D is updated only according to S_real because using S_fake could make D learn from meaningless data.

Conversely, for the generator, the objective is to minimize the error between the true shuffling order and the predicted shuffling order of S_fake. If G generates samples well, a D well trained on real data will be able to deshuffle the generated samples; in this case, D gives positive feedback to G. In contrast, if G generates samples badly, that same D will not be able to deshuffle them, and D gives negative feedback to G.

The deshuffling objectives of D and G are given as cross-entropy losses.

Fig 4. Cross-entropy losses of D and G

where N denotes the number of samples, y_d is the one-hot encoded label vector of size 30×1 for S_real, and ȳ_d is the prediction vector over permutation indices for S_real.

The label vector is 30×1 one-hot encoded because 30 permutations were selected by the Hamming-distance procedure above.
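Putting the two objectives together, here is a minimal sketch of how the deshuffling losses could be computed, assuming the two-head discriminator sketched earlier; perm_idx, s_real, and s_fake are hypothetical names of my own.

```python
import torch.nn.functional as F

def deshuffle_losses(D, s_real, s_fake, perm_idx):
    """perm_idx: (B,) tensor of permutation indices in [0, 30) used to
    shuffle both batches; s_real / s_fake: shuffled real / fake images."""
    _, pred_real = D(s_real)
    _, pred_fake = D(s_fake)
    # D is updated only with the shuffled real data, to avoid learning
    # from meaningless fake tiles.
    loss_D = F.cross_entropy(pred_real, perm_idx)
    # G is updated through D's prediction on the shuffled fake data.
    loss_G = F.cross_entropy(pred_fake, perm_idx)
    return loss_D, loss_G
```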

2–3. Full Objective

Fig 5. Full objective function

In this paper, α is set to 1 and β is set to 0.2.
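My reading of Fig 5 is that each network simply adds the weighted deshuffling term to its adversarial loss:

```latex
L_D = \alpha \, L_D^{adv} + \beta \, L_D^{deshuffle}, \qquad
L_G = \alpha \, L_G^{adv} + \beta \, L_G^{deshuffle},
\quad \text{with } \alpha = 1, \; \beta = 0.2
```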

3. Results and Experiments

3–1. Objective Functions

RaSGAN with the standard adversarial training loss in [3], RaLSGAN with the least-squares loss in [4], and RaHingeGAN with the hinge loss in [5] are used as baselines. The DeshuffleGAN versions of the baselines add the deshuffling losses as in Fig 5.

3–2. Results

Table 2. Evaluation results (FID)

DeshuffleGANs achieve lower FIDs than the baselines in all settings except RaSGAN on the CAT dataset.

Fig 6. Generation results for the CAT dataset.
Fig 7. Generation results for the LSUN Church dataset.

Five different latent vectors are sampled from the normal distribution and given as input to six different GAN models:
(a) RaSGAN, (b) RaLSGAN, (c) RaHingeGAN, (d) Deshuffle(RaS)GAN, (e) Deshuffle(RaLS)GAN, (f) Deshuffle(RaHinge)GAN

My opinion

In my view, it is a strong point of this paper that performance was improved by adding a single deshuffling loss term, without significantly changing the basic GAN framework.
