Paper Review: Diverse Image Generation via Self-Conditioned GANs

yw_nam · Published in Analytics Vidhya · Jul 25, 2020

All figures and tables are from the paper unless marked as coming from another paper or website.

Content

  1. Abstract
  2. Method
  3. Result and Experiments
  4. My Opinion

1. Abstract

Fig 1. Image Synthesis by Cluster
Fig 2. Visualization of two specific categories

This paper was accepted at CVPR 2020. The authors observe that unconditional GANs are prone to mode collapse, where some modes of the data distribution are simply missed. A conditional GAN can force the generator to cover all modes by conditioning on class labels, but in practice labeled data is often hard to acquire.

Therefore, the authors introduce a self-conditioned GAN that trains without class labels to mitigate mode collapse. The model automatically clusters the feature space of the discriminator D, and through these clusters D pushes the generator to cover all semantic categories.

The authors report strong results on mode-collapse benchmarks (mixtures of Gaussians, Stacked-MNIST, CIFAR-10).

The code is available here.

2. Method

First, the GAN must imitate p_real (the target distribution). The authors partition the dataset into k clusters {π_1, . . . , π_k} that are determined during training; no ground-truth labels are used for this.

Training samples are initially clustered in the randomly initialized discriminator feature space.

Fig 3. D and G are trained on the condition of cluster c, which is found automatically.

D should recognize real images sampled from cluster π_c of the dataset and distinguish them from fake images synthesized by the class-conditional generator G.

Conversely, G, conditioned on c, must fool D by imitating images from π_c. Instead of ground-truth labels, the clusters come from the (initially random) discriminator features, and the discriminator features are periodically re-clustered.

The authors say the two key points of their algorithm are:

1. Conditional GAN training with respect to cluster labels given by the current partitioning.

2. Updating the partition according to the current discriminator features of real data periodically.

Conditional GAN training

The model consists of a class-conditional generator G(z, c) and a class-conditional discriminator D(x, c). Note that the internal discriminator feature layers are denoted D_f and the final layer D_h.

Eq 1. The paper does not spell out the composition operator here; possibly concatenation.
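Since the operator is unclear, here is a minimal PyTorch sketch of the D_f / D_h split, assuming the condition enters D_h by concatenating a learned cluster embedding with the D_f features. The architecture and names (ConditionalDiscriminator, feat_dim) are my placeholders, not the paper's.

```python
import torch
import torch.nn as nn

class ConditionalDiscriminator(nn.Module):
    def __init__(self, k, feat_dim=128):
        super().__init__()
        # D_f: shared feature extractor (the layers here are placeholders)
        self.D_f = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat_dim),
        )
        self.embed = nn.Embedding(k, feat_dim)  # one learned embedding per cluster
        # D_h: final head mapping (features, cluster embedding) to a realness score
        self.D_h = nn.Linear(feat_dim * 2, 1)

    def forward(self, x, c):
        f = self.D_f(x)  # D_f(x): shared discriminator features
        return self.D_h(torch.cat([f, self.embed(c)], dim=1))
```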

The adversarial objectives to be optimized are:

Eq 2. adversarial objective

Here, the cluster index c is sampled from the categorical distribution P_π, which weights each cluster in proportion to its size in the training set. Eq. 2 is optimized in a minimax fashion.

Eq 3. Minimax optimization

The overall training procedure is as follows; a hedged sketch of one training step is given below.
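This sketch follows Eq. 3 with the non-saturating GAN loss. The names G, D, opt_G, opt_D, real_loader (yielding images and dataset indices), cluster_of (index → current cluster label), cluster_sizes, and z_dim are my assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

probs = cluster_sizes.float() / cluster_sizes.sum()  # categorical P_π over k clusters

for x_real, idx in real_loader:
    b = x_real.size(0)
    c_real = cluster_of[idx]                                 # labels of the real batch
    c_fake = torch.multinomial(probs, b, replacement=True)   # c ~ P_π
    z = torch.randn(b, z_dim)

    # Discriminator step: push D(x, c) up on real pairs, down on fake pairs.
    x_fake = G(z, c_fake).detach()
    d_loss = (F.binary_cross_entropy_with_logits(D(x_real, c_real), torch.ones(b, 1))
              + F.binary_cross_entropy_with_logits(D(x_fake, c_fake), torch.zeros(b, 1)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step (non-saturating): make D accept G(z, c) under condition c.
    g_loss = F.binary_cross_entropy_with_logits(D(G(z, c_fake), c_fake), torch.ones(b, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```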

Computing new partition by clustering

During training, the shared discriminator layers D_f learn progressively better representations of the data distribution. So the model periodically updates π by re-partitioning the target dataset under the metric induced by the current discriminator features, using k-means clustering [32] to obtain a new partition into k clusters in the D_f output space.

Eq 4. A new partition into k clusters according to the D_f output space

Here μ_c is the mean of the D_f features over cluster π_c, defined as follows.

Eq 5. Define μ_c
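A minimal sketch of this computation, assuming the D_f features of all real samples have been precomputed into an (N, d) array feats with current assignments labels:

```python
import numpy as np

def cluster_means(feats, labels, k):
    # μ_c = (1 / |π_c|) Σ_{x ∈ π_c} D_f(x), from precomputed features
    return np.stack([feats[labels == c].mean(axis=0) for c in range(k)])
```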

Clustering initialization.

For the first clustering, the authors use k-means++ initialization. Subsequent re-clusterings are initialized with the means obtained from the previous clustering. The formula for the new clustering is:

Eq 6. Computing the new clustering

Here, the D in Eq. 6 denotes the current discriminator feature space.
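A hedged sketch of the (re-)clustering step using scikit-learn's KMeans: k-means++ initialization on the first call, then warm-starting from the previous means. The function name recluster and its inputs are my assumptions:

```python
from sklearn.cluster import KMeans

def recluster(feats, k, prev_means=None):
    # k-means++ on the first call; warm-start from previous means afterwards
    if prev_means is None:
        km = KMeans(n_clusters=k, init="k-means++", n_init=10)
    else:
        km = KMeans(n_clusters=k, init=prev_means, n_init=1)
    labels = km.fit_predict(feats)        # new partition π in the D_f space
    return labels, km.cluster_centers_    # means reused at the next re-clustering
```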

Matching with old clusters.

After repartitioning, to prevent the conditional G and D from having to relearn each condition from scratch, the authors match the new clusters to the old ones, so that the target distribution for each generator condition does not change drastically. They seek a permutation ρ: [k] → [k] that minimizes the objective:

Eq 7. The objective to minimize

The authors solve this matching loss with the classic Hungarian min-cost matching algorithm and obtain the new clusters used for GAN training in subsequent epochs. Algorithm 1 summarizes the entire training process.
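A minimal sketch of the matching step using SciPy's linear_sum_assignment (an implementation of Hungarian min-cost matching). Using the distance between cluster means as the cost is my stand-in for the objective in Eq. 7:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def match_clusters(old_means, new_means):
    # cost[i, j]: expense of giving new cluster j the old label i
    cost = cdist(old_means, new_means)
    rows, cols = linear_sum_assignment(cost)  # Hungarian min-cost matching
    perm = np.empty_like(cols)
    perm[cols] = rows                         # new cluster c is relabeled perm[c]
    return perm
```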

3. Result and Experiments

Data.

The authors used real datasets (Stacked-MNIST, CIFAR-10, etc.) and synthetic datasets (the 2D-ring and 2D-grid datasets, defined below; a sampling sketch follows the list).

  • The 2D-ring dataset is a mixture of eight 2D Gaussians with means (cos(2πi/8), sin(2πi/8)) and variance 1e-4, where i ∈ {0, . . . , 7}.
  • The 2D-grid dataset is a mixture of twenty-five 2D Gaussians with means (2i − 4, 2j − 4) and variance 0.0025, where i, j ∈ {0, . . . , 4}.
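A small NumPy sketch that samples the two synthetic datasets exactly as parameterized above (the function names are mine):

```python
import numpy as np

def sample_2d_ring(n, std=1e-2):   # variance 1e-4 -> std 1e-2
    i = np.random.randint(0, 8, size=n)
    means = np.stack([np.cos(2 * np.pi * i / 8),
                      np.sin(2 * np.pi * i / 8)], axis=1)
    return means + std * np.random.randn(n, 2)

def sample_2d_grid(n, std=0.05):   # variance 0.0025 -> std 0.05
    i = np.random.randint(0, 5, size=n)
    j = np.random.randint(0, 5, size=n)
    means = np.stack([2 * i - 4, 2 * j - 4], axis=1).astype(float)
    return means + std * np.random.randn(n, 2)
```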

Experiments.

Table 1. Number of modes recovered, percent high-quality samples, and reverse KL divergence. Results are averaged over 10 trials.
Figure 4. Visual comparison generated samples on the 2D-grid dataset.
Table 2. Number of modes recovered, reverse KL divergence, and Inception Score (IS). Results are averaged over 5 trials.

FID in Table 2 is a popular metric, but I am not sure exactly how Table 1's "percent high-quality samples" is defined. In similar 2D-mixture benchmarks, a sample is usually counted as high quality if it falls within three standard deviations of the nearest mode (a sketch of that convention is given below). In any case, we can see that the authors' models obtain good results on the mode-recovery, reverse KL, FID, and IS metrics across all four datasets.
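For reference, here is a sketch of that common convention for 2D-mixture benchmarks; whether the paper uses exactly this definition is my assumption:

```python
import numpy as np

def mixture_metrics(samples, mode_means, std):
    # distance from each generated sample to each mode mean
    d = np.linalg.norm(samples[:, None, :] - mode_means[None, :, :], axis=2)
    hq = d.min(axis=1) < 3 * std                  # "high quality": within 3 std of a mode
    modes = np.unique(d.argmin(axis=1)[hq]).size  # modes hit by at least one HQ sample
    return modes, 100.0 * hq.mean()               # (#modes recovered, % high quality)
```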

Table 3. FID and IS for varying k. Results are averaged over 5 trials.

Performance is poor for very small k, but good across a wide range of k.

Table 4. FID, FSD, and IS on ImageNet and Places365

Overall, the method performs well, but it still falls short of fully supervised class-conditional models. As the authors note:

Although Logo-GAN-RC outperforms our method, it is based on a supervised pre-trained ImageNet classifier.

There are many other figures and tables, but I think this is enough to introduce the paper. Please refer to the paper for the rest.

4. My Opinion

The core idea of this paper is self-supervised label clustering.

This is a really important problem to solve, because in many real-world cases labels and annotations are not available. Unfortunately, since the subject of the paper is label clustering, there is not much discussion of data quality.
