Paper review: Diverse Image Generation via Self-Conditioned GANs
All figures and tables are from the paper (marked if they come from another paper or website).
Content
- Abstract
- Method
- Result and Experiments
- My Opinion
1. Abstract
This paper was accepted at CVPR 2020. The authors point out that unconditional GANs suffer from mode collapse, where some modes of the data distribution are missed entirely. A conditional GAN can force the generator to cover all modes by conditioning on class labels, but in practice labeled data is often hard to acquire.
Therefore, the authors introduce a self-conditioned GAN that trains without class labels to mitigate mode collapse. The model automatically clusters the feature space of the discriminator D, and these clusters condition the generator so that it covers all semantic categories.
The authors report good results on standard mode-collapse benchmarks (mixtures of Gaussians, Stacked-MNIST, CIFAR-10).
The code is available here
2. Method
First, the GAN must imitate P_real (the target distribution). The authors partition the dataset into k clusters {π_1, . . . , π_k} that are determined during training; no ground-truth labels are used for this.
Training samples are initially clustered in the randomly initialized discriminator feature space.
D should recognize a real image sampled from cluster π_c in the dataset and distinguish it from a fake image synthesized by the class-conditional generator G.
Conversely, G, conditioned on c, must deceive D by imitating the images of π_c. Instead of ground-truth labels, the model clusters the (initially random) discriminator features and periodically re-clusters them as training progresses.
The authors say the two key points of their algorithm are:
1. Conditional GAN training with respect to cluster labels given by the current partitioning.
2. Periodically updating the partition according to the current discriminator features of the real data.
Conditional GAN training
The model consists of a class-conditional generator G(z, c) and a class-conditional discriminator D(x, c). Note that the internal discriminator feature layers are denoted D_f, and the final layer D_h.
The adversarial objectives to be optimized are:
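The equations are images in the original and are not reproduced here; based on the surrounding definitions, the objective takes the standard class-conditional GAN form (my reconstruction, which may differ in notation from the paper's Eq. 1–2):

```latex
\min_G \max_D \;
\mathbb{E}_{c \sim P_\pi}\Big[
  \mathbb{E}_{x \sim \pi_c}\big[\log D(x, c)\big]
  + \mathbb{E}_{z \sim p(z)}\big[\log\big(1 - D(G(z, c), c)\big)\big]
\Big]
```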
Here, the cluster index c is sampled from the categorical distribution P_π, which weights each cluster in proportion to its size in the training set. Eq. 2 is optimized in a minimax fashion.
The overall training procedure is as follows.
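A minimal sketch of the alternating schedule, with the cluster index sampled in proportion to cluster size as described above (hypothetical helper names, not the authors' code; a real `gan_step` would perform the minimax update of G and D):

```python
import numpy as np

def sample_clusters(partition_sizes, n, rng):
    """Draw cluster indices c ~ P_pi, weighting each cluster by its size."""
    p = np.asarray(partition_sizes, dtype=float)
    p /= p.sum()
    return rng.choice(len(p), size=n, p=p)

def train(num_epochs, recluster_every, partition_sizes, batch=8):
    """Alternate conditional-GAN updates with periodic re-clustering."""
    rng = np.random.default_rng(0)
    schedule = []
    for epoch in range(num_epochs):
        if epoch > 0 and epoch % recluster_every == 0:
            schedule.append("recluster")  # re-partition the data via k-means on D_f
        c = sample_clusters(partition_sizes, batch, rng)  # conditions for G and D
        schedule.append("gan_step")       # one minimax update of G(z, c) and D(x, c)
    return schedule
```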
Computing new partition by clustering
During training, the shared discriminator layers D_f learn a progressively better representation of the data distribution. The model therefore periodically updates π by re-partitioning the target dataset under the metric induced by the current discriminator features, using k-means clustering [32] to obtain a new partition into k clusters in the D_f output space.
μ_c, the mean of cluster π_c in the D_f feature space, is defined as follows:
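In code, with `features[i] = D_f(x_i)` and `labels[i]` the cluster assignment of x_i, the per-cluster means can be computed as follows (a numpy sketch under my own naming, not the authors' implementation):

```python
import numpy as np

def cluster_means(features, labels, k):
    """mu_c = (1 / |pi_c|) * sum over x in pi_c of D_f(x):
    the mean of each cluster in the discriminator feature space."""
    feats = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    return np.stack([feats[labels == c].mean(axis=0) for c in range(k)])
```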
Clustering initialization.
For the first clustering, the authors use k-means++ initialization. Subsequent re-clusterings are initialized with the means obtained from the previous clustering. The new partition is computed as:
Here, D_f in Eq. 6 denotes the current discriminator feature space.
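The warm-started re-clustering can be sketched as plain Lloyd iterations on the current D_f features, initialized from the previous centroids (a numpy stand-in for the k-means the authors use; the k-means++ initialization for the very first round is omitted):

```python
import numpy as np

def recluster(features, init_means, iters=10):
    """Re-partition features (rows = D_f(x)) by k-means, warm-started
    from the previous clustering's means."""
    feats = np.asarray(features, dtype=float)
    means = np.asarray(init_means, dtype=float).copy()
    for _ in range(iters):
        # assign each point to its nearest centroid
        d = ((feats[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old mean if a cluster empties
        for c in range(len(means)):
            if (labels == c).any():
                means[c] = feats[labels == c].mean(axis=0)
    return labels, means
```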
Matching with old clusters.
After re-partitioning, to prevent the conditional G and D from relearning from scratch, the authors match the new clusters to the old ones so that the target distribution for each generator condition does not change drastically. They aim to find a permutation ρ: [k] → [k] that minimizes the objective:
The authors solve this matching loss with the classic Hungarian min-cost matching algorithm and obtain the new clusters used for GAN training in subsequent epochs. Algorithm 1 summarizes the entire training process.
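A sketch of the matching step: build a cost between old and new clusters, then pick the permutation ρ that minimizes the total cost. I use squared centroid distance as the cost (one plausible choice, not necessarily the paper's exact objective) and brute force over permutations as a stand-in for the Hungarian algorithm (e.g. `scipy.optimize.linear_sum_assignment`), which scales to larger k:

```python
import numpy as np
from itertools import permutations

def match_clusters(old_means, new_means):
    """Return rho such that new cluster rho[c] is matched to old cluster c,
    minimizing the summed squared distance between matched centroids."""
    old = np.asarray(old_means, dtype=float)
    new = np.asarray(new_means, dtype=float)
    cost = ((old[:, None, :] - new[None, :, :]) ** 2).sum(-1)  # cost[c, c']
    k = len(old)
    best = min(permutations(range(k)),
               key=lambda p: sum(cost[c, p[c]] for c in range(k)))
    return list(best)
```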
3. Result and Experiments
Data.
The authors use real datasets (Stacked-MNIST, CIFAR-10, etc.) and synthetic datasets (the 2D-ring and 2D-grid datasets).
- The 2D-ring dataset is a mixture of 8 2D Gaussians with means (cos 2πi/8, sin 2πi/8) and variance 1e-4, where i ∈ {0, . . . , 7}.
- The 2D-grid dataset is a mixture of 25 2D Gaussians with means (2i − 4, 2j − 4) and variance 0.0025, where i, j ∈ {0, . . . , 4}.
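The two synthetic datasets can be generated directly from these definitions (a sketch with my own function names):

```python
import numpy as np

def ring_dataset(n, rng):
    """2D-ring: mixture of 8 Gaussians on the unit circle, variance 1e-4."""
    i = rng.integers(0, 8, size=n)
    means = np.stack([np.cos(2 * np.pi * i / 8), np.sin(2 * np.pi * i / 8)], axis=1)
    return means + rng.normal(scale=np.sqrt(1e-4), size=(n, 2))

def grid_dataset(n, rng):
    """2D-grid: mixture of 25 Gaussians at (2i-4, 2j-4), i, j in {0..4}, variance 0.0025."""
    i = rng.integers(0, 5, size=n)
    j = rng.integers(0, 5, size=n)
    means = np.stack([2 * i - 4, 2 * j - 4], axis=1)
    return means + rng.normal(scale=np.sqrt(0.0025), size=(n, 2))
```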
Experiments.
FID in Table 2 is a popular metric, but I am not sure what Table 1's "percentage of high-quality samples" means; the paper does not make clear how it is measured. In any case, the authors' model obtains good results on the mode-coverage, reverse-KL, FID, and IS metrics across all four datasets.
Performance is poor for very small k, but good over a wide range of k.
Overall, the method performs well, but it still does not reach the quality of fully supervised class-conditional models. As the authors note: "Although Logo-GAN-RC outperforms our method, it is based on a supervised pre-trained ImageNet classifier."
There are many other figures and tables, but I think this is enough to introduce this paper. Please refer to the paper for more figures and tables.
4. My Opinion
The core idea of this paper is self-supervised label clustering.
This is a really important problem to solve, because in many real-world cases labels and annotations are not included in the data. Unfortunately, since the paper's focus is label clustering, there is little discussion of data quality.