Halil Agin
Jun 28, 2020

GAN weekly update, week #27

Hello GAN enthusiasts!

This week, I will summarize four papers that caught my attention. Here they are.

  1. Smooth Adversarial Training
  2. GAT-GMM: Generative Adversarial Training for Gaussian Mixture Models
  3. High-Fidelity Generative Image Compression
  4. AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks

Smooth Adversarial Training (SAT) (arxiv.org link)

SAT attacks the problem known as the non-smoothness of the ReLU activation function. The paper explains the problem with the figure below. The left plot shows two functions: 1) ReLU (red) and 2) Parametric Softplus (blue). The right plot shows their derivatives: the derivative of the former is discontinuous, while that of the latter is continuous. We know that smoothly differentiable functions are very helpful for backpropagation, so with respect to differentiability, the latter is the more advantageous choice for training a Generative Adversarial Network (GAN).

With this in mind, the authors argue that the ReLU function weakens adversarial training because of its nature, namely the non-smoothness of its derivative. By replacing it with the newly proposed activation function (Parametric Softplus), the authors achieve improved adversarial robustness for "free".

In the paper, the authors define the new activation function, named "Parametric Softplus", as shown below.

Its derivative, given below, is a continuous function of e^x.
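For concreteness, here is a small NumPy sketch of a parametric softplus and its derivative. The exact parameterization (a sharpness parameter alpha, with alpha = 1 recovering the standard softplus) is my reading of the idea for illustration, not necessarily the paper's exact definition.

```python
import numpy as np

# Sketch of a parametric softplus: a smooth stand-in for ReLU.
# NOTE: the parameterization via `alpha` is an assumption for illustration;
# see the paper for the exact form used in SAT.
def parametric_softplus(x, alpha=1.0):
    # numerically stable log(1 + exp(alpha * x)) / alpha
    return np.logaddexp(0.0, alpha * x) / alpha

def parametric_softplus_grad(x, alpha=1.0):
    # derivative is sigmoid(alpha * x): continuous everywhere,
    # unlike the ReLU derivative, which jumps at x = 0
    return 1.0 / (1.0 + np.exp(-alpha * x))

x = np.linspace(-4, 4, 9)
print(parametric_softplus(x))
print(parametric_softplus_grad(x))
```

Note how the derivative stays continuous through x = 0, which is exactly the property the authors argue helps adversarial training.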

The experiments summarized in the figure below show that the proposed activation function improves robustness significantly while keeping accuracy on par with ReLU.

GAT-GMM: Generative Adversarial Training for Gaussian Mixture Models (arxiv.org link)

First of all, this paper is excellent! The authors propose a new framework to address GANs' failure on multi-modal datasets. They introduce a minimax GAN framework named "Generative Adversarial Training for Gaussian Mixture Models" (GAT-GMM) that learns multi-modal distributions such as Gaussian Mixture Models (GMMs), which are well known in clustering. The paper first analyzes the multi-modal problem and provides a theoretical treatment of it. Motivated by optimal transport theory, the authors suggest a framework consisting of a random linear generator and a softmax-based quadratic discriminator, trained with Gradient Descent Ascent (GDA). The experiments show that the new framework approximates the parameters of a GMM, which are normally estimated by the Expectation-Maximization (EM) algorithm that has been the standard for decades.
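To make the generator/discriminator roles concrete, here is a toy PyTorch sketch of gradient descent ascent with a linear generator and a quadratic-form discriminator on a two-mode Gaussian mixture. This is a generic illustration under my own simplifications; it is not the paper's exact optimal-transport-based objective or its softmax-based discriminator.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy two-mode Gaussian mixture as "real" data (illustrative, not the paper's setup).
def sample_real(n):
    modes = torch.tensor([[-3.0, 0.0], [3.0, 0.0]])
    idx = torch.randint(0, 2, (n,))
    return modes[idx] + 0.5 * torch.randn(n, 2)

# Linear generator: a one-hot mode indicator plus Gaussian noise, mapped linearly to 2-D.
W = torch.randn(2, 4, requires_grad=True)
b = torch.zeros(2, requires_grad=True)

# Quadratic-form discriminator score x^T A x + v^T x (a simplified stand-in
# for the paper's softmax-based quadratic discriminator).
A = torch.zeros(2, 2, requires_grad=True)
v = torch.zeros(2, requires_grad=True)

def generate(n):
    onehot = F.one_hot(torch.randint(0, 2, (n,)), num_classes=2).float()
    noise = torch.randn(n, 2)
    z = torch.cat([onehot, noise], dim=1)
    return z @ W.t() + b

def score(x):
    return ((x @ A) * x).sum(dim=1) + x @ v

opt_d = torch.optim.SGD([A, v], lr=1e-2)
opt_g = torch.optim.SGD([W, b], lr=1e-2)

for step in range(2000):
    # Ascent step: the discriminator pushes real and generated scores apart.
    d_loss = -(score(sample_real(256)).mean() - score(generate(256)).mean())
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # Descent step: the generator tries to raise the score of its samples.
    g_loss = -score(generate(256)).mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

The point of the sketch is only to show why such simple generator and discriminator classes are enough for GMM-like targets, which is what makes the paper's theoretical analysis tractable.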

The figure below shows how successfully GAT-GMM approximates the true parameters of the GMM. Although the authors uploaded an incomplete version of the paper, the current version is sufficient for a researcher to understand how the newly proposed framework addresses the multi-modal problem in GANs.

High-Fidelity Generative Image Compression (arxiv.org link)

This paper provides a new image compression scheme, named HiFiC, based on a newly proposed GAN architecture. The scheme produces image reconstructions with high perceptual fidelity, which is supported by quantitative results and a user study. The authors show how their method produces realistic images while requiring lower bitrates (bits per pixel, bpp). Moreover, in the user study, participants preferred the images generated by HiFiC even when the competing images were encoded at double the bitrate. The figure below shows how HiFiC produces a high-fidelity reconstruction compared to the original image and to other methods.

I highly suggest that the reader examine the proposed neural architecture and its loss function; a rough sketch of such a composite loss is given below.
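The sketch below illustrates the general shape of a GAN-based compression objective: a rate term plus weighted distortion and adversarial terms. The specific terms, the weights `lmbda` and `beta`, and the non-saturating adversarial loss are my assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

# Sketch of a GAN-based compression objective: rate + distortion + adversarial.
# The weights and the non-saturating GAN term are illustrative assumptions.
def compression_loss(rate_bpp, x, x_hat, d_fake_logits, lmbda=0.01, beta=0.1):
    distortion = F.mse_loss(x_hat, x)                 # pixel-level distortion
    adversarial = F.softplus(-d_fake_logits).mean()   # pushes reconstructions toward realism
    return rate_bpp + lmbda * distortion + beta * adversarial

# Placeholder usage with dummy tensors.
x = torch.rand(4, 3, 64, 64)       # original images
x_hat = torch.rand(4, 3, 64, 64)   # decoded reconstructions
rate = torch.tensor(0.3)           # estimated bits per pixel
d_logits = torch.randn(4)          # discriminator logits on x_hat
print(compression_loss(rate, x, x_hat, d_logits))
```

The trade-off between the rate term and the distortion/adversarial terms is what lets the model spend fewer bits while keeping reconstructions perceptually convincing.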

AutoGAN-Distiller: Searching to Compress Generative Adversarial Networks (arxiv.org link)

This paper proposes a new end-to-end search method (AutoGAN-Distiller, AGD) for the generator G of a GAN, where the proposed method produces a computationally light version of the original generator G. This new method makes computationally constrained devices, such as mobile phones and tablets, more GAN-friendly.

According to the paper, this area is not well researched and deserves more attention. The newly proposed method is fully automatic and can be applied to any GAN, which was not possible in previous studies, for example, the work of Shu et al. [1]. The new method is based on Neural Architecture Search (NAS), which learns an optimal neural network architecture from data instead of hand-crafting it. A NAS framework consists of two key components, the search space and the proxy task, and AGD customizes both to obtain a light version of the generator of the target GAN.
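To give a flavor of what "search space" and "proxy task" mean in practice, here is a small PyTorch sketch of a differentiable, DARTS-style mixed operation plus a distillation-style proxy loss against the original generator. The candidate operations, the softmax relaxation, and the MSE distillation target are my illustrative assumptions, not AGD's actual search space or proxy task.

```python
import torch
import torch.nn as nn

# Search space sketch: candidate ops weighted by softmaxed architecture parameters.
class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),                   # standard conv
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise conv
            nn.Identity(),                                                 # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Proxy task sketch: the compressed (student) generator is trained to match
# the pretrained teacher generator's output on the same input.
def distillation_loss(student, teacher, x):
    with torch.no_grad():
        target = teacher(x)
    return torch.mean((student(x) - target) ** 2)

block = MixedOp(channels=16)
y = block(torch.randn(1, 16, 32, 32))
print(y.shape)
```

After the search, the highest-weighted operation in each block would be kept, yielding a lighter generator; this is the general NAS recipe rather than AGD's specific procedure.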

The authors tested their framework on two types of tasks: 1) image translation and 2) super-resolution. According to the experimental results, AGD consistently achieves significantly better FID than CEC, the current state of the art. The table below shows the comparisons.

Lastly, the figure below compares AGD with the other methods mentioned in the paper. According to the figure, AGD outperforms the other methods in terms of computation and memory usage.
