“Best GAN samples ever yet? Very impressive ICLR submission! BigGAN improves Inception Scores by >100.”
The above Tweet is from renowned Google DeepMind research scientist Oriol Vinyals. It was retweeted last week by Google Brain researcher and “Father of Generative Adversarial Networks” Ian Goodfellow, and picked up momentum and praise from AI researchers on social media.
All the attention surrounds the paper Large Scale GAN Training for High Fidelity Natural Image Synthesis, which recently appeared online. The paper is an internship project by Andrew Brock of Heriot-Watt University, in collaboration with Jeff Donahue and Karen Simonyan of DeepMind, and is under review for next spring’s ICLR 2019.
Figure 1 shows that the model generates very impressive images with both high fidelity and wide variety. When trained on the ImageNet dataset at 128×128 resolution, BigGAN achieves an Inception Score (IS) of 166.3, an improvement of more than 100 points over the previous state-of-the-art (SotA) score of 52.52. The Fréchet Inception Distance (FID) also improves from 18.65 to 9.6 (lower is better).
The authors proposed a model (BigGAN) with modifications focused on the following three aspects:
- Scalability: The authors found that GANs benefit dramatically from scaling, and introduced two architectural changes to improve scalability (described in detail in the paper’s Appendix B), while also improving conditioning by applying orthogonal regularization to the generator.
- Robustness: The orthogonal regularization applied to the generator makes the model amenable to the “truncation trick,” allowing fine control over the trade-off between fidelity and variety by truncating the latent space.
- Stability: The authors discovered and characterized instabilities specific to large-scale GANs, and devised solutions to minimize them — although these came at a notable cost in performance.
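To make the orthogonal regularization concrete: the paper describes a relaxed variant that penalizes only the off-diagonal entries of WᵀW, nudging the generator’s filters toward mutual orthogonality without constraining their norms. Below is a minimal NumPy sketch of that penalty; the function name is ours, and the `beta` coefficient of 1e-4 follows the value reported in the paper.

```python
import numpy as np

def orthogonal_penalty(W, beta=1e-4):
    """Relaxed orthogonal regularization penalty on a weight matrix W.

    Penalizes only the off-diagonal entries of W^T W, pushing columns
    of W toward mutual orthogonality while leaving their norms free.
    beta=1e-4 is the coefficient reported in the BigGAN paper.
    """
    WtW = W.T @ W
    # Zero out the diagonal so column norms are not penalized.
    off_diag = WtW * (1.0 - np.eye(WtW.shape[0]))
    return beta * np.sum(off_diag ** 2)
```

For a matrix with orthogonal columns (e.g. the identity), the penalty is exactly zero; any correlation between columns makes it positive, so adding it to the generator loss discourages redundant filters.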
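The “truncation trick” itself is simple: sample the latent vector from a standard normal as usual, but resample any component whose magnitude exceeds a threshold. Shrinking the threshold concentrates samples near the mode, trading variety for fidelity. A small sketch under these assumptions (the function name and defaults are ours, not from the paper):

```python
import numpy as np

def truncated_z(batch, dim, threshold=0.5, rng=None):
    """Sample a latent batch z ~ N(0, I), resampling every component
    whose absolute value exceeds `threshold` (the truncation trick).

    Smaller thresholds yield higher-fidelity, lower-variety samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal((batch, dim))
    while True:
        mask = np.abs(z) > threshold  # components outside the cutoff
        if not mask.any():
            return z
        z[mask] = rng.standard_normal(mask.sum())  # redraw only those
```

At generation time the truncated batch is fed to the generator in place of an ordinary Gaussian sample; the threshold becomes a single knob for the fidelity/variety trade-off.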