The #paperoftheweek is: “Generating Diverse High-Fidelity Images with VQ-VAE-2”

Teodora Suciu · Published in Generate vision · Jul 3, 2019 · 2 min read

In this week's paper, the authors propose a two-stage generator architecture that exploits the strengths of two separate generative modeling techniques and additionally achieves impressive compression ratios.

In the first stage, a hierarchical vector quantized variational autoencoder (VQ-VAE) with multiple stacked latent levels is trained on the dataset. The latent space at each level is compressed by quantizing the encoder outputs against a learned codebook of discrete codes.
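
To make the quantization step concrete, here is a minimal sketch of a vector-quantization layer in PyTorch. The codebook size, code dimension, and commitment weight below are illustrative choices, not the exact values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Minimal VQ layer: maps each encoder vector to its nearest codebook entry."""

    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        # Learnable codebook of embedding vectors (sizes here are illustrative).
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight

    def forward(self, z_e):
        # z_e: encoder output of shape (batch, height, width, code_dim)
        flat = z_e.reshape(-1, z_e.shape[-1])
        # Squared distances to every codebook entry, then nearest-neighbour lookup.
        dists = (flat.pow(2).sum(1, keepdim=True)
                 - 2 * flat @ self.codebook.weight.t()
                 + self.codebook.weight.pow(2).sum(1))
        indices = dists.argmin(dim=1)               # discrete latent codes
        z_q = self.codebook(indices).view_as(z_e)   # quantized latents
        # Codebook and commitment losses, plus the straight-through gradient trick.
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        z_q = z_e + (z_q - z_e).detach()
        return z_q, indices.view(z_e.shape[:-1]), loss
```

Stacking such quantized bottlenecks at several resolutions gives the hierarchy described above; the grids of integer indices they produce are what the second stage models.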

In the second stage, a PixelCNN prior is trained for each level in the hierarchy. The PixelCNNs model only the latent codes, allowing them to spend their capacity on the global structure and the most perceivable features. The prior over the coarsest, most compressed level is conditioned on the class label; the priors over the finer levels are conditioned on the quantized latent map of the level above.
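
As a rough sketch of what modeling only the latents looks like, the toy class-conditional PixelCNN below operates on a grid of discrete code indices rather than on pixels, and is sampled one grid position at a time. The network sizes are hypothetical; the priors in the paper are far larger and more powerful.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """Conv whose kernel is masked so each position only sees already-generated codes."""
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.register_buffer("mask", torch.ones_like(self.weight))
        _, _, kh, kw = self.weight.shape
        self.mask[:, :, kh // 2, kw // 2 + (mask_type == "B"):] = 0
        self.mask[:, :, kh // 2 + 1:] = 0

    def forward(self, x):
        self.weight.data *= self.mask
        return super().forward(x)

class TinyLatentPixelCNN(nn.Module):
    """Toy class-conditional PixelCNN over a grid of discrete latent codes."""
    def __init__(self, num_codes=512, num_classes=1000, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(num_codes, hidden)
        self.class_embed = nn.Embedding(num_classes, hidden)
        self.net = nn.Sequential(
            MaskedConv2d("A", hidden, hidden, 7, padding=3), nn.ReLU(),
            MaskedConv2d("B", hidden, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, num_codes, 1),
        )

    def forward(self, codes, labels):
        # codes: (batch, H, W) integer latent indices; labels: (batch,) class ids.
        h = self.embed(codes).permute(0, 3, 1, 2)
        h = h + self.class_embed(labels)[:, :, None, None]
        return self.net(h)  # (batch, num_codes, H, W) logits

@torch.no_grad()
def sample_latents(model, labels, grid=(32, 32)):
    """Ancestral sampling: fill the latent grid one position at a time."""
    codes = torch.zeros(labels.shape[0], *grid, dtype=torch.long)
    for i in range(grid[0]):
        for j in range(grid[1]):
            logits = model(codes, labels)[:, :, i, j]
            codes[:, i, j] = torch.multinomial(F.softmax(logits, dim=-1), 1).squeeze(-1)
    return codes  # feed these indices to the VQ-VAE decoder to produce an image

# Illustrative usage with an untrained model and two hypothetical class ids.
model = TinyLatentPixelCNN()
labels = torch.tensor([207, 985])
top_codes = sample_latents(model, labels)  # (2, 32, 32) grid of codebook indices
```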

The resulting generator does not suffer from mode collapse, a well-known problem of adversarial approaches, and its samples show noticeably more diversity.
Additionally, the hierarchical VQ-VAE compresses images into a latent space that is about 50x smaller than the pixel space for ImageNet and about 200x smaller for the FFHQ faces dataset.
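
For intuition on where such compression ratios come from, here is a back-of-the-envelope calculation: raw 8-bit pixels are compared against the bits needed to store the discrete codebook indices. The latent grid sizes and codebook size below are illustrative assumptions; the actual ratios depend on the resolutions and codebooks used in the paper.

```python
import math

def compression_ratio(image_shape, latent_grids, codebook_size):
    """Bits for raw 8-bit pixels vs. bits for the discrete latent indices."""
    h, w, c = image_shape
    pixel_bits = h * w * c * 8
    bits_per_code = math.ceil(math.log2(codebook_size))
    latent_bits = sum(gh * gw for gh, gw in latent_grids) * bits_per_code
    return pixel_bits / latent_bits

# Illustrative example: a 256x256 RGB image encoded as 64x64 + 32x32 grids of
# indices into a 512-entry codebook (hypothetical sizes, not the paper's exact setup).
print(round(compression_ratio((256, 256, 3), [(64, 64), (32, 32)], 512), 1))  # 34.1
```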

Abstract:

“We explore the use of Vector Quantized Variational AutoEncoder (VQ-VAE) models for large scale image generation. To this end, we scale and enhance the autoregressive priors used in VQ-VAE to generate synthetic samples of much higher coherence and fidelity than possible before. We use simple feed-forward encoder and decoder networks, making our model an attractive candidate for applications where the encoding and/or decoding speed is critical. Additionally, VQ-VAE requires sampling an autoregressive model only in the compressed latent space, which is an order of magnitude faster than sampling in the pixel space, especially for large images. We demonstrate that a multi-scale hierarchical organization of VQ-VAE, augmented with powerful priors over the latent codes, is able to generate samples with quality that rivals that of state of the art Generative Adversarial Networks on multifaceted datasets such as ImageNet, while not suffering from GAN’s known shortcomings such as mode collapse and lack of diversity.”

You can read the full article here.

About the author:

Elias Vansteenkiste, Lead Research Scientist at Brighter AI.

About Brighter AI:

Brighter AI has developed an innovative privacy solution for visual data: Deep Natural Anonymization. The solution replaces personally identifiable information such as faces and license plates with artificial objects, thereby enabling all AI and analytics use cases, e.g. self-driving cars and smart retail. In 2018, NVIDIA named the German company “Europe’s Hottest AI Startup”.
