The #paperoftheweek 14 is: “HoloGAN: Unsupervised learning of 3D representations from natural images”

This week we selected yet another impressive new GAN paper. It starts from the premise that every image is a picture of a 3D scene taken by a camera in a certain position, and this prior knowledge is encoded in the architecture of the proposed generator. The generator network starts from a learned constant, very similar to the style-based generator introduced at the end of 2018, but with the difference that the constant is a 4D tensor instead of a 3D tensor. The first layers of the generator consist of 3D convolutions and thus operate on 3D features instead of 2D features.
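A minimal sketch of this first stage (assumed PyTorch; channel counts and volume sizes are illustrative, not the paper's exact values): a learned constant 4D feature tensor (channels × depth × height × width) that is upsampled by 3D convolutions.

```python
import torch
import torch.nn as nn

class Learned3DConstant(nn.Module):
    def __init__(self, channels=512, size=4):
        super().__init__()
        # Learned constant: one 4D feature tensor (C, D, H, W) shared across the dataset,
        # analogous to the learned constant input of the style-based generator.
        self.const = nn.Parameter(torch.randn(1, channels, size, size, size))
        # First stage: 3D (volumetric) convolutions operating on 3D features.
        self.block = nn.Sequential(
            nn.ConvTranspose3d(channels, 256, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.LeakyReLU(0.2),
            nn.ConvTranspose3d(256, 128, kernel_size=3, stride=2,
                               padding=1, output_padding=1),
            nn.LeakyReLU(0.2),
        )

    def forward(self, batch_size):
        x = self.const.expand(batch_size, -1, -1, -1, -1)
        return self.block(x)  # (B, 128, 16, 16, 16) volumetric features
```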

In a second stage, a random pose is sampled from a uniform distribution. The 3D features are transformed to match the pose that was sampled. After a second subnetwork with 3D convolutions, a projection unit transforms the 3D feature tensors to 2D feature tensors.
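A hedged sketch of this second stage (assumed PyTorch; the angle range, axes, and layer sizes are illustrative assumptions): sample a random pose, rigidly rotate the 3D feature volume by resampling it on a transformed grid, and then project the 3D features down to 2D features by folding the depth axis into the channels.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_pose(batch_size, max_azimuth=math.pi):
    # Random pose sampled from a uniform distribution (azimuth only, for brevity).
    return (torch.rand(batch_size) * 2 - 1) * max_azimuth

def rotate_volume(features, azimuth):
    # Rigid-body transform of the (B, C, D, H, W) feature volume via a resampling grid.
    b = features.size(0)
    cos, sin = torch.cos(azimuth), torch.sin(azimuth)
    zeros, ones = torch.zeros(b), torch.ones(b)
    # Rotation about the vertical axis, expressed as a 3x4 affine matrix per sample.
    theta = torch.stack([
        torch.stack([cos, zeros, -sin, zeros], dim=1),
        torch.stack([zeros, ones, zeros, zeros], dim=1),
        torch.stack([sin, zeros, cos, zeros], dim=1),
    ], dim=1)
    grid = F.affine_grid(theta, features.shape, align_corners=False)
    return F.grid_sample(features, grid, align_corners=False)

class ProjectionUnit(nn.Module):
    # Collapses the depth axis into channels and maps the result to 2D features.
    def __init__(self, channels=128, depth=16, out_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(channels * depth, out_channels, kernel_size=1)

    def forward(self, x):                      # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        x = x.reshape(b, c * d, h, w)          # flatten depth into channels
        return F.leaky_relu(self.conv(x), 0.2) # (B, out_channels, H, W)
```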

The last stage of the generator network consists of 2D convolutions.
The first and last stages are controlled by learned styles and adaptive instance normalization, in a similar way to the style-based generator.
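A minimal sketch of this style control (assumed PyTorch; layer sizes are illustrative): a latent code is mapped to a per-channel scale and bias that modulate the features through adaptive instance normalization, as in the style-based generator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaIN(nn.Module):
    def __init__(self, latent_dim, num_features):
        super().__init__()
        # Learned mapping from the latent code to a per-channel scale and bias ("style").
        self.style = nn.Linear(latent_dim, num_features * 2)

    def forward(self, x, z):
        # Instance-normalize the features, then re-scale and re-shift them with the style.
        scale, bias = self.style(z).chunk(2, dim=1)
        # Broadcast over the spatial (and, for 3D features, depth) dimensions.
        shape = (x.size(0), x.size(1)) + (1,) * (x.dim() - 2)
        x = F.instance_norm(x)
        return x * (1 + scale.reshape(shape)) + bias.reshape(shape)

# The same mechanism can modulate both the 3D blocks (5D tensors) and the 2D blocks:
# adain = AdaIN(latent_dim=128, num_features=64)
# out = adain(features, z)   # features: (B, 64, H, W) or (B, 64, D, H, W)
```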

Besides the improvement in quality on the LSUN chairs and bedroom subsets, an important advantage of HoloGAN is the disentanglement of the camera position: it allows controlling the viewing angles while generating images.
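For illustration, this disentanglement can be exercised by keeping the identity latent fixed and only varying the sampled camera angle (the `HoloGANGenerator` wrapper below is a hypothetical interface, not the paper's code):

```python
import math
import torch

# generator = HoloGANGenerator(...)            # assumed model wrapper, not the paper's code
z = torch.randn(1, 128)                        # one identity latent, held fixed
angles = torch.linspace(-math.pi / 2, math.pi / 2, steps=8)
# images = [generator(z, azimuth=a) for a in angles]  # same scene, eight viewpoints
```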

Abstract

“We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. Most generative models rely on 2D kernels to generate images and make few assumptions about the 3D world. These models therefore tend to create blurry images or artefacts in tasks that require a strong 3D understanding, such as novel-view synthesis. HoloGAN instead learns a 3D representation of the world, and to render this representation in a realistic manner. Unlike other GANs, HoloGAN provides explicit control over the pose of generated objects through rigid-body transformations of the learnt 3D features. Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images with similar or higher visual quality than other generative models. HoloGAN can be trained end-to-end from unlabelled 2D images only. Particularly, we do not require pose labels, 3D shapes, or multiple views of the same objects. This shows that HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner.”

You can read the full article here.

About the author:

Elias Vansteenkiste, Lead Research Scientist at Brighter AI.