A Paper in 5 mins — FactorVAE

Nicola Bernini
Discussing Deep Learning
4 min read · May 29, 2020

The goal of this article is to explain the core of this paper in just 5 minutes of your time.

1. Introduction

We are going to explore the following paper and understand its core contribution

Learning a disentangled representation is more or less like learning a “code” where each digit represents a specific “factor of variation” of the dataset: for example, if we were talking about faces, then one digit could control the color of the skin, another the color of the hair, another the type of hair, and so on.
If this representation is attached to a generative process, then these “digits” not only represent some semantic aspect of the input, they can also control it in the generative process itself.

Fig.1 from Beta VAE Paper

Generative processes are well suited for representation learning because it is possible to measure, with appropriate metrics, both the quality of the reconstruction (reconstruction loss) and, in the case of disentangled representation learning, the quality of the disentanglement.

The VAE is a type of generative model relying on a probabilistic representation of the latent code, and it is a commonly used tool for learning such a disentangled representation.

VAE Scheme from Tutorial on Variational Autoencoders, on the left implemented in its standard form and on the right implemented with the reparametrization trick
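The reparametrization trick shown on the right of the figure can be sketched in a few lines of NumPy (a minimal illustration, not any particular paper's implementation): writing the sample as z = mu + sigma * eps isolates all the randomness in eps, so gradients can flow through mu and log_var.

```python
import numpy as np

def reparametrize(mu, log_var, rng):
    # Sample z = mu + sigma * eps with eps ~ N(0, I).
    # Because the randomness is confined to eps, gradients can
    # flow through mu and log_var (the reparametrization trick).
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

With log_var = 0 the standard deviation is 1, so the sample is just mu plus standard Gaussian noise.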

2. The Paper Core Idea

This paper is essentially an improvement of Beta VAE, which addressed this kind of learning by focusing on the objective function of the standard VAE, which consists of two terms

  • a reconstruction loss
  • a KL divergence term between the learned representation and a prior which is disentangled, so that, in probabilistic terms, its joint distribution is factorized

The Beta VAE paper modifies the standard VAE objective by adding a beta factor which tunes the weight of the KL divergence term, and hence the strength of the pull towards the prior.

Beta VAE Objective Function (from the paper)

Empirically, this is observed to come at the cost of degraded reconstruction performance: more disentanglement means worse reconstruction.
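The Beta VAE objective above can be sketched in NumPy as reconstruction loss plus beta times the closed-form Gaussian KL (a minimal sketch; the squared-error reconstruction term is an illustrative assumption, the paper uses a likelihood term):

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    # Reconstruction term: here a simple squared error over the input.
    recon = np.sum((x - x_recon) ** 2)
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior.
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    # Beta VAE scales the KL term by beta; beta = 1 recovers the standard VAE.
    return recon + beta * kl
```

Increasing beta pushes the posterior harder towards the factorized prior, which is exactly the knob whose side effect the rest of the article discusses.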

FactorVAE’s key insight consists of understanding the loss function at a deeper level: the KL divergence term is actually a sum of

  • the mutual information between the input and the code
  • the KL divergence between the aggregate posterior and the prior

Mutual Information + KL Divergence (from the paper)
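Written out, the decomposition used in the paper reads as follows, where $I_q(x;z)$ denotes the mutual information under the encoder distribution and $q(z)$ is the aggregate posterior:

```latex
\mathbb{E}_{p(x)}\!\left[\, \mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big) \right]
  = I_q(x; z) \;+\; \mathrm{KL}\big(q(z)\,\|\,p(z)\big),
\qquad
q(z) = \mathbb{E}_{p(x)}\!\left[\, q(z \mid x) \right]
```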

Penalizing the KL divergence term, as Beta VAE does, means penalizing both the mutual information and the divergence from the prior, and this has both a positive and a negative effect

  • the positive effect is related to penalizing the divergence with respect to the prior, as we want our representation to be as disentangled as possible
  • the negative effect is related to penalizing the mutual information: we want our code to preserve as much information about the input as possible, and reducing it leads to the degraded reconstruction performance that is actually observed

So FactorVAE essentially proposes a new loss fixing this issue: it penalizes only the total correlation of the aggregate posterior, the part of the divergence that actually measures entanglement between latent dimensions, while leaving the mutual information term untouched.

FactorVAE Objective Function (from the paper)
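The new objective keeps the standard VAE terms and adds a penalty on the total correlation, which the paper estimates with a discriminator via the density-ratio trick. A minimal NumPy sketch of how the pieces combine (the function and argument names are illustrative assumptions, and gamma is a tunable weight as in the paper):

```python
import numpy as np

def factor_vae_loss(recon_error, kl_term, disc_logits, gamma=10.0):
    # disc_logits approximates log( D(z) / (1 - D(z)) ) for codes z ~ q(z),
    # where a discriminator D is trained to tell samples from q(z) apart
    # from samples of the product of its marginals; the mean of these
    # logits estimates the total correlation TC(q(z)).
    tc_estimate = np.mean(disc_logits)
    # FactorVAE objective = standard VAE objective + gamma * TC penalty.
    return recon_error + kl_term + gamma * tc_estimate
```

Because only the total correlation is penalized, the mutual information between input and code is left alone, which is why the reconstruction quality does not degrade the way it does in Beta VAE.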

3. Experiments

In the second part of the paper the authors run experiments and comparisons using a new disentanglement metric

New Metric
Comparison: almost same reconstruction error with much better disentanglement in the representation according to the new metric (from the paper)

Please do not forget to clap if you liked this


Machine Learning PhD, Physicist. Mainly interested in Deep Learning, Functional Programming. https://github.com/NicolaBernini