Topology of a latent space: problems with intrinsic dimension mismatch between data and latent space.

Written by Piotr Tempczyk

Acta Schola Automata Polonica
Jan 13, 2022


The Seven Bridges of Königsberg, Euler's puzzle (image from https://www.e-education.psu.edu/geog160/node/1949)

Introduction

In my last blog post I discussed some potential problems with latent space topology and their connection to the performance of deep generative models like VAEs, normalizing flows and GANs. That post finished with a list of potential problems, and in this part I am going to show you some examples of those issues on simple, low-dimensional synthetic datasets using a β-VAE (from [1]; from now on we will call it VAE for short). If this is the first blog post of this series you are reading, please read the previous post first to get an idea of what we are going to do here.

Model and dataset visualization

1D/2D latent space visualization

In Figure 1 you can see an example of an experiment visualization. This may seem complicated at first glance, but I am going to explain every aspect of this image in this chapter, because it is crucial for understanding all of my experiments.

Figure 1: A 2D Gaussian distribution sample encoded using VAE with 1D latent space.

In the upper row there are plots for the data space, and in the lower row there are analogous plots for the latent space representation. When the space (data or latent) is 1-dimensional, like in the second row of Figure 1, the visualizations are different from the 2-dimensional case (the first row of Figure 1). I am going to go through them now one by one. If the latent space has more than 2 dimensions, the plots are empty, as in Figure 3, and there is often an additional plot with the latent space representation for 3 and more dimensions.

In the first column, upper row, there is a histogram of the original dataset. In the 1D case this is a classical histogram, and in the 2D case it is a heat map of the 2D histogram counts. Sometimes the counts in the 2D histogram are log-transformed, and then the background is white. In the lower row of this first column there is a histogram of a sample from the original latent space distribution, visualized in the same way for the 1D/2D case. In the case of a plain VAE this is an isotropic Gaussian distribution.

The 2nd column corresponds to the encoding and decoding of the dataset. The upper row shows the histogram of the reconstructed dataset (encoded into the latent space and then decoded back, using the mean of the latent representation distribution for each point). The lower row shows the distribution of the dataset encoded into the latent space. In the 2D case it is only the histogram of this distribution, and in the 1D case it is plotted along with the histogram of the original latent space distribution.
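As a rough sketch of how the reconstructions in this column can be computed (assuming a trained VAE whose hypothetical `encoder` returns the mean and log-variance of the latent distribution and whose `decoder` maps latent codes back to data space), using the mean as the representation:

```python
import torch

@torch.no_grad()
def reconstruct(encoder, decoder, x):
    """Encode a batch of data points and decode them back,
    using the mean of the latent distribution as the code."""
    mu, logvar = encoder(x)   # parameters of q(z|x)
    x_rec = decoder(mu)       # decode the mean, not a sample
    return mu, x_rec          # encoded points (lower row) and reconstructions (upper row)
```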

The 3rd column corresponds to sampling from our generative model. In the upper row there is a histogram of a sample from the model (points sampled from the original latent space distribution and then decoded using the decoder network). In the lower row there is the distribution of points which were sampled from the latent space distribution, decoded into the data space and then encoded again into the latent space. In the 1D case they are shown together with the original latent space distribution.
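A corresponding sketch of this sampling pipeline, under the same assumptions about the hypothetical `encoder` and `decoder`:

```python
import torch

@torch.no_grad()
def sample_and_reencode(encoder, decoder, n_samples, latent_dim):
    """Sample from the isotropic Gaussian prior, decode into data space,
    then encode the generated points back into the latent space."""
    z = torch.randn(n_samples, latent_dim)   # sample from the prior
    x_gen = decoder(z)                       # model sample (upper row)
    z_rec, _ = encoder(x_gen)                # re-encoded latent points (lower row)
    return z, x_gen, z_rec
```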

The 4th column shows how points from the data and latent space change when transformed through the decoder and the encoder of the model. In the upper row there is a scatter plot of a sample of original data points along with their reconstructions. In the lower row, we sample points from the latent space, pass them through the decoder and encoder to obtain their reconstructions, and visualize them the same way. In the 2D case those points are plotted on a 2D plane and each pair is connected with a red line; in the 1D case they are plotted one above the other and connected with a line to better visualize how they move after being transformed by the model. In the 1D case, all lines being vertical means that the reconstruction is perfect.

In the last, 5th column, we visualize the distances between points and their reconstructions from the 4th column in the form of a histogram. The upper row shows the reconstruction errors for the dataset, and in the lower row we visualize the latent space reconstruction errors for the samples from the model.
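Below is a minimal matplotlib sketch of the 4th and 5th column plots for the 2D case; the helper names are my own and not taken from the original experiment code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_reconstruction_field(points, reconstructions, ax):
    """Scatter 2D points and their reconstructions, connecting each pair with a red line."""
    for p, r in zip(points, reconstructions):
        ax.plot([p[0], r[0]], [p[1], r[1]], color="red", linewidth=0.5)
    ax.scatter(points[:, 0], points[:, 1], s=5, label="original")
    ax.scatter(reconstructions[:, 0], reconstructions[:, 1], s=5, label="reconstruction")
    ax.legend()

def plot_reconstruction_distances(points, reconstructions, ax):
    """Histogram of Euclidean distances between points and their reconstructions."""
    distances = np.linalg.norm(points - reconstructions, axis=1)
    ax.hist(distances, bins=50)
    ax.set_xlabel("reconstruction distance")
```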

3D+ latent spaces

When the latent space size is bigger than 2, we cannot effectively represent the distributions the way we did in the 1D/2D case. In this case, we represent them using a scatter plot matrix of 2D slices. An example of this plot is shown in Figure 4.

On the diagonal of this plot matrix there are 1D histograms of each coordinate in the latent space. Above the diagonal there are 2D reconstruction fields (similar to those from the lower row of the 4th column of the experiment visualizations) for the corresponding coordinate pairs. Below the diagonal there are plots of the encoded samples from the dataset along with the original latent space distribution.

Blue dots correspond to original samples from the latent space distribution, yellow dots correspond to the reconstructions of those samples (connected with red lines), and red dots correspond to dataset samples encoded into the latent space.
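A hedged sketch of how such a scatter plot matrix could be assembled with matplotlib (the function name, the choice of histogrammed points on the diagonal, and the styling are my assumptions, not the original plotting code):

```python
import matplotlib.pyplot as plt

def latent_pairplot(z_prior, z_reencoded, z_data, bins=40):
    """Scatter plot matrix of 2D latent slices for latent size > 2.
    Diagonal: 1D histograms of each latent coordinate.
    Above the diagonal: prior samples (blue) and their re-encodings (yellow),
    connected with red lines. Below: encoded dataset samples (red) over the prior (blue)."""
    d = z_prior.shape[1]
    fig, axes = plt.subplots(d, d, figsize=(2.5 * d, 2.5 * d))
    for i in range(d):
        for j in range(d):
            ax = axes[i, j]
            if i == j:
                ax.hist(z_data[:, i], bins=bins)
            elif i < j:   # reconstruction field for the coordinate pair (j, i)
                for a, b in zip(z_prior, z_reencoded):
                    ax.plot([a[j], b[j]], [a[i], b[i]], color="red", linewidth=0.3)
                ax.scatter(z_prior[:, j], z_prior[:, i], s=3, color="blue")
                ax.scatter(z_reencoded[:, j], z_reencoded[:, i], s=3, color="yellow")
            else:         # encoded dataset samples over the prior
                ax.scatter(z_prior[:, j], z_prior[:, i], s=3, color="blue")
                ax.scatter(z_data[:, j], z_data[:, i], s=3, color="red")
    fig.tight_layout()
    return fig
```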

VAE on a 2D Gaussian distribution

The first thing I explored was how a VAE behaves on a simple 2D dataset, generated by sampling points from an isotropic 2D normal distribution with zero mean and unit variance, as the latent space dimension varies. When encoding the data into the latent space I always use the mean of the distribution as the representation of the data point. To do this I trained 5 different VAE models with latent space sizes varying from 1 to 5.
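To make the setup concrete, here is a minimal sketch of such an experiment: a 2D Gaussian dataset and a small MLP β-VAE trained for each latent size. The architecture, hyperparameters and training loop are my assumptions for illustration, not the exact configuration used to produce the figures.

```python
import torch
import torch.nn as nn

class BetaVAE(nn.Module):
    """A small MLP beta-VAE for low-dimensional data (illustrative sketch)."""
    def __init__(self, data_dim=2, latent_dim=2, hidden=64, beta=1.0):
        super().__init__()
        self.beta = beta
        self.encoder = nn.Sequential(
            nn.Linear(data_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),   # mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, data_dim),
        )

    def encode(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        return mu, logvar

    def loss(self, x):
        mu, logvar = self.encode(x)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization trick
        x_rec = self.decoder(z)
        rec = ((x - x_rec) ** 2).sum(dim=-1).mean()              # Gaussian reconstruction term
        kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1).sum(dim=-1).mean()
        return rec + self.beta * kl

# Train one model per latent size on a sample from an isotropic 2D Gaussian.
data = torch.randn(10_000, 2)
models = {}
for latent_dim in range(1, 6):
    model = BetaVAE(latent_dim=latent_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(2_000):
        batch = data[torch.randint(len(data), (256,))]
        opt.zero_grad()
        model.loss(batch).backward()
        opt.step()
    models[latent_dim] = model
```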

In the 1D latent space case shown in Figure 1, we can see that the latent space size is too small to encode the dataset effectively, and the model tries to fill the 2D data space with a 1D line the best way it can. This causes three problems: the encoded data distribution does not follow the latent space distribution, the reconstruction error for the decoded data points is large, and the samples from the latent space do not cover the whole data distribution.

Figure 2: A 2D Gaussian distribution sample encoded using VAE with 2D latent space.

When we change the latent space size to 2, we resolve all of those problems, as we can see in Figure 2. The reconstructions in data and latent space are almost perfect, and the distribution of the data and of the samples from the model are very similar. In this case, when the latent space size matches the dimensionality of the data manifold, our model defines an almost perfect bijection between data and latent space, which makes all of our problems disappear.

Figure 3: A 2D Gaussian distribution sample encoded using VAE with 3D latent space.
Figure 4: Latent space visualization for a 2D Gaussian distribution sample encoded using VAE with 3D latent space.

In the 3D latent space case shown in Figures 3 and 4, we can observe that one of the dimensions in the latent space has collapsed. This means that for every point from the dataset the encoder predicts, for the collapsed coordinate, a Gaussian distribution with zero mean and unit variance. Interestingly, the samples from the model and the data reconstruction are not hurt in this case; only the reconstruction of the latent space samples is affected. But this does not change our sample quality or reconstruction quality, so from a practical point of view the model is only oversized and we can get the same results with a smaller model (with fewer parameters).
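One hedged way to check for such collapsed dimensions is to look at the per-coordinate KL term of the encoder's posterior: if a coordinate always gets roughly zero mean and unit variance, its average KL contribution is close to zero. A sketch, reusing the hypothetical `BetaVAE` model from the training example above:

```python
import torch

@torch.no_grad()
def per_dimension_kl(model, data):
    """Average KL divergence between q(z|x) and N(0, 1) for each latent coordinate.
    Coordinates with values near zero carry no information about x: they have collapsed."""
    mu, logvar = model.encode(data)
    kl = 0.5 * (mu ** 2 + logvar.exp() - logvar - 1)   # shape: (n_points, latent_dim)
    return kl.mean(dim=0)
```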

For the 4D and 5D cases the results are the same: we get more collapsed dimensions, but the reconstructions and the samples remain unaffected. We plot the median of the reconstruction error for the data and the latent space (computed in a similar way as our histograms from the 5th column) for each of the models in Figure 5.

Figure 5: Median reconstruction error for the data and the latent space for different latent space sizes (x axis) in the case of the 2D Gaussian sample dataset.
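A sketch of how such curves can be computed, reusing the hypothetical `models` and `data` from the training sketch above (again an illustration, not the original evaluation code):

```python
import torch
import matplotlib.pyplot as plt

@torch.no_grad()
def median_errors(model, data, n_samples=5_000):
    """Median reconstruction error in data space (encode then decode the dataset)
    and in latent space (decode then re-encode samples from the prior)."""
    mu, _ = model.encode(data)
    data_err = (data - model.decoder(mu)).norm(dim=-1).median()
    latent_dim = model.decoder[0].in_features
    z = torch.randn(n_samples, latent_dim)
    z_rec, _ = model.encode(model.decoder(z))
    latent_err = (z - z_rec).norm(dim=-1).median()
    return data_err.item(), latent_err.item()

sizes = sorted(models)
errors = [median_errors(models[d], data) for d in sizes]
plt.plot(sizes, [e[0] for e in errors], label="data space")
plt.plot(sizes, [e[1] for e in errors], label="latent space")
plt.xlabel("latent space size")
plt.legend()
plt.show()
```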

We can see that each of the curves has a minimum at latent size equal to 2, which is at the same time the dimensionality of the data space (and of the data manifold). In the next blog post we are going to explore this topic further for more twisted manifolds with a topology different from that of the latent space itself (even if they have the same topological dimension).

If you enjoyed this post, please hit the clap button below and follow our publication for more interesting articles about ML & AI.

References

[1] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, Xavier Glorot, Matthew Botvinick, Shakir Mohamed, and Alexander Lerchner. β-VAE: Learning basic visual concepts with a constrained variational framework. ICLR, 2017.
