First, I believe the Gaussian distribution is pretty amazing and is the best tool we have for most use cases. At the same time, however, the Gaussian assumption that we often make is being questioned in recent deep generative models. One example is using a non-parametric prior, such as the Beta-Bernoulli process, for the latent variable.
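To make the alternative concrete, here is a minimal sketch of sampling binary latent codes from a finite Beta-Bernoulli prior (the truncated approximation to the Indian Buffet Process); the truncation level `K`, the concentration `alpha`, and the shapes are illustrative choices, not taken from any particular paper:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 10       # truncation level: maximum number of latent features (assumption)
N = 5        # number of data points (assumption)
alpha = 2.0  # concentration; controls the expected number of active features

# Finite Beta-Bernoulli prior: each feature k has a global usage
# probability pi_k ~ Beta(alpha/K, 1), and each data point switches the
# feature on independently: z_nk ~ Bernoulli(pi_k).
pi = rng.beta(alpha / K, 1.0, size=K)
Z = rng.binomial(1, pi, size=(N, K))  # binary latent codes, shape (N, K)

print(Z.shape)  # (5, 10)
```

Unlike a Gaussian latent vector, each code here is a sparse binary feature-activation pattern, and as `K` grows the prior lets the model use as many features as the data supports.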

Although independence of the generative factors is a basic assumption, this shouldn’t be confused with the Gaussian prior. Even without a Gaussian prior, we can and should be able to achieve disentanglement, which, at least in the recent literature, can be defined as “**capturing all the generative factors in different dimensions of the latent space**”. This is mostly demonstrated on image datasets whose generative factors can be visualized. An example would be a rotating chair: its shape, azimuth angle, lighting conditions, image intensity, etc. are all generative factors, and each should be captured independently by its own latent unit.
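The standard way this is demonstrated is a latent traversal: sweep one latent dimension while holding the others fixed and check that only one generative factor (say, azimuth) changes in the output. Below is a hedged sketch of that procedure; the `decode` function is a hypothetical stand-in for a trained decoder network, and the latent size, traversed dimension, and sweep range are all assumptions:

```python
import numpy as np

# Hypothetical placeholder for a trained decoder: maps a latent code z
# to a flat "image" of 64 values. In practice this would be the decoder
# of a VAE or similar generative model.
def decode(z: np.ndarray) -> np.ndarray:
    return np.tanh(z @ np.ones((z.shape[-1], 64)))

z = np.zeros(8)                       # base latent code, 8 dims assumed
dim = 2                               # the dimension being traversed

# Sweep one dimension over a plausible prior range, keeping the rest fixed.
# In a disentangled model, only one factor (e.g. azimuth) should vary.
frames = []
for value in np.linspace(-3.0, 3.0, 7):
    z_t = z.copy()
    z_t[dim] = value
    frames.append(decode(z_t))

frames = np.stack(frames)             # (7, 64): one "image" per sweep step
print(frames.shape)
```

With a real decoder, the resulting frames would be rendered as a row of images; visual inspection of such traversals is how most of the papers cited in this debate present their disentanglement results.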