Very nice explanation.

How about properly defining ‘disentanglement’? What would perfect disentanglement look like? Does it depend on the data and the use case?

Let me explain my thought process to share where this question is coming from.

Given your data x, you want to know the source that generated it, i.e. its latent representation z. For this, you would need to compute the posterior p(z|x), which is like the holy grail of ML. But since it is intractable to compute in practice, we need an approximation q(z|x). For example, a VAE uses variational inference for this approximation, with q(z|x) chosen to be Gaussian. Since we don’t know the real distribution p(z|x), isn’t any measure of how good q(z|x) is based on our own perspective, guided by the use case? Doesn’t that mean the evaluation of q(z|x) is subjective?
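
To make the notation concrete, here is the standard variational-inference identity I have in mind. This is only a sketch, and it assumes a generative model p(x, z) = p(x|z) p(z), which may not match the exact setup in the post:

```latex
\log p(x)
  = \underbrace{\mathbb{E}_{q(z|x)}\big[\log p(x, z) - \log q(z|x)\big]}_{\text{ELBO}}
  + \mathrm{KL}\big(q(z|x) \,\|\, p(z|x)\big)
```

As far as I understand, the ELBO term can be estimated without knowing p(z|x), so for a fixed model two choices of q(z|x) can be compared by how close they are to the true posterior in KL. What it does not seem to tell me is whether the latent factors in z are disentangled, which is why I am asking for a definition.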