Thanks for writing — it was good to catch up with you and Ivo yesterday!
- Scale invariance: Szekely and Rizzo make a more general set of claims: that energy distances are affine invariant, rotation invariant, and scale equivariant. There is a review paper here: https://doi.org/10.1146/annurev-statistics-060116-054026. Section 3 gives a proof of scale equivariance using the Fourier representation of the energy distance.
- Integral probability metrics: I have been more informal than I would be in a technical publication! The Kolmogorov metric is an integral probability metric whose witness class is the functions of bounded variation 1 [Müller, 1997, Theorem 5.2], which might indeed cause trouble for optimization. A more careful statement would be that in practical settings, we’d use an IPM with “a witness function class that gives non-trivial results on samples”, with the additional requirement of “smoothness” if we want to optimize or take gradients.
- What’s in a name: I think it’s reasonable to call your GAN a Generative Moment Matching Network (GMMN). It’s fair to the authors of previous work [3,4] to use the name they originally proposed for this class of algorithm (i.e., a GAN using an MMD as the critic). Of course, if you are generating samples in a single dimension on the real line, then “Cramer GAN” would be a correct name. In more than one dimension, such as when generating images, the Cramer and Energy distances are not the same, and you are using the latter, which is an MMD. Moreover, the squared MMD in general does not have biased gradients, so the advantage over Wasserstein is not confined just to the Energy distance or Cramer distance.
- Is the critic correct: even in the conditional case (your Figure 3), and even when the generator uses the correct MMD on the learned critic features, the approximation can unfortunately still cause problems, since the critic might learn features that hide information from the generator. Check your email for my suggestion of a construction.
- Open problem: let’s continue over email :)
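
To make the scale and one-dimension points above concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper or from the review article). It assumes the V-statistic estimate of the energy distance, 2E|X-Y| - E|X-X'| - E|Y-Y'|, with the Euclidean norm, and it takes the 1-D Cramer distance to be the integral of the squared difference of the empirical CDFs. Under that normalization the energy distance is twice the Cramer distance; other conventions move the factor of 2 around. The function names `energy_distance` and `cramer_distance_1d` are hypothetical.

```python
import numpy as np


def energy_distance(x, y):
    """V-statistic estimate of 2 E|X-Y| - E|X-X'| - E|Y-Y'| (Euclidean norm)."""
    # Treat 1-D inputs as n samples of dimension 1.
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def mean_dist(a, b):
        # Mean pairwise Euclidean distance, diagonal terms included.
        diff = a[:, None, :] - b[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_dist(x, y) - mean_dist(x, x) - mean_dist(y, y)


def cramer_distance_1d(x, y):
    """Integral over the real line of (F_n - G_m)^2 for empirical CDFs."""
    x = np.sort(np.asarray(x, dtype=float).ravel())
    y = np.sort(np.asarray(y, dtype=float).ravel())
    grid = np.sort(np.concatenate([x, y]))
    # Both empirical CDFs are constant on each interval [grid[i], grid[i+1]).
    f = np.searchsorted(x, grid[:-1], side="right") / len(x)
    g = np.searchsorted(y, grid[:-1], side="right") / len(y)
    return float(np.sum((f - g) ** 2 * np.diff(grid)))


rng = np.random.default_rng(0)

# Three dimensions: check scale equivariance and rotation invariance.
x = rng.normal(size=(200, 3))
y = rng.normal(loc=0.5, size=(150, 3))
c = 3.0
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix

d = energy_distance(x, y)
d_scaled = energy_distance(c * x, c * y)
d_rotated = energy_distance(x @ q, y @ q)
print(np.isclose(d_scaled, c * d), np.isclose(d_rotated, d))

# One dimension: energy distance versus twice the Cramer distance.
x1 = rng.normal(size=100)
y1 = rng.normal(loc=1.0, size=80)
e1 = energy_distance(x1, y1)
c1 = cramer_distance_1d(x1, y1)
print(np.isclose(e1, 2.0 * c1))
```

Scaling both samples by c multiplies the estimate by |c|, and an orthogonal transform leaves it unchanged, since the estimate depends on the samples only through pairwise Euclidean distances. The 1-D identity holds exactly for the empirical distributions, not just in the population limit, so the final check does not depend on the sample size.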
[Müller, 1997] A. Müller. Integral probability metrics and their generating classes of functions. Advances in Applied Probability, 29(2):429-443, 1997.