Landscape Learning for Neural Network Inversion
Columbia University, * Rutgers University
TL;DR: We learn an easy-to-optimize loss landscape for neural network inversion problems such as GAN inversion and 3D human reconstruction.
Many machine learning methods operate by inverting a neural network at inference time, which has become a popular technique for solving inverse problems in computer vision, robotics, and graphics. However, these methods often involve gradient descent through a highly non-convex loss landscape (as shown in Figure 1), causing the optimization process to be unstable and slow. We introduce a method that learns a loss landscape where gradient descent is efficient, bringing massive improvement and acceleration to the inversion process. We demonstrate this advantage on a number of methods for both generative and discriminative tasks, including GAN inversion, adversarial defense, and 3D human pose reconstruction.
Take GAN inversion as an example. To invert F, we need to optimize over the latent space X so that the output matches the desired image y. This optimization is often unstable and slow, so we propose to create a new space Z where gradient descent is easier. To parameterize Z, we use a neural network θ : Z → X that maps from the new space Z to the original space X. The learning problem we need to solve is to estimate the parameters of θ so that there is a short gradient descent path in Z from the initialization to the solution. Fig. 2 shows an overview of this setup.
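To make the reparameterization concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: F is a hand-written per-coordinate scaling rather than a GAN, θ is a fixed diagonal rescaling chosen by hand rather than a learned network, and the names SCALES, theta, and invert are hypothetical. It only shows what "optimizing over Z instead of X" means.

```python
# Toy stand-in for a generator F: X -> Y (hypothetical, not a real GAN).
# Each coordinate is scaled differently, so the loss in X is badly conditioned.
SCALES = [1.0, 0.1]

def F(x):
    return [a * xi for a, xi in zip(SCALES, x)]

def loss(out, y):
    return sum((o - t) ** 2 for o, t in zip(out, y))

def theta(z, m):
    """Hypothetical mapping theta: Z -> X, here a per-coordinate rescaling by m."""
    return [mi * zi for mi, zi in zip(m, z)]

def invert(y, m, steps=10, lr=0.5):
    """Gradient descent on z for loss(F(theta(z)), y), starting from z = 0."""
    z = [0.0] * len(y)
    for _ in range(steps):
        out = F(theta(z, m))
        # Chain rule: d loss / d z_i = 2 * (out_i - y_i) * a_i * m_i
        grad = [2 * (o - t) * a * mi for o, t, a, mi in zip(out, y, SCALES, m)]
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return loss(F(theta(z, m)), y)

y = [1.0, 1.0]
loss_in_x = invert(y, m=[1.0, 1.0])    # theta is the identity: descent directly in X
loss_in_z = invert(y, m=[1.0, 10.0])   # theta undoes F's bad scaling
print(loss_in_x, loss_in_z)
```

With the same step size and the same ten steps, descent in Z ends at a much lower loss than descent in X, because the hand-picked θ cancels the poor conditioning of F. The method described here learns such a mapping instead of choosing it by hand.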
Formally, we solve the overall objective of:
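One plausible way to write such an objective, sketched from the description above and not quoted from the paper: the loss symbol \mathcal{L}, the step size \lambda, and the number of steps T are assumed notation.

```latex
\min_{\theta} \; \mathbb{E}_{y} \left[ \sum_{t=0}^{T} \mathcal{L}\big(F(\theta(z_t)),\, y\big) \right]
\quad \text{where} \quad
z_{t+1} = z_t - \lambda \, \nabla_{z} \mathcal{L}\big(F(\theta(z_t)),\, y\big)
```

Read this way, θ is rewarded when every point along the gradient descent path in Z already has low loss, which is another way of asking for a short path from the initialization to the solution.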
Landscape learning sounds a bit crazy, doesn’t it? Let’s talk about the intuition behind this method.
A collected optimization trajectory (red arrows) is used to train θ. Points on the trajectory with a higher loss yield a larger gradient (blue arrows) when training θ. Optimizing over multiple steps along the trajectory causes θ to learn the patterns of trajectories and create a smoother loss landscape.
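The alternation between collecting a trajectory and training θ on it can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' implementation: F is a hypothetical per-coordinate scaling, θ is a diagonal rescaling with parameters m, gradients are written out by hand, and the trajectory points are treated as constants when updating θ. The names SCALES, trajectory, and train_theta are made up for this example.

```python
SCALES = [1.0, 0.1]   # hypothetical toy F: per-coordinate scaling

def residual(z, m, y):
    # F(theta(z)) - y for the toy F and a diagonal theta with parameters m
    return [a * mi * zi - t for a, mi, zi, t in zip(SCALES, m, z, y)]

def trajectory(y, m, steps=5, lr=0.5):
    """Run gradient descent in Z and return every visited point z_0 .. z_steps."""
    z = [0.0] * len(y)
    points = [z]
    for _ in range(steps):
        r = residual(z, m, y)
        # d loss / d z_i = 2 * r_i * a_i * m_i
        z = [zi - lr * 2 * ri * a * mi for zi, ri, a, mi in zip(z, r, SCALES, m)]
        points.append(z)
    return points

def final_loss(y, m):
    r = residual(trajectory(y, m)[-1], m, y)
    return sum(ri ** 2 for ri in r)

def train_theta(targets, m, epochs=200, lr_theta=2.0):
    """Alternate: collect a trajectory with theta fixed, then update theta on it."""
    m = list(m)
    for _ in range(epochs):
        for y in targets:
            grad_m = [0.0] * len(m)
            for z in trajectory(y, m):   # trajectory points are held constant
                r = residual(z, m, y)
                # d/dm_i of the loss at fixed z: 2 * r_i * a_i * z_i
                for i in range(len(m)):
                    grad_m[i] += 2 * r[i] * SCALES[i] * z[i]
            m = [mi - lr_theta * g for mi, g in zip(m, grad_m)]
    return m

targets = [[1.0, 1.0], [-1.0, 0.5]]
m_init = [1.0, 1.0]                       # theta starts as the identity
m_learned = train_theta(targets, m_init)
loss_before = final_loss(targets[0], m_init)
loss_after = final_loss(targets[0], m_learned)
print(loss_before, loss_after)
```

In this toy, the loss reached after five descent steps is lower under the learned θ than under the identity, because summing the loss over the whole trajectory pushes θ toward a scaling in which each step makes more progress. A real setting would replace the hand-written gradients with automatic differentiation and θ with a neural network.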
We have demonstrated that our method works on three different applications: GAN inversion, adversarial defense, and 3D human pose reconstruction. Here we show some interesting qualitative results.
As you can see in the above animation (or not, since Medium compressed my GIFs too much), gradient descent in our learned landscape is orders of magnitude faster than in the original loss landscape. Our method can recover most facial details in 10 iterations, where the original landscape takes hundreds of iterations.
Human Action Reconstruction
Results show a similar trend to GAN inversion: our optimization process is much faster.
We also apply our method to a state-of-the-art defense against adversarial attacks. Here are the quantitative results, evaluated on CIFAR-10 classification.
You might be thinking: why would you want to perform inference by inverting a neural network in the first place? Carl Vondrick gave a great summary of the advantages of inference by neural network inversion in a recent CVPR talk. Alan Yuille also illustrated some fundamental motivations of analysis by synthesis.
One of the biggest advantages is the ability to generate multiple hypotheses when the problem is under-constrained. Here are some examples of masked reconstruction obtained by inverting a generative neural network.
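A small sketch of why inversion yields multiple hypotheses under a mask. The setup is hypothetical and chosen only for illustration: a two-output linear toy F, a mask that hides the second output, and hand-written gradients. Starting the same optimization from different initial points gives different latents that all explain the observed part equally well.

```python
def F(x):
    # Hypothetical toy "generator": two outputs from a 2-D latent
    return [x[0] + x[1], x[0] - x[1]]

JACOBIAN = [[1.0, 1.0], [1.0, -1.0]]   # dF_i / dx_j for the toy F above
MASK = [1.0, 0.0]                      # only the first output is observed

def masked_loss(x, y):
    return sum(mk * (o - t) ** 2 for mk, o, t in zip(MASK, F(x), y))

def invert(y, x_init, steps=50, lr=0.1):
    """Gradient descent on the masked loss, starting from x_init."""
    x = list(x_init)
    for _ in range(steps):
        out = F(x)
        grad = [
            sum(2 * MASK[i] * (out[i] - y[i]) * JACOBIAN[i][j] for i in range(2))
            for j in range(2)
        ]
        x = [xj - lr * g for xj, g in zip(x, grad)]
    return x

y = [1.0, 0.0]   # the second entry is masked out, so its value is irrelevant
inits = ([0.0, 0.0], [2.0, 0.0], [-3.0, 1.0])
hypotheses = [invert(y, init) for init in inits]
for h in hypotheses:
    print(h, F(h), masked_loss(h, y))
```

Each run matches the observed output while disagreeing on the masked one, which is the behavior a single feed-forward encoder cannot provide without extra machinery.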
Generalization to Out-of-distribution Data
Compared to an encoder trained to perform the same task, an optimization-based method does much better on out-of-distribution data, where an encoder often fails completely.
As we’ve alluded to previously, the acceleration in optimization is attributed to a smoother learned loss landscape. We visualize the loss landscapes with dimensionality reduction (details can be found in Section 4.4 of the paper).
Acknowledgements: This research is based on work partially supported by the NSF NRI Award #1925157, NSF STC LEAP, the DARPA MCS program, and the DARPA CCU program. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.