Landscape Learning for Neural Network Inversion

ICCV 2023

Ruoshi Liu
Jun 27, 2022

Paper, Code & Model, Talk

Ruoshi Liu, Chengzhi Mao, Purva Tendulkar, Hao Wang*, and Carl Vondrick

Columbia University, * Rutgers University

TL;DR: We learn an easy-to-optimize loss landscape for neural network inversion problems such as GAN inversion and 3D human reconstruction.

Many machine learning methods operate by inverting a neural network at inference time, which has become a popular technique for solving inverse problems in computer vision, robotics, and graphics. However, these methods often involve gradient descent through a highly non-convex loss landscape (as shown in Figure 1), causing the optimization process to be unstable and slow. We introduce a method that learns a loss landscape where gradient descent is efficient, bringing massive improvement and acceleration to the inversion process. We demonstrate this advantage on a number of methods for both generative and discriminative tasks, including GAN inversion, adversarial defense, and 3D human pose reconstruction.

Method

Figure 2: Method

Take GAN inversion as an example: to invert F, we need to optimize over the latent space X to match the desired output image y. This optimization process is often unstable and slow, so we propose to create a new space Z where gradient descent is easier. To parameterize Z, we use a neural network θ : Z → X that maps from the new space Z to the original space X. The learning problem we need to solve is to estimate the parameters of θ so that there is a short gradient descent path in Z from the initialization to the solution. Fig. 2 shows an overview of this setup.

Formally, we solve the overall objective of:
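In notation (a reconstruction from the description above, so treat the exact form as a sketch and see the paper for the precise formulation): with $\mathcal{L}$ the task loss, $F$ the network being inverted, and $z_t$ the gradient descent iterates in $Z$, we train $\theta$ to minimize the loss accumulated along the trajectory,

$$
\min_{\theta}\;\mathbb{E}_{y}\left[\sum_{t=1}^{T}\mathcal{L}\big(F(\theta(z_t)),\,y\big)\right]
\quad\text{s.t.}\quad
z_{t+1}=z_t-\alpha\,\nabla_{z_t}\mathcal{L}\big(F(\theta(z_t)),\,y\big).
$$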

Intuition

Landscape learning sounds a bit crazy, doesn't it? Let's talk about some of the intuition behind this method.

Figure 3: Landscape Learning

A collected optimization trajectory (red arrows) is used to train θ. Points on the trajectory that correspond to a higher loss yield a larger gradient (blue arrows) when training θ. Training over multiple steps along the trajectory causes θ to learn the patterns of trajectories and create a smoother loss landscape.
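To make this concrete, here is a minimal PyTorch-style sketch of one training iteration. This is not the authors' code: `mapping` (θ), `F`, `task_loss`, `z_dim`, and all hyperparameters are illustrative assumptions. We run gradient descent on z through the current θ, then update θ with the loss accumulated along the visited trajectory.

```python
import torch

def train_step(mapping, F, task_loss, y, z_dim, opt_theta, T=10, alpha=0.1):
    """One landscape-learning step (sketch): descend in Z, then train theta
    on the losses accumulated along the visited trajectory."""
    z = torch.zeros(z_dim, requires_grad=True)
    trajectory_loss = 0.0
    for _ in range(T):
        loss = task_loss(F(mapping(z)), y)   # loss at the current point in Z
        (g,) = torch.autograd.grad(loss, z)  # gradient w.r.t. z only
        z = (z - alpha * g).detach().requires_grad_(True)  # one GD step in Z
        # Accumulate the loss at the new point; gradients flow into theta here,
        # so higher-loss points push theta harder (the blue arrows in Figure 3).
        trajectory_loss = trajectory_loss + task_loss(F(mapping(z)), y)
    opt_theta.zero_grad()
    trajectory_loss.backward()
    opt_theta.step()
```

As Figure 3 describes, collected trajectories are what train θ; the sketch above collapses trajectory collection and the θ update into a single pass for readability.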

Results

We have demonstrated that our method works on 3 different applications: GAN inversion, adversarial defense, and 3D human pose reconstruction. Here we will show some interesting qualitative results.

GAN Inversion

Figure 4: Optimization Process for GAN Inversion. Comparing optimization process of our method and the baseline in order to reconstruct the ground truth image. Left shows gradient descent in original landscape; middle shows gradient descent in our learned landscape; right shows ground truth image.

As you can see in the animation above (or not, since Medium compressed my GIFs too much), gradient descent in our learned landscape is orders of magnitude faster than in the original loss landscape. Our method can recover most facial details in 10 iterations, which takes hundreds of iterations in the original landscape.
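At test time, inversion is plain gradient descent on z with both θ and F frozen. A sketch under the same illustrative names as before:

```python
import torch

def invert(mapping, F, task_loss, y, z_dim, T=10, alpha=0.1):
    """Invert F for a target y by gradient descent in the learned space Z (sketch)."""
    z = torch.zeros(z_dim, requires_grad=True)
    for _ in range(T):
        loss = task_loss(F(mapping(z)), y)
        (g,) = torch.autograd.grad(loss, z)
        z = (z - alpha * g).detach().requires_grad_(True)
    return mapping(z)  # the recovered point in the original space X
```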

Human Action Reconstruction

Figure 5: Optimization Process for 3D Human Pose Reconstruction. Results shown are for out-of-distribution PROX dataset for sitting (Top) and standing (Bottom) poses.

Results show the same trend as in GAN inversion: our optimization process is massively faster.

Adversarial Defense

We also apply our method to a state-of-the-art adversarial defense; here are the quantitative results, evaluated on CIFAR-10 classification.

Table 1: Experiment on improving adversarial robust accuracy.

Analysis

You might be thinking: why would you want to perform inference by inverting a neural network in the first place? Carl Vondrick gave a great summary of the advantages of inference by neural network inversion in a recent CVPR talk. Alan Yuille also illustrated some fundamental motivations of analysis by synthesis.

Diversity

One of the biggest advantages is the ability to generate multiple hypotheses when the problem is under-constrained. Here are some examples of masked reconstructions obtained by inverting a generative neural network.

Figure 6: Diversity of Masked Reconstructions. We visualize reconstructions for partially observable inputs from random initialization. The masked regions are not considered for loss computation, i.e., the gradient is set to be zero. By optimizing only on the partial observation, we obtain diverse, feasible solutions for the hidden regions.
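The masked objective is straightforward to implement: compute the reconstruction loss only on observed pixels, so the hidden regions receive zero gradient and are free to vary across random initializations. A minimal sketch (the mask convention, 1 = observed, is my assumption):

```python
import torch

def masked_loss(x_hat, y, mask):
    """Per-pixel L2 loss restricted to observed pixels (sketch).
    mask: 1 where y is observed, 0 where it is hidden; hidden pixels
    contribute no gradient, so inversion only fits the visible evidence."""
    return ((x_hat - y) ** 2 * mask).sum() / mask.sum().clamp(min=1)
```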

Generalization to Out-of-distribution Data

Compared to an encoder trained to perform the same task, the optimization-based method does much better on out-of-distribution data, where an encoder often fails completely.

Figure 7: Generalization to Out-of-distribution Data. We visualize reconstructions for out-of-distribution inputs, where optimization-based inversion recovers plausible results while an encoder trained for the same task often fails.

Loss Landscape

As we've alluded to previously, the acceleration in optimization is attributed to a smoother learned loss landscape. We visualize the loss landscapes with dimensionality reduction (details can be found in Section 4.4 of the paper).
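A common recipe for this kind of visualization, sketched below with illustrative names (the paper's exact procedure is in Section 4.4): take the two principal directions of the points visited during optimization, then evaluate the loss on a 2D grid spanned by those directions around a solution.

```python
import torch

def landscape_grid(center, traj, loss_fn, span=1.0, steps=21):
    """Evaluate loss_fn on a 2D grid spanned by the top two principal
    directions of an optimization trajectory around `center` (sketch)."""
    _, _, V = torch.pca_lowrank(traj, q=2)  # principal directions of visited points
    d1, d2 = V[:, 0], V[:, 1]
    ticks = torch.linspace(-span, span, steps)
    grid = torch.empty(steps, steps)
    with torch.no_grad():  # evaluation only, no gradients needed
        for i, a in enumerate(ticks):
            for j, b in enumerate(ticks):
                grid[i, j] = loss_fn(center + a * d1 + b * d2)
    return grid  # plot e.g. with matplotlib's contourf
```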

Figure 8: Visualizing Loss Landscape (Uncurated). Visualizing the loss landscape of StyleGAN inversion spanned by two principal directions. Top row shows 4 examples of the loss landscapes corresponding to our space Z. Bottom row shows the loss landscapes corresponding to the original input space X for the same 4 examples.

Acknowledgements: This research is based on work partially supported by the NSF NRI Award #1925157, NSF STC LEAP, the DARPA MCS program, and the DARPA CCU program. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsors.
