Review: Spectral normalization for GANs

Nilesh Barla · PerceptronAI · Nov 24, 2021

Generative adversarial networks (GANs) are among the most popular methods for generating samples. They are implicit models: unlike variational autoencoders, they do not learn an explicit distribution of the data, but instead learn its statistical properties through a min-max game.

GANs use two networks, a generator and a discriminator, which essentially try to fool each other: the generator produces fake samples, and the discriminator tries to tell fake from real. In the process, the generator learns the original distribution by optimizing its parameters with respect to the discriminator. Because the discriminator is trained on the real data and so encodes information about its underlying distribution, the generator leverages that signal to optimize its parameters and fool the discriminator by generating fake samples with a similar distribution.

Mathematically, we can sketch the two networks as:

G(z) = x̃, where z is a noise vector and x̃ is a generated (fake) sample

D(x) ∈ [0, 1], where 1 means the input is judged real and 0 means fake
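For completeness, the two networks play the standard min-max game from the original GAN formulation:

min_G max_D V(D, G) = E_{x ~ p_data}[ log D(x) ] + E_{z ~ p_z}[ log(1 − D(G(z))) ]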

  1. The discriminator is trained on both real and fake data, where the fake data is produced by the generator from random noise: D(x) = 1 for real; D(G(z)) = 0 for fake.
  2. The generator is then optimized with respect to the discriminator, this time with the target flipped to 1 instead of 0: D(G(z)) = 1.
  3. The whole process is repeated until the generator produces samples the discriminator can no longer tell apart from real data (a rough sketch of one training step follows this list).
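Here is a minimal sketch of one such training step in PyTorch. The architectures, optimizers, and toy data below are illustrative assumptions, not the paper's exact setup:

import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))   # noise -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.randn(16, 2) + 3.0   # stand-in for a batch of real data
z = torch.randn(16, 8)            # noise vectors

# Step 1: train the discriminator (real -> 1, fake -> 0)
fake = G(z).detach()
loss_d = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_d.zero_grad()
loss_d.backward()
opt_d.step()

# Step 2: train the generator with the target flipped (D(G(z)) -> 1)
loss_g = bce(D(G(z)), torch.ones(16, 1))
opt_g.zero_grad()
loss_g.backward()
opt_g.step()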

The paper makes the point that, while training a GAN, if the discriminator is unable to learn the high-dimensional structure of the given data distribution, then the generator network also fails to learn the multimodal structure of the target distribution.

Essentially, the whole system depends on the discriminator: if the discriminator is able to extract patterns, then the generator can produce good samples. And this is a persisting challenge: controlling the performance of the discriminator.

As Takeru Miyato et al. (2018) point out:

“Once such discriminator is produced in this situation, the training of the generator comes to complete stop, because the derivative of the so-produced discriminator with respect to the input turns out to be 0”

In order to train a good GAN, we need a way to control the discriminator during training, which means finding a way to normalize it.

Spectral normalization

Normalization of the discriminator is usually framed in terms of Lipschitz continuity: a function g is K-Lipschitz if ||g(x1) − g(x2)|| ≤ K·||x1 − x2|| for all inputs, and we want to bound K for the discriminator. Generally, we can define the discriminator as D(x, p), where p is the set of parameters to be optimized, so that D(x, p) = A(f(x, p)), where A is a nonlinear activation function. A is a continuous function with range [0, 1], say a sigmoid.

But, as discussed earlier, if the derivative of the discriminator with respect to its input becomes 0 during training, the generator stops optimizing: a zero derivative means no learning signal can propagate back to the generator.

In order to keep the derivative from dying, we have to add some regularity condition to the derivative of f(x). While input-based regularizations allow for relatively easy formulations based on samples, they make the model prone to overfitting, since they only constrain the function at the sampled points.

Spectral normalization addresses this issue by normalizing the weight matrix instead of normalizing the function f(x).

The spectral norm of a matrix is its largest singular value. Takeru Miyato et al. (2018) propose computing the spectral norm σ(W) of the weight matrix W in each layer and dividing W by it, W_SN = W / σ(W), so that the normalized matrix has spectral norm exactly 1. This bounds the Lipschitz constant of each layer, and hence of the whole discriminator.
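As a quick illustration (a toy sketch using an explicit SVD, not how the normalization is computed in practice):

import torch

W = torch.randn(4, 4)
sigma = torch.svd(W).S[0]     # largest singular value = spectral norm
W_sn = W / sigma              # spectrally normalized weight matrix
print(torch.svd(W_sn).S[0])   # ~1.0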

With spectral normalization, the discriminator's gradients stay bounded, which stabilizes training and helps avoid vanishing gradients in the generator.
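In practice, applying it is a one-line wrapper around each weight layer of the discriminator. A minimal sketch, with an illustrative architecture:

import torch.nn as nn

disc = nn.Sequential(
    nn.utils.spectral_norm(nn.Conv2d(3, 64, 4, stride=2, padding=1)),
    nn.LeakyReLU(0.2),
    nn.utils.spectral_norm(nn.Conv2d(64, 1, 4, stride=1, padding=0)),
)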

Python implementation

import torch
import torch.nn as nn

n_pow_iter = 1
y = nn.Linear(3, 3)
y = nn.utils.spectral_norm(y, n_power_iterations=n_pow_iter)
_ = y(torch.randn(1, 3))  # a forward pass updates the power-iteration estimates of u and v

# Compare spectral_norm's u/v estimates against the true singular vectors from SVD
u, s, v = torch.svd(y.weight_orig)
cos_err_u = 1.0 - torch.abs(torch.dot(y.weight_u, u[:, 0])).item()
cos_err_v = 1.0 - torch.abs(torch.dot(y.weight_v, v[:, 0])).item()
print('u-estimate cosine error:', cos_err_u)
print('v-estimate cosine error:', cos_err_v)

# Singular values: exact (from SVD) vs. estimated (from power iteration)
actual_orig_sn = s[0].item()
approx_orig_sn = (y.weight_u @ y.weight_orig @ y.weight_v).item()
print('Actual original spectral norm:', actual_orig_sn)
print('Approximate original spectral norm:', approx_orig_sn)

# After normalization, the largest singular value of y.weight should be ~1
u, s_new, v = torch.svd(y.weight.data, compute_uv=False)
actual_sn = s_new[0].item()
print('Actual updated spectral norm:', actual_sn)
print('Desired updated spectral norm: 1.0')

It is worth noting that singular value decomposition could be used to compute the spectral norm exactly, but running an SVD at every training step would be computationally inefficient. To tackle that, the power iteration method is used to estimate the spectral norm cheaply.
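A minimal sketch of that power iteration (the function name and iteration count are illustrative; PyTorch's spectral_norm maintains the u/v estimates across training steps rather than restarting from scratch each time):

import torch

def power_iter_spectral_norm(W, n_iters=50):
    # Estimate the largest singular value of W by power iteration.
    u = torch.randn(W.shape[0])
    u = u / u.norm()
    for _ in range(n_iters):
        v = W.t() @ u
        v = v / v.norm()
        u = W @ v
        u = u / u.norm()
    return torch.dot(u, W @ v)  # sigma ~= u^T W v

W = torch.randn(5, 3)
print(power_iter_spectral_norm(W).item())
print(torch.svd(W).S[0].item())  # reference value from a full SVD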

References:

  1. Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida, "Spectral Normalization for Generative Adversarial Networks," ICLR 2018.
  2. The code example is adapted from a Stack Overflow answer by @jodag.
