Learning Day 40: DCGAN and WGAN

De Jun Huang · May 25, 2021

Deep Convolutional GAN (DCGAN)

Transposed Convolution

  • As we may have noticed, a usual convolutional layer shrinks the spatial size (w, h) of an image, at best keeping it the same by means of padding
  • Transposed convolution can instead enlarge the image size, as in the sketch below
Transposed convolution with different padding strategies (ref)
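A minimal sketch of the size difference, assuming PyTorch (the layer settings and shapes below are illustrative, not from the original post):

```python
# Compare how a regular convolution shrinks the spatial size
# while a transposed convolution enlarges it.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)          # (batch, channels, h, w)

conv = nn.Conv2d(3, 8, kernel_size=4, stride=2, padding=1)
tconv = nn.ConvTranspose2d(3, 8, kernel_size=4, stride=2, padding=1)

print(conv(x).shape)    # torch.Size([1, 8, 16, 16]) -> downsampled
print(tconv(x).shape)   # torch.Size([1, 8, 64, 64]) -> upsampled
```

Stacking such transposed-convolution layers is how a DCGAN generator grows a small noise vector into a full-size image.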

Training stability is an issue

  • Pg (the generated data distribution) and Pr (the real data distribution) hardly overlap
  • When Pg and Pr are a certain distance apart (not even that far), the gradients of both the KL and JS divergence are close to zero (gradient vanishing)
P here is Pr, representing the real data; q1, q2 and q3 represent different examples of the generated data Pg (ref)
  • Thus the generator cannot learn anything if the initial Pg does not overlap with Pr
  • Instead, use Earth Mover's Distance, which measures the minimum cost of moving probability mass to transform the generated distribution into the real one (illustrated in the sketch below)
Example of Earth Mover’s Distance (ref)
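The point can be illustrated numerically. Below is a minimal sketch, assuming SciPy, with two point-mass distributions that do not overlap: the Wasserstein (Earth Mover's) distance keeps growing with the gap, while the JS divergence saturates at log 2 and therefore provides no useful gradient.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import jensenshannon

support = np.arange(10)                     # positions 0..9
p = np.zeros(10); p[0] = 1.0                # "real" data: all mass at 0

for d in (1, 3, 6):
    q = np.zeros(10); q[d] = 1.0            # "generated" data: all mass at d
    w = wasserstein_distance(support, support, p, q)
    js = jensenshannon(p, q) ** 2           # squared JS distance = JS divergence
    print(f"shift={d}: Wasserstein={w:.2f}, JS divergence={js:.3f}")

# Wasserstein grows with the shift (1, 3, 6); JS divergence stays at
# log(2) ~= 0.693, so it gives no gradient with respect to the shift.
```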

Wasserstein Distance and WGAN

  • The Wasserstein distance is a continuous generalization of Earth Mover's Distance
  • WGAN is a GAN that uses the Wasserstein distance instead of the JS divergence for optimization
  • WGAN can be optimized even when Pr and Pg do not overlap
  • WGAN with Gradient Penalty (WGAN-GP) pushes the critic towards a 1-Lipschitz function, keeping gradients gentle and making training more stable (a sketch of the penalty term follows below)
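A minimal sketch of the gradient-penalty term, assuming PyTorch; the `critic` argument and the penalty weight `lambda_gp` are placeholders, not from the original post:

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    batch_size = real.size(0)
    # Random interpolation between real and generated samples
    eps = torch.rand(batch_size, 1, 1, 1, device=device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)

    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]

    grads = grads.view(batch_size, -1)
    # Penalize deviation of the critic's gradient norm from 1 (1-Lipschitz)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Usage inside the critic update (lambda_gp is commonly set to 10):
# loss_critic = -(critic(real).mean() - critic(fake).mean()) \
#               + lambda_gp * gradient_penalty(critic, real, fake)
```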

DCGAN vs WGAN-GP

  • DCGAN requires Pr and Pg to overlap or at least be close to each other so that gradient vanishing does not occur. If training is carefully designed, accuracy can be good
  • WGAN does not have this requirement, so training is more stable. However, the calculation is more involved, which may reduce model performance
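For comparison, a minimal sketch, assuming PyTorch, of the two discriminator-side losses: DCGAN trains a classifier with binary cross-entropy, while the WGAN critic directly maximizes the score gap (the Lipschitz constraint is handled separately by the gradient penalty above).

```python
import torch
import torch.nn.functional as F

def dcgan_d_loss(d_real_logits, d_fake_logits):
    # Standard GAN discriminator loss: classify real as 1, fake as 0
    real_labels = torch.ones_like(d_real_logits)
    fake_labels = torch.zeros_like(d_fake_logits)
    return (F.binary_cross_entropy_with_logits(d_real_logits, real_labels)
            + F.binary_cross_entropy_with_logits(d_fake_logits, fake_labels))

def wgan_critic_loss(c_real_scores, c_fake_scores):
    # Negative Wasserstein estimate; minimizing this maximizes the score gap
    return -(c_real_scores.mean() - c_fake_scores.mean())
```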

Reference

link1
