Learning Day 40: DCGAN and WGAN
2 min read · May 25, 2021
Deep Convolutional GAN (DCGAN)
Transposed Convolution
- As we may have noticed, a usual convolutional layer shrinks the spatial size (w, h) of the image; at best it keeps the size the same by means of padding
- A transposed convolution can instead enlarge the image size, which is how the DCGAN generator upsamples (see the snippet below)
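To make the size effect concrete, here is a minimal PyTorch sketch (my own illustration, not from the lesson): a strided convolution halves a 16×16 feature map, while a transposed convolution with the same kernel/stride/padding doubles it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)  # (batch, channels, height, width)

# A regular strided convolution shrinks the spatial size...
conv = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)
print(conv(x).shape)  # torch.Size([1, 128, 8, 8])   -> 16x16 halved to 8x8

# ...while a transposed convolution with the same settings enlarges it,
# which is what a DCGAN generator stacks to grow noise into an image
up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
print(up(x).shape)    # torch.Size([1, 32, 32, 32])  -> 16x16 doubled to 32x32
```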
Training stability is an issue
- Pg (the generated data distribution) and Pr (the real data distribution) hardly overlap
- When Pg and Pr are a certain distance apart (not even that far), the gradients of both the KL and JS divergences are close to zero (gradient vanishing)
- Thus the model cannot learn anything if the initial Pg does not overlap with Pr
- Instead, use the Earth Mover's Distance: the minimum cost of transporting probability mass to turn the generated distribution into the real distribution (written out below)
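For reference, the Earth Mover's (Wasserstein-1) distance between the real and generated distributions can be written as follows (notation added here for clarity, not from the original note):

```latex
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \; \mathbb{E}_{(x, y) \sim \gamma} \left[ \lVert x - y \rVert \right]
```

where \Pi(P_r, P_g) is the set of all joint distributions whose marginals are P_r and P_g; each \gamma describes one "transport plan" for moving generated mass onto real mass.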
Wasserstein Distance and WGAN
- A continuous version of the Earth Mover's Distance
- WGAN is a GAN that uses Wasserstein Distance instead of JS Divergence for optimization
- WGAN can be optimized even when Pr and Pg do not overlap
- WGAN with Gradient Penalty (WGAN-GP) enforces the 1-Lipschitz constraint so that the critic's gradients stay well-behaved, making training more stable (a sketch of the penalty term follows this list)
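Below is one common way to implement the gradient penalty in PyTorch (a sketch with my own naming, not the post's code): sample points on straight lines between real and fake batches, then push the norm of the critic's gradient at those points toward 1.

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: (||grad of critic at interpolated points|| - 1)^2."""
    batch_size = real.size(0)
    # Random per-sample interpolation between real and generated images
    alpha = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interpolated = (alpha * real + (1 - alpha) * fake).detach().requires_grad_(True)

    scores = critic(interpolated)

    # Gradient of the critic's output w.r.t. the interpolated inputs
    gradients = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]

    gradients = gradients.view(batch_size, -1)
    # Penalize any deviation of the gradient norm from 1 (the 1-Lipschitz target)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()

# Typical critic loss (lambda = 10 is the coefficient used in the WGAN-GP paper):
# loss = critic(fake).mean() - critic(real).mean() + 10 * gradient_penalty(critic, real, fake)
```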
DCGAN vs WGAN-GP
- DCGAN requires Pr and Pg to overlap or be close to each other so that gradient vanishing does not occur. If training is carefully designed, accuracy can be good
- WGAN does not have this requirement, so training is more stable. However, the calculation is more complicated, which may reduce model performance