Learning Day 40: DCGAN and WGAN
2 min read · May 25, 2021
Deep Convolutional GAN (DCGAN)
Transposed Convolution
- As we may have noticed, a usual convolutional layer shrinks the spatial size (w, h) of the image; at best it keeps the size the same by means of padding
- A transposed convolution can instead enlarge the image size, which is how the DCGAN generator upsamples (see the snippet below)
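To make the size effect concrete, here is a minimal PyTorch sketch (my own illustration, not from the lesson): a strided convolution halves a 16×16 feature map, while a transposed convolution with the same kernel/stride/padding doubles it.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)  # (batch, channels, height, width)

# A regular strided convolution shrinks the spatial size...
conv = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)
print(conv(x).shape)  # torch.Size([1, 128, 8, 8])   -> 16x16 halved to 8x8

# ...while a transposed convolution with the same settings enlarges it,
# which is what a DCGAN generator stacks to grow noise into an image
up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
print(up(x).shape)    # torch.Size([1, 32, 32, 32])  -> 16x16 doubled to 32x32
```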
Training stability is an issue
- Pg (the generated data distribution) and Pr (the real data distribution) hardly overlap
- When Pg and Pr are a certain distance apart (not even that far), the gradients of both the KL and JS divergences are close to zero (gradient vanishing)
- Thus the model cannot learn anything if the initial Pg does not overlap with Pr
- Instead, use the Earth Mover's Distance: the minimum cost of transporting probability mass to turn the generated distribution into the real distribution (written out below)
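For reference, the Earth Mover's (Wasserstein-1) distance between the real and generated distributions can be written as follows (notation added here for clarity, not from the original note):

```latex
W(P_r, P_g) = \inf_{\gamma \in \Pi(P_r, P_g)} \; \mathbb{E}_{(x, y) \sim \gamma} \left[ \lVert x - y \rVert \right]
```

where \Pi(P_r, P_g) is the set of all joint distributions whose marginals are P_r and P_g; each \gamma describes one "transport plan" for moving generated mass onto real mass.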
Wasserstein Distance and WGAN
- A continuous version of the Earth Mover's Distance
- WGAN is a GAN that uses Wasserstein Distance instead of JS Divergence for optimization
- WGAN can be optimized even when Pr and Pg do not overlap
- WGAN with Gradient Penalty (WGAN-GP) enforces the 1-Lipschitz constraint so that the critic's gradients stay well-behaved, making training more stable (a sketch of the penalty term follows this list)
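Below is one common way to implement the gradient penalty in PyTorch (a sketch with my own naming, not the post's code): sample points on straight lines between real and fake batches, then push the norm of the critic's gradient at those points toward 1.

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: (||grad of critic at interpolated points|| - 1)^2."""
    batch_size = real.size(0)
    # Random per-sample interpolation between real and generated images
    alpha = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interpolated = (alpha * real + (1 - alpha) * fake).detach().requires_grad_(True)

    scores = critic(interpolated)

    # Gradient of the critic's output w.r.t. the interpolated inputs
    gradients = torch.autograd.grad(
        outputs=scores,
        inputs=interpolated,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]

    gradients = gradients.view(batch_size, -1)
    # Penalize any deviation of the gradient norm from 1 (the 1-Lipschitz target)
    return ((gradients.norm(2, dim=1) - 1) ** 2).mean()

# Typical critic loss (lambda = 10 is the coefficient used in the WGAN-GP paper):
# loss = critic(fake).mean() - critic(real).mean() + 10 * gradient_penalty(critic, real, fake)
```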
DCGAN vs WGAN-GP
- DCGAN requires Pr and Pg to overlap or be close to each other so that gradient vanishing does not occur. If training is carefully designed, accuracy can be good
- WGAN does not have this requirement, so training is more stable. However, the calculation is more complicated, which may reduce model performance