Markovian Generative Adversarial Networks (MGANs) capture the feature statistics of Markovian patches and can generate images of arbitrary dimensions. Because the trained generator is a feed-forward network that needs no iterative optimization at inference time, the authors report run times about 500 times faster than earlier neural approaches to style transfer and texture synthesis.
Guiding image generation with statistics of feature vectors taken from the bottleneck layers tends to produce better synthesis results. MGANs compute these statistics over spatial patches, whereas conventional approaches compute them feature-wise. They also drop the Gaussian assumption on the latent features, because real-world data are usually sampled from a complex non-linear manifold. The goal is then to project the relevant patches onto that manifold.
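To make "statistics over spatial patches" concrete, here is a minimal, dependency-free sketch of cutting a feature map into square patches. It assumes a single-channel feature map stored as a list of rows; the names `extract_patches`, `size`, and `stride` are hypothetical and are not taken from the MGAN code, which works on multi-channel VGG activations.

```python
def extract_patches(feature_map, size, stride=1):
    """Return every size x size patch of a 2D feature map, flattened row by row.

    feature_map is a list of equally long rows (H x W). This is an
    illustrative helper, not the patch sampler used by the authors.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = [
                feature_map[top + dy][left + dx]
                for dy in range(size)
                for dx in range(size)
            ]
            patches.append(patch)
    return patches


# Toy 4 x 4 "feature map" holding the values 0..15.
feature_map = [[float(4 * row + col) for col in range(4)] for row in range(4)]

# Non-overlapping 2 x 2 patches; a smaller stride would give overlapping ones.
patches = extract_patches(feature_map, size=2, stride=2)
```

Each flattened patch can then be treated as one sample, so that any statistic or classifier is applied per patch rather than per feature channel.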
The MGAN pipeline starts with the input image to be stylized, which passes through a VGG network for feature-map extraction; the VGG weights are frozen and not trained. The generator then reconstructs the target stylized image. The generated images go through a discriminator with two parallel paths: the first works on patches and classifies each as real or fake, while the second works on the whole image and measures its proximity to the ground truth. Two remarks are in order. First, the authors favored hinge loss over sigmoid binary cross-entropy: with a small training set of only 25K images, the sigmoid would saturate and suffer from vanishing gradients. Second, a mean squared error (MSE) loss guides the stylized images to stay consistent with the input image.
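The saturation argument can be illustrated with a small sketch. It compares the slope of the sigmoid with the subgradient of a hinge loss for a raw discriminator score; labels are assumed to be +1 for real and -1 for fake, and the function names are hypothetical rather than taken from the authors' code. It only shows where each function stops passing gradient, not the full MGAN objective.

```python
import math


def sigmoid(score):
    """Logistic function; fine for the moderate scores used in this sketch."""
    return 1.0 / (1.0 + math.exp(-score))


def sigmoid_slope(score):
    """Derivative of the sigmoid; it shrinks toward 0 as |score| grows."""
    p = sigmoid(score)
    return p * (1.0 - p)


def hinge_loss(score, label):
    """Max-margin loss with label +1 (real) or -1 (fake)."""
    return max(0.0, 1.0 - label * score)


def hinge_subgradient(score, label):
    """Subgradient of the hinge loss with respect to the score."""
    return -float(label) if label * score < 1.0 else 0.0


# A real patch (label +1) that the discriminator scores very negatively:
# the sigmoid is almost flat there, while the hinge still has slope -1.
flat = sigmoid_slope(-8.0)
steep = hinge_subgradient(-8.0, +1)
```

The hinge gradient keeps a constant magnitude for every sample that violates the margin and drops to zero only once the sample is classified correctly with enough margin.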
The discriminator models the image as a Markov random field (MRF), assuming that pixels outside the same local patch are independent of each other. The same concept is used in the popular Pix2Pix GAN, where it is called PatchGAN. PatchGANs remain robust and effective even with small patch sizes (in width and height), which results in fewer trainable parameters and faster performance. In addition, they can be applied to arbitrarily large images.
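The claim about arbitrary image sizes follows from the discriminator being fully convolutional: the same weights slide over any input and produce a grid of per-patch scores. The sketch below computes the size of that grid and the receptive field of one score using the standard convolution arithmetic. The layer stack `LAYERS` is a hypothetical list of (kernel, stride, padding) triples chosen for illustration, not a configuration read from either paper's code.

```python
def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of one convolution along a single dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_grid(height, width, layers):
    """Size of the grid of patch scores for an input of height x width."""
    for kernel, stride, padding in layers:
        height = conv_output_size(height, kernel, stride, padding)
        width = conv_output_size(width, kernel, stride, padding)
    return height, width


def receptive_field(layers):
    """Side length, in input pixels, of the patch seen by one output score."""
    field, jump = 1, 1
    for kernel, stride, _padding in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Hypothetical stack: three stride-2 convolutions followed by two stride-1
# convolutions, all with 4 x 4 kernels and padding 1.
LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 1), (4, 1, 1)]

patch_side = receptive_field(LAYERS)
small_grid = patch_grid(256, 256, LAYERS)
large_grid = patch_grid(512, 384, LAYERS)
```

The receptive field depends only on the layers, so the parameter count stays fixed while the score grid simply grows with the input image.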
- Precomputed Real-Time Texture Synthesis with Markovian Generative Adversarial Networks, Chuan Li and Michael Wand, 2016
- Image-to-Image Translation with Conditional Adversarial Networks, Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros, 2016
- Source code: https://github.com/chuanli11/MGANs