GauGAN: Generating Photorealistic Images from Drawings

Noufal Samsudin
Published in Analytics Vidhya · May 6, 2022

Paper-to-code implementation of NVIDIA's GauGAN on a custom landscape dataset: generating photorealistic-ish :p images from drawings.

Generative Adversarial Networks (GANs) have proven to be extremely powerful generative models that can make music, write poetry and even generate incredibly real-looking faces.

“the most interesting idea in the last 10 years in Machine Learning” — Yann LeCun

GANs have been very popular in the past few years, ever since they were introduced by Ian Goodfellow in 2014.

I believe one of the main factors contributing to GANs' popularity (besides their effectiveness, duh) is the simplicity and intuitiveness of their design.

A GAN generally comprises 2 Neural Networks:

1. Generator

2. Discriminator

The training process can be thought of as “a competition between counterfeiters and police,” Goodfellow said. “Counterfeiters want to make fake money and have it look real, and police want to look at any particular bill and determine if it’s fake.”

Here, the generator is the counterfeiter that tries to generate fake data, and the discriminator learns to determine whether the generated data is fake. This process happens iteratively, with both networks becoming progressively better at their jobs: the discriminator becomes very good at identifying fake data, which in turn pushes the generator to produce data that looks very real. A minimal training loop is sketched below.
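To make the counterfeiter/police analogy concrete, here is a minimal, hypothetical PyTorch training loop. The tiny fully-connected generator and discriminator, the latent size and the learning rates are all illustrative assumptions, not the GauGAN architecture:

```python
import torch
import torch.nn as nn

# Hypothetical toy networks; any architectures with matching shapes work.
latent_dim = 100
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))  # outputs a logit: real vs fake

criterion = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real):  # real: (batch, 784) tensor of real samples
    batch = real.size(0)
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator ("police"): push real -> 1, fake -> 0.
    opt_d.zero_grad()
    loss_d = criterion(D(real), torch.ones(batch, 1)) + \
             criterion(D(fake.detach()), torch.zeros(batch, 1))
    loss_d.backward()
    opt_d.step()

    # Generator ("counterfeiter"): fool D into predicting 1 on fakes.
    opt_g.zero_grad()
    loss_g = criterion(D(fake), torch.ones(batch, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```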

In this article I will be demoing a custom implementation of NVIDIA's GauGAN. My approach was to read the paper (Semantic Image Synthesis with Spatially-Adaptive Normalization) and build the model as described in it. I have used PyTorch for the implementation.

GauGAN

GauGAN uses a special normalization technique, SPADE (SPatially-Adaptive DEnormalization), to keep the semantic layout of the input from being washed out by ordinary normalization layers. The generator takes a semantic map (a drawing) as input and generates a photorealistic image as output. It is also capable of multimodal image synthesis, meaning it can generate images in various different styles: for the same drawing, it can generate multiple images.
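The heart of the paper is the SPADE layer. Below is a minimal sketch of it as I understand it from the paper: activations are normalized without learned affine parameters, and a per-pixel scale and shift are instead predicted from the semantic map by small convolutions. The hidden width and kernel sizes here are illustrative assumptions; see my repo or the official SPADE code for the exact layers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADE(nn.Module):
    """Sketch of SPatially-Adaptive DEnormalization."""
    def __init__(self, num_features, label_channels, hidden=128):
        super().__init__()
        # Normalize activations without learned affine parameters...
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        # ...then predict a per-pixel scale (gamma) and shift (beta)
        # from the semantic map.
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, num_features, 3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, 3, padding=1)

    def forward(self, x, segmap):
        # Resize the one-hot semantic map to the feature resolution.
        segmap = F.interpolate(segmap, size=x.shape[2:], mode="nearest")
        h = self.shared(segmap)
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```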

GauGAN Generator: converts drawings to images
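The multimodal behaviour comes from a VAE-style image encoder: it compresses a style image into a latent distribution, and sampling different latent vectors gives different styles for the same drawing. A rough sketch, with illustrative layer sizes that are my assumptions rather than the exact architecture:

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Sketch of the image encoder that enables multimodal synthesis."""
    def __init__(self, latent_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)

    def forward(self, style_img):
        h = self.features(style_img)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z ~ N(mu, sigma^2). Different z
        # vectors yield different styles for the same semantic map.
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
```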

Please refer to my GitHub repo for full details of the architecture and implementation.

In my implementation, I downloaded a dataset of landscape images from Kaggle and used a pretrained semantic segmentation model (DeepLab v2) to generate semantic maps of the images. This is how I compiled the dataset of (drawing, photo) training pairs.
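For anyone reproducing this step, here is a rough sketch of pseudo-labelling with a pretrained segmentation model. I used DeepLab v2 from kazuto1011/deeplab-pytorch; the snippet below substitutes torchvision's DeepLabV3 as a readily available stand-in, so treat it as an approximation of the actual pipeline:

```python
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet101
from PIL import Image

# Stand-in for the DeepLab v2 model used in the actual pipeline.
model = deeplabv3_resnet101(pretrained=True).eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def make_semantic_map(path):
    img = Image.open(path).convert("RGB")
    x = preprocess(img).unsqueeze(0)          # (1, 3, H, W)
    with torch.no_grad():
        logits = model(x)["out"]              # (1, num_classes, H, W)
    return logits.argmax(dim=1).squeeze(0)    # (H, W) class-id map

# Each landscape photo plus its predicted class-id map becomes one
# (semantic map, real image) training pair for GauGAN.
```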

This dataset was then used to train GauGAN. Below are some results:

This model was trained on a hastily put-together dataset for a limited time on my home PC. The results can be improved by properly curating the dataset and training for longer on a much larger one.

Github Repo: https://github.com/kvsnoufal/GauGanPytorch

Shoulders of giants:

  1. Semantic Image Synthesis with Spatially-Adaptive Normalization (paper)
  2. Official GitHub implementation: https://github.com/NVlabs/SPADE
  3. Keras implementation: https://keras.io/examples/generative/gaugan/
  4. Flickr Landscape Dataset: https://www.kaggle.com/datasets/arnaud58/landscape-pictures
  5. DeepLab model for semantic segmentation: https://github.com/kazuto1011/deeplab-pytorch

About The Author

I work at Dubai Holding, UAE, as a Principal Data Scientist. You can reach me at kvsnoufal@gmail.com or https://www.linkedin.com/in/kvsnoufal/
