BachGAN: Let your imagination run free

Create a masterpiece without a pencil or paintbrush

Rohit Pillai
The Startup
6 min read · May 15, 2020


Photo by Adli Wahid on Unsplash

Picture this: one day, you dream up an artwork so awe-inspiring, so filled with vibrant color and beauty, that it evokes primal feelings in the beholder and can absorb them for hours. You wake up and jump out of bed filled with excitement, ready to bring this magical piece to life. Days later, you realize you’re no Picasso, and your execution could never do justice to what you imagined. You leave your canvas, frustrated at your inability to bring magic to the world. But what if this story could go another way? What if you didn’t have to draw this image yourself, but could have something draw it for you, something that could do justice to your creation? Ladies and gentlemen, I present to you the magic of GANs.

That’s right: Generative Adversarial Networks (GANs) can take in an input and create something completely new from it, or modify it for you. The GAN was first proposed in 2014, and since then it has been used for a multitude of tasks, many of which involve creating new images or modifying existing ones. For context, a GAN consists of two components: a generator, which produces an image from random noise and the input, and a discriminator, which predicts whether the generated image is real or fake. The two train in coordination, each pushing the other to improve: the generator is forced to produce better images to fool the discriminator, while the discriminator learns to notice even the subtle intricacies that give a “fake” image away.
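
To make the adversarial setup concrete, here is a minimal PyTorch sketch of one training step. The tiny fully-connected networks, the latent size, and the image shape are placeholder assumptions for illustration, not code from any of the papers below.

```python
import torch
import torch.nn as nn

# Placeholder networks: real GANs use much deeper convolutional architectures.
latent_dim = 100
generator = nn.Sequential(nn.Linear(latent_dim, 784), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(784, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_images):          # real_images: (batch, 784) in [-1, 1]
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator step: learn to separate real images from generated ones.
    fake_images = generator(torch.randn(batch, latent_dim))
    d_loss = (bce(discriminator(real_images), real_labels)
              + bce(discriminator(fake_images.detach()), fake_labels))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Generator step: produce images the discriminator labels as real.
    g_loss = bce(discriminator(fake_images), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```

Alternating these two updates is what drives both networks to improve together.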

Here are some examples of GANs that operate on different kinds of input.

  • WarpGAN converts a photo of a face into a caricature, while CP-GAN does the inverse, converting a cartoon image into an image of a real face.
  • SketchyGAN creates a realistic object or living thing from just a sketch of it.
  • StackGAN and AttnGAN create images from a text description.
  • StoryGAN converts a story (generally a paragraph) into a series of images, one for each sentence in the story.
  • PG² uses a person’s picture and a pose map to generate images of the person standing in different positions.
  • Wav2Pix recreates a person’s face from just a recording of their speech.
  • SPADE takes in a semantic segmentation map (an image in which each object is filled with a different color) and replaces the colors with actual pictures of the objects.
  • Layout2Im converts a layout (a blank image with boxes marking where objects should be, in both the background and the foreground) into a complete image.

These GANs are only the tip of the iceberg of the work being done in image generation and modification.

Now, coming back to your masterpiece, you can use any of these GANs, or even a combination of them, to complete it. Is your masterpiece a study of the human body? You can talk to Wav2Pix and have it create a great face! Why stop there? Convert it to a caricature with WarpGAN. You could even create a piece of someone in multiple poses superimposed on each other with PG². If you want to create a scenic piece, describe it to StackGAN or AttnGAN, or write a story about it and StoryGAN will do the rest. Think you can sketch it out? Let SketchyGAN flesh it out. Feeling lazy? Draw boxes where you want objects to be and let Layout2Im finish the job.

Feeling even lazier and don’t want to imagine the background at all? Here’s the Microsoft Dynamics 365 AI Research team to the rescue. They proposed a new image-generation task that requires a model to generate an image from a layout specifying only foreground boxes, with no other additional information. This is significantly harder than working from richer inputs like a segmentation map, text, or a complete layout, because the model has to figure out what sort of background would cohesively tie together all the objects in the image.
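
To make the task concrete, the only conditioning signal in this setting is a set of labeled foreground boxes. Here is a minimal sketch of that representation; the class names, coordinate convention, and field names are illustrative assumptions, not the paper’s actual data format.

```python
from dataclasses import dataclass

@dataclass
class ForegroundBox:
    label: str   # object class, e.g. "car" or "person"
    x0: float    # left edge, normalized to [0, 1]
    y0: float    # top edge, normalized to [0, 1]
    x1: float    # right edge, normalized to [0, 1]
    y1: float    # bottom edge, normalized to [0, 1]

# A salient object layout is just a list of boxes; there is no background,
# no segmentation map, and no text for the model to lean on.
layout = [
    ForegroundBox("car", 0.10, 0.55, 0.45, 0.90),
    ForegroundBox("person", 0.60, 0.40, 0.70, 0.85),
]
```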

To solve this task, the team built the Background Hallucination Generative Adversarial Network (BachGAN). Its key components are a retrieval module and a fusion module, which together generate a visually consistent background on the fly for any foreground object layout. Given a salient object layout, BachGAN generates an image in two steps (sketched in code below the steps):

(i) The background retrieval module selects, from a large candidate pool, the set of segmentation maps most relevant to the given object layout.

(ii) The background fusion module encodes these candidate maps to generate a best-matching background.
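
Here is a hypothetical sketch of that two-step pipeline. The function and module names (`retrieval_net`, `fusion_net`, and so on) are invented for illustration; the actual interfaces and tensor shapes are defined in the paper.

```python
import torch

def bachgan_forward(layout, candidate_pool, retrieval_net,
                    fusion_net, generator, k=3):
    # Step (i): background retrieval. Score every segmentation map in the
    # candidate pool against the query layout and keep the top-k matches.
    scores = torch.stack([retrieval_net(layout, seg) for seg in candidate_pool])
    top_indices = scores.topk(k).indices
    candidates = [candidate_pool[i] for i in top_indices]

    # Step (ii): background fusion. Encode the retrieved maps into a single
    # background feature, then generate the final image conditioned on both
    # the foreground layout and the hallucinated background.
    background_feat = fusion_net(candidates)
    return generator(layout, background_feat)
```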

Images created using a segmentation map (SPADE, top row) vs. a foreground layout (BachGAN, bottom row).

BachGAN isn’t restricted to creating an image from a single, fixed foreground layout. It can also handle modifications to that layout, regenerating a background that cohesively ties together both the old objects and any newly added ones. Because the old objects keep their positions in the regenerated image, nothing appears to change except the addition of the new object.
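
Continuing the hypothetical sketches above, incremental editing amounts to re-running the same pipeline on the grown layout:

```python
# Add one more object; existing boxes keep their exact positions, and only
# the background is re-hallucinated to accommodate the new arrangement.
layout.append(ForegroundBox("bus", 0.50, 0.30, 0.95, 0.75))
new_image = bachgan_forward(layout, candidate_pool,
                            retrieval_net, fusion_net, generator)
```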

Sequential image generation by BachGAN on an example from ADE20K.

Every model has to be tested on a dataset to prove its ability, and BachGAN is no different. It was evaluated on the Cityscapes dataset, which is composed of street scenes captured in cities, and the ADE20K dataset, which covers scenes from a wide variety of settings. BachGAN outperformed the baselines on both datasets.
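
As an aside on how such comparisons are typically scored: metrics like the Fréchet Inception Distance (FID) compare the statistics of generated images against real ones. Below is a sketch using the torchmetrics library with random placeholder tensors; it illustrates the metric in general, not the paper’s exact evaluation code.

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# FID compares Inception-feature statistics of real vs. generated images;
# lower is better. By default the metric expects uint8 NCHW images.
fid = FrechetInceptionDistance(feature=2048)

real_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)
fake_images = torch.randint(0, 256, (16, 3, 256, 256), dtype=torch.uint8)

fid.update(real_images, real=True)
fid.update(fake_images, real=False)
print(f"FID: {fid.compute().item():.2f}")
```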

Sequential image generation by BachGAN on an example from Cityscapes.

With the creation of BachGAN, you now have no excuse not to create that masterpiece you’ve always dreamed of. So go on and unleash all that untapped creativity!

Here’s a link to our paper if you want to know more about BachGAN or the new task, and click here to see more of our publications and other work.

References

  1. Yichun Shi, Debayan Deb, and Anil K. Jain. WarpGAN: Automatic Caricature Generation. CVPR, 2019.
  2. Junhong Huang, Mingkui Tan, Yuguang Yan, Chunmei Qing, Qingyao Wu, and Zhu Liang Yu. Cartoon-to-Photo Facial Translation with Generative Adversarial Networks. ACML, 2018.
  3. Tao Xu, Pengchuan Zhang, Qiuyuan Huang, Han Zhang, Zhe Gan, Xiaolei Huang, and Xiaodong He. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks. CVPR, 2018.
  4. Wengling Chen and James Hays. SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis. CVPR, 2018.
  5. Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, and Dimitris Metaxas. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks. ICCV, 2017.
  6. Yitong Li, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson, and Jianfeng Gao. StoryGAN: A Sequential Conditional GAN for Story Visualization. CVPR, 2019.
  7. Liqian Ma, Xu Jia, Qianru Sun, Bernt Schiele, Tinne Tuytelaars, and Luc Van Gool. Pose Guided Person Image Generation. NIPS, 2017.
  8. A. Duarte, F. Roldan, M. Tubau, et al. Wav2Pix: Speech-Conditioned Face Generation Using Generative Adversarial Networks. ICASSP, 2019.
  9. Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic Image Synthesis with Spatially-Adaptive Normalization. CVPR, 2019.
  10. Bo Zhao, Lili Meng, Weidong Yin, and Leonid Sigal. Image Generation from Layout. CVPR, 2019.
  11. Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. CVPR, 2016.
  12. Bolei Zhou, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. Scene Parsing Through ADE20K Dataset. CVPR, 2017.
