Understanding MixNMatch: Creating A More Realistic Synthetic Image
Combine different factors from multiple real images into a single synthetic image
I recently stumbled upon a paper called MixNMatch that aims to combine different factors from multiple real images into a single synthetic image, with minimal supervision. This post is intended to be detailed and assumes some background in Deep Learning and Generative Models. If you are looking for a TL;DR version, you can check out my Twitter thread here.
If you like this post, please share it with your network.
Summary
At its core, MixNMatch is a conditional image generation technique built on a conditional Generative Adversarial Network (GAN). It disentangles and encodes multiple factors from different real images, then combines them into a single synthetic image. Specifically, it takes the background, pose, shape, and texture from different real images and merges them into one synthetic image with minimal supervision.
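To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of the inference-time flow: four separate encoders map four different real images to four latent factor codes (background, pose, shape, texture), and a single generator combines the codes into one synthetic image. All module names, layer sizes, and the code dimension are my own illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

CODE_DIM = 64  # assumed latent code size per factor (not from the paper)

def make_encoder() -> nn.Module:
    """Tiny CNN encoder: one 3x64x64 image -> one factor code."""
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 64 -> 32
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 32 -> 16
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(64, CODE_DIM),
    )

class FactorMixer(nn.Module):
    """Encodes each factor from a *different* image, then decodes the mix."""
    def __init__(self):
        super().__init__()
        self.enc_background = make_encoder()
        self.enc_pose = make_encoder()
        self.enc_shape = make_encoder()
        self.enc_texture = make_encoder()
        # Toy generator: concatenated factor codes -> 3x64x64 image.
        self.generator = nn.Sequential(
            nn.Linear(4 * CODE_DIM, 64 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 8 -> 16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1),   # 32 -> 64
            nn.Tanh(),
        )

    def forward(self, img_bg, img_pose, img_shape, img_texture):
        # Each factor code comes from a different source image.
        codes = torch.cat([
            self.enc_background(img_bg),
            self.enc_pose(img_pose),
            self.enc_shape(img_shape),
            self.enc_texture(img_texture),
        ], dim=1)
        return self.generator(codes)

mixer = FactorMixer()
imgs = [torch.randn(1, 3, 64, 64) for _ in range(4)]  # four different "real" images
synthetic = mixer(*imgs)
print(synthetic.shape)  # torch.Size([1, 3, 64, 64])
```

In the actual paper the generator is adversarially trained and each factor is handled by a dedicated pathway; the sketch above only captures the high-level shape of the idea, one encoder per factor feeding a shared generator.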
During training, MixNMatch only needs a loose bounding box around the object to model the background; it requires no annotations for the object's pose, shape, or texture.