NeuroNuggets: CVPR 2018 in Review, Part III

Published in Neuromation, Oct 15, 2018 (6 min read)

The CVPR 2018 (Computer Vision and Pattern Recognition) conference is long over, but we can’t stop reviewing its wonderful papers; today, Part III is upon us! In the first part, we briefly reviewed the most interesting papers on GANs for computer vision from CVPR 2018; in the second part, we added a human touch and talked about pose estimation and tracking for humans. Today, we turn to one of the main focal points of our own internal research here at Neuromation: synthetic data. As usual, the papers are in no particular order, and our reviews are very brief, so we definitely recommend reading the papers in full.

Synthetic data: imitate to learn

Synthetic data means data that has been generated artificially, either through 3D modeling and rendering (as usual for computer vision) or by other means, and then used to train machine learning models. Synthetic data is a surprising topic in machine learning, and the most surprising thing is how long it had been mostly neglected. Some works on synthetic data can be traced to the 2000s, but before 2016 it basically attracted no interest at all. The only field where it had been used was training self-driving cars, where the need for simulated environments and the impossibility of collecting real datasets come together and make it the perfect situation for synthetic datasets.

Now the interest is rapidly growing: we have the SUNCG dataset of simulated indoor environments, outdoor environments for driving and navigation, the SURREAL dataset of synthetic humans for learning pose estimation and tracking, and even recent works that apply GANs to generate and refine synthetic data (we hope to get back to this and explain how it works later). So let us see what CVPR 2018 authors have to say about synthetic data. Since this is our main focus, we will consider the works on synthetic data in slightly more detail than usual.

Generating Synthetic Data from GANs: Augmentation and Adaptation in Feature Space

R. Volpi et al., Adversarial Feature Augmentation for Unsupervised Domain Adaptation
S. Sankaranarayanan et al., Generate To Adapt: Aligning Domains using Generative Adversarial Networks

There is a very interesting and promising field of using GANs to produce synthetic datasets to train other models. On the surface it makes little sense: if you have enough data to train a GAN, why not just use it to train the model? Or even better, if you have a trained GAN, why don’t you just take the discriminator and use it for your problem?

But this idea becomes much more interesting in the domain adaptation setting. Suppose you have a large source dataset and a small target dataset, and you need to use a model trained on the source dataset for the target, which might be completely unlabeled. Here adversarial domain adaptation techniques train two networks, a generator and a discriminator, and use them to ensure that the discriminator cannot distinguish between the data distributions in the source and target datasets. This field was started in the ICML 2015 paper by Ganin and Lempitsky, where the discriminator is used to ensure that the features stay domain-invariant:
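To make the adversarial part a bit more tangible, here is a minimal sketch of the gradient reversal trick commonly associated with the Ganin and Lempitsky paper, written in plain Python with hand-derived gradients. Everything in it is an illustrative assumption rather than the authors' implementation: the feature extractor is a single scalar weight `w`, the domain discriminator is a logistic regression with parameters `v` and `b`, the toy data and all function names are made up, and the label-prediction loss that a real model would also minimize is left out entirely.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def domain_loss_and_grads(w, v, b, xs, ds):
    """Mean binary cross-entropy of a toy domain discriminator.

    Feature: f = w * x; domain logit: v * f + b.
    Returns (loss, dL/dw, dL/dv, dL/db).
    """
    n = len(xs)
    loss = gw = gv = gb = 0.0
    for x, d in zip(xs, ds):
        f = w * x
        p = sigmoid(v * f + b)
        loss += -(d * math.log(p) + (1 - d) * math.log(1 - p))
        err = p - d            # derivative of the loss w.r.t. the logit
        gw += err * v * x      # chain rule through f = w * x
        gv += err * f
        gb += err
    return loss / n, gw / n, gv / n, gb / n

def grad_reverse(grad, lam=1.0):
    """Backward pass of a gradient reversal layer.

    The forward pass is the identity; on the way back the gradient
    is multiplied by -lam before it reaches the feature extractor.
    """
    return -lam * grad

def dann_step(w, v, b, xs, ds, lr=0.1, lam=1.0):
    """One joint update: the discriminator descends on the domain loss,
    the feature extractor receives the reversed gradient and ascends."""
    loss, gw, gv, gb = domain_loss_and_grads(w, v, b, xs, ds)
    v_new = v - lr * gv
    b_new = b - lr * gb
    w_new = w - lr * grad_reverse(gw, lam)
    return w_new, v_new, b_new, loss

# Hypothetical toy data: source samples (domain 0) near 1, target (domain 1) near 3.
XS = [0.8, 1.0, 1.2, 2.8, 3.0, 3.2]
DS = [0, 0, 0, 1, 1, 1]
W0, V0, B0 = 1.0, 0.5, -1.0

if __name__ == "__main__":
    w, v, b = W0, V0, B0
    for step in range(5):
        w, v, b, loss = dann_step(w, v, b, XS, DS)
        print(f"step {step}: domain loss {loss:.4f}, w = {w:.4f}")
```

In this toy setting the reversed gradient shrinks `w`, which pulls the source and target features closer together and makes the discriminator's job harder. In a real model the task loss would keep the features from collapsing to zero.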

And here is a schematic depiction of how this idea was slightly generalized in the Adversarial Discriminative Domain Adaptation paper from 2017:

In the CVPR 2018 paper by Volpi et al., researchers from Italy and Stanford made the adversarial training work not on the original images but rather in the feature space itself. The GAN operated on features extracted by a pretrained network, which makes it possible to achieve better domain invariance and ultimately improve the quality of domain adaptation. Here is the overall training procedure as it was adapted by Volpi et al.:

Another approach in the same vein was presented at CVPR 2018 by Sankaranarayanan et al., researchers from the University of Maryland. They use GANs to leverage unsupervised data to bring the source and target distributions closer to each other in the feature space. Basically, the idea is to use the discriminator to make sure that images generated from an embedding remain realistic images for the source distribution even when the embedding was taken from a sample from the target distribution. Here is how it works, and, again, the authors report improved domain adaptation results:

How Well Should You Label? A Study of Label Quality

A. Zlateski et al., On the Importance of Label Quality for Semantic Segmentation

One of the main selling points of synthetic data has always been the pixel-perfect quality of labeling that it makes easy to achieve. A synthetic scene always comes with perfect segmentation, but just how important is it? The authors of this work studied how finely (or how coarsely) you have to label your training set to get good segmentation quality from modern convolutional architectures… and, of course, what better tool to perform this study than synthetic scenes?

The authors used their specially developed Auto City dataset:

And in their experiments, the authors showed that the final segmentation quality, unsurprisingly, is indeed strongly correlated with the amount of time spent to produce the labels… but not so much with the quality of each individual label. This suggests that it is better to produce lots of coarse labels (say, with crowdsourcing) than to perform strict quality control for every label.
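To get a feel for what "coarse" versus "fine" labels mean in practice, here is a small sketch that compares a pixel-perfect mask with a deliberately coarse one (the object's filled bounding box) using intersection over union. This is only an illustration under our own assumptions: the toy mask, the bounding-box coarsening, the function names, and the choice of IoU as the quality measure are not taken from the paper.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks given as lists of rows."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0

def bounding_box_label(mask):
    """A hypothetical coarse label: fill the tight bounding box of the object."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, value in enumerate(row) if value]
    if not rows:
        return [[0] * len(row) for row in mask]
    r0, r1 = min(rows), max(rows)
    c0, c1 = min(cols), max(cols)
    return [
        [1 if (r0 <= r <= r1 and c0 <= c <= c1) else 0 for c in range(len(row))]
        for r, row in enumerate(mask)
    ]

# Pixel-perfect toy mask: a small plus-shaped object (5 pixels).
FINE = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
COARSE = bounding_box_label(FINE)  # 3x3 box, 9 pixels

if __name__ == "__main__":
    print(f"IoU of coarse vs. fine label: {iou(FINE, COARSE):.3f}")
```

Here the coarse label covers 9 pixels while the true object covers 5, so the two agree with an IoU of 5/9. A real study would of course measure how such differences in the training labels affect the trained model, not just the labels themselves.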

Soccer on Your Tabletop

K. Rematas et al., Soccer on Your Tabletop

Here at Neuromation, we love soccer (yes, the World Cup in Russia cost us a lot of work hours), and this research is just soooooooo cool. The authors present a system that can take a video stream of a soccer game and transform it… into a moving 3D reconstruction that can be projected onto your tabletop and viewed with an augmented reality device!

The system extracts bounding boxes of the players, analyzes the human figures with pose and depth estimation models and produces a quite accurate 3D scene reconstruction. Note how training a model specifically for the soccer domain really improves the results:

It additionally warms our hearts that they actually trained on synthetic data extracted from FIFA games! And the results are simply very cool all around:

But wait, there is more…

Thank you for your attention! Next time we might take an even more detailed look at some of the CVPR 2018 papers regarding synthetic data and domain adaptation. Until then!

Sergey Nikolenko
Chief Research Officer, Neuromation

Aleksey Artamonov
Senior Researcher, Neuromation