Synthetic datasets are a game changer

Arjan Wijnveen
5 min read · Jan 29, 2017

Everyone understands that datasets are a key component of deep learning. The success of many projects depends on the availability, accuracy and diversity of the data. Google researchers went as far as to say that even mediocre algorithms achieve state-of-the-art results given enough data. Unfortunately, most public datasets lack those essential qualities and are either too small, unreviewed or biased.

Major players like Google and Microsoft try to alleviate some of these problems by releasing annotated datasets. However, those attempts feel more like a showcase of corporate responsibility and PR than anything else. As an example, Google announced record-breaking facial recognition results in 2015, trained on a quarter of a billion annotated faces (yes, we’d love a copy of that!), a number that dwarfs their 8M-image public dataset.

Luckily, there are recent successes using a new technique called ‘synthetic datasets’ that could see us overcome those limitations. This new type of dataset consists of images and videos rendered entirely by computers based on various parameters or scenarios. The process through which these datasets are created falls into two categories: photorealistic rendering and scenario rendering, for lack of better descriptions.

Photorealistic Rendering

Photorealistic rendering aims to produce images of such high quality that they are indistinguishable from actual photos. These renderings can then be used to depict various situations or objects while retaining full control over the scene, lighting conditions and camera angles.

Example of synthetic faces

A paper published in 2016 describes how this process was applied to facial recognition. Having full control over the parameters makes it possible to eliminate issues like racial, age and gender bias. Using a 3D model of a human head and applying random textures, hairstyles and face morphologies, the authors were able to generate millions of unique ‘identities’ that could then be presented to a neural network for training. Since each identity appears under various angles and conditions, the network is forced to generalize, just as it would on the Labeled Faces in the Wild dataset. Depending on the quality of the renderings, the resulting model can be used directly on actual human faces or serve as a base for fine-tuning.
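
As a rough illustration, identity generation can be thought of as sampling from a parameter space; the parameter names and ranges below are invented for this sketch and are not taken from the paper:

    import random

    def sample_identity(rng):
        # Invented parameter space standing in for the paper's 3D head model
        return {
            "texture": rng.choice(["light", "medium", "dark", "olive"]),
            "hairstyle": rng.randrange(50),                # index into a hairstyle library
            "morph_weights": [rng.uniform(-1.0, 1.0) for _ in range(30)],  # face-shape blendshapes
        }

    def sample_views(identity, n_views, rng):
        # Each identity is rendered under many random poses and lighting setups,
        # forcing the network to generalize across conditions
        return [{
            "identity": identity,
            "yaw": rng.uniform(-60, 60),                   # head rotation in degrees
            "pitch": rng.uniform(-20, 20),
            "lighting": rng.choice(["studio", "outdoor", "dim"]),
        } for _ in range(n_views)]

    rng = random.Random(42)
    identities = (sample_views(sample_identity(rng), n_views=20, rng=rng)
                  for _ in range(1_000_000))   # lazy generator: scales to millions of identities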

Scenario Rendering

Scenario rendering, on the other hand, relies less on the visual quality of the produced output and more on depicting complex or dangerous situations that are not easily recorded in real life: think of head-on collisions between cars, near misses in aviation, natural disasters or public violence. Using Unity3D (the same engine that powers many of the games on your PC, phone or tablet), developers and scientists can hand-craft scenarios and environments from which hours of footage can be extracted.
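
Concretely, such a harness could be driven by declarative scenario descriptions along these lines; the schema and every field name below are invented for illustration and do not correspond to Unity3D or any real tool:

    import random

    # Purely illustrative scenario description
    head_on_collision = {
        "environment": "two_lane_highway",
        "weather": "heavy_rain",
        "time_of_day": "dusk",
        "actors": [
            {"type": "car", "lane": 0, "speed_kmh": 90},
            {"type": "car", "lane": 0, "speed_kmh": 80, "direction": "oncoming"},
        ],
        "cameras": [{"mount": "dashboard", "fov_deg": 90}],
        "duration_s": 12,
    }

    def variations(scenario, n, rng):
        # Randomize conditions to turn one hand-crafted scenario into hours of footage
        for _ in range(n):
            yield dict(scenario,
                       weather=rng.choice(["clear", "fog", "heavy_rain"]),
                       time_of_day=rng.choice(["dawn", "noon", "dusk", "night"]))

    for variant in variations(head_on_collision, n=100, rng=random.Random(0)):
        pass  # each variant would be handed to the renderer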

Example segmentation map from SYNTHIA Dataset

Unfortunately, photorealistic rendering of video footage is a very GPU-intensive process and can take weeks of compute to produce. Luckily, in most cases we can level the playing field between virtual and real-world footage by using segmentation maps. Segmentation maps are merely colored representations of an actual image or video, with one color per object class, forgoing the need for realism. Since Unity3D offers full control over the textures used, it is possible to output those simplified depictions directly. The actual real-world footage can be transformed to the same level using a traditional CNN or GAN trained on static annotated images.
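
For intuition, turning a renderer’s per-pixel class IDs into such a colored map is a simple lookup; the three-class palette below is a made-up example, not SYNTHIA’s actual color scheme:

    import numpy as np

    # Made-up three-class palette (RGB); real datasets define many more classes
    PALETTE = np.array([
        [128,  64, 128],   # 0: road
        [220,  20,  60],   # 1: pedestrian
        [  0,   0, 142],   # 2: vehicle
    ], dtype=np.uint8)

    def ids_to_segmentation_map(label_ids):
        """Map an (H, W) array of integer class IDs to an (H, W, 3) RGB image."""
        return PALETTE[label_ids]

    frame_labels = np.random.randint(0, 3, size=(480, 640))  # stand-in for renderer output
    seg_map = ids_to_segmentation_map(frame_labels)          # shape (480, 640, 3)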

One such example is the SYNTHIA Dataset (shown above), which sees cars, pedestrians and cyclists interact in a virtual city. Various weather conditions and types of on-board cameras provide a rich source of training data.

Reinforcement Learning

Unity3D can, however, provide us with much more than just visual data, namely telemetry. Given total control over the rendering engine’s internals, it is possible to export a stream of simulated measurements, such as steering wheel angles, velocity, altitude and G-forces, alongside the produced videos. This provides invaluable data for a popular branch of deep learning called reinforcement learning. But as you can imagine, building realistic simulations solely for the purpose of deep learning is another matter altogether.
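
One plausible way to keep such a stream aligned with the rendered frames is to log one telemetry record per frame, for example as JSON lines; this format is an assumption for illustration, not a built-in Unity3D feature:

    import json
    import time

    def record_step(log_file, frame_index, telemetry):
        # One record per rendered frame keeps video and measurements in sync
        record = {"frame": frame_index, "timestamp": time.time(), **telemetry}
        log_file.write(json.dumps(record) + "\n")

    with open("episode_0001.jsonl", "w") as log:
        record_step(log, 0, {"steering_angle": -4.2, "velocity": 17.3, "g_force": 0.4})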

DeepDrive Universe example on GTAV

One solution can be found in OpenAI’s platform Universe. By hosting dozens of existing popular PC games in the cloud, it lets scientists record gameplay footage together with the accompanying commands from the input controls (mouse/keyboard). For most games the only application is the training of agents that let a computer play with the same latency and information available to a human player. However, some games, like GTA5 or flight simulators, have such a high level of realism that it is feasible to fine-tune those agents for real-world applications, limiting the dependency on actual footage.
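
A minimal interaction loop, in the style of Universe’s published examples (the environment name below is one of its Flash racing games; treat the details as indicative of that era’s API):

    import gym
    import universe  # importing registers the Universe environments with gym

    env = gym.make('flashgames.DuskDrive-v0')
    env.configure(remotes=1)    # launch one Docker-backed remote game instance
    observation_n = env.reset()

    while True:
        # Actions are lists of input events, one list per remote: here, hold ArrowUp
        action_n = [[('KeyEvent', 'ArrowUp', True)] for _ in observation_n]
        observation_n, reward_n, done_n, info = env.step(action_n)
        env.render()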

From data source to training set

All of the outlined techniques give unprecedented control over the quality and diversity of the generated data. However, the sheer volume of videos, images and telemetry presents a significant hurdle when trying to convert this data into a format understandable by machine learning frameworks.

The soon-to-be-released free platform Cvedia tries to bridge this gap. By offering content ingestion directly from Unity3D, OpenAI and 3D rendering engines, it makes managing this flow of data trivial. With support for the most popular frameworks, such as TensorFlow, MXNet, Caffe and Torch, users can export their data in any format, applying any transformation, without the need to write a single line of code or cut a single video.

Through the use of a query language, data scientists can extract specific sections from videos based on their own custom telemetry and rules. For example, this makes it possible to export only scenes that contain a minimum steering angle or a change in altitude.
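
As a sketch of what such a rule might compute under the hood (Cvedia’s actual query language is not public here, so this Python equivalent is purely illustrative):

    def scenes_with_min_steering(records, threshold_deg=15.0, min_frames=30):
        """Yield (start, end) frame ranges where |steering_angle| stays above a threshold."""
        start = end = None
        for rec in records:  # records as produced by the telemetry logger sketched above
            if abs(rec["steering_angle"]) >= threshold_deg:
                if start is None:
                    start = rec["frame"]
                end = rec["frame"]
            elif start is not None:
                if end - start + 1 >= min_frames:
                    yield (start, end)
                start = end = None
        if start is not None and end - start + 1 >= min_frames:
            yield (start, end)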

Conclusion

Through continued research and investment in this field, we will most likely see many more interesting initiatives aimed at dataset handling and dependency reduction. But as always, our expectations of the technology will rise accordingly.
