NVIDIA & MIT CSAIL Open-Source Video-to-Video Synthesis Method

NVIDIA and the MIT Computer Science & Artificial Intelligence Laboratory (CSAIL) have open-sourced their video-to-video synthesis model. Built on a generative adversarial learning framework, the method generates high-resolution, photorealistic, and temporally coherent video from a variety of input formats, including segmentation masks, sketches, and poses.

Compared to image-to-image translation, video-to-video synthesis has received far less research attention. To address the low visual quality and temporal incoherence of video results produced by existing image synthesis approaches, the research group proposes a novel video-to-video synthesis method capable of synthesizing 2K-resolution street-scene videos up to 30 seconds long.
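At each step, the method's sequential generator reuses pixels from the previous output frame via optical-flow warping and hallucinates content only where warping cannot explain the new frame (e.g., occluded or newly revealed regions), blending the two with a soft mask. The sketch below illustrates only that blending step with toy numpy arrays; the function and variable names are illustrative, not from the authors' released code, and the real model produces the warped frame, hallucinated frame, and mask with learned networks.

```python
import numpy as np

def blend_frame(warped_prev, hallucinated, mask):
    """Combine a flow-warped previous frame with hallucinated content.

    mask is a soft weight in [0, 1]: 1.0 where the warped previous frame
    is trusted, 0.0 where content must be hallucinated from scratch.
    (The paper's mask convention may be inverted; the idea is the same.)
    """
    return mask * warped_prev + (1.0 - mask) * hallucinated

# Toy 2x2 single-channel "frames": top row is visible in the previous
# frame (reuse warped pixels), bottom row is newly revealed (hallucinate).
warped = np.full((2, 2), 0.8)
hallucinated = np.full((2, 2), 0.2)
mask = np.array([[1.0, 1.0],
                 [0.0, 0.0]])

frame = blend_frame(warped, hallucinated, mask)
```

Reusing warped pixels wherever the flow is reliable is what gives the output its frame-to-frame coherence; a per-frame image-synthesis model would re-hallucinate everything and flicker.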

The team’s 2017 Image Synthesis and Semantic Manipulation research
Sketch-to-face video results
Pose-to-dance video results

The authors performed extensive experimental validation on various datasets, and the model outperformed existing approaches both quantitatively and qualitatively. In addition, when the team extended the method to multimodal video synthesis with the same input data, the model produced videos with new visual properties in the scene while maintaining high resolution and coherence.

Multi-modal video synthesis results. (These synthesized videos contain different road surfaces.)

The researchers suggest the model could be improved in the future by adding 3D cues such as depth maps to better synthesize turning cars; using object tracking to ensure an object maintains its colour and appearance throughout the video; and training with coarser semantic labels to resolve issues in semantic manipulation.

The Video-to-Video Synthesis paper is on arXiv; the team's model and data are available here.


Author: Victor Lu | Editor: Michael Sarazen


