Big RNNs Achieve SOTA Performance in Video Prediction

Synced
Nov 20, 2019 · 3 min read

Toss a ball to a human and they’ll find a multitude of ways to interact with it — kicking, dribbling, heading or even deflating the thing. It’s not as easy as one might imagine to train an AI model to accurately predict what a human will do next, even when they are interacting with a relatively simple object like a ball.

When this kind of next-state uncertainty is present in a video, the task of generating future frames from a set of context frames, known as video prediction, becomes notoriously difficult. Beyond the prediction itself, a model must also handle the many spatio-temporal variables involved in generating video.

Most existing methods build handcrafted components into their neural networks, for example to separate information streams, add high-level information such as landmarks and semantic segmentation masks, or perform specialized computations such as warping, optical flow, and background masking. Other methods have been applied in relatively simple environments, such as scenes that mainly involve synthetic shapes or human faces and bodies.

New research from the University of Michigan, Google, and Adobe Research questions whether such handcrafted architectures are necessary. In what they say is the first large-scale study of the effects of minimal inductive bias and maximal capacity on video prediction, the researchers show that it’s possible to generate high-quality video predictions simply by increasing the scale of computation.

The researchers trained large models on three different datasets — one for modeling object interactions, another for modeling human motion, and a third for modeling car driving.

In their experiments, they found that maximizing the capacity of a standard neural network yields higher-quality video predictions. Recurrent models tend to outperform non-recurrent models, with large-scale recurrent neural networks (RNNs) achieving SOTA performance in predicting what’s coming next in videos. The researchers also noted that stochastic models perform better than non-stochastic models.
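For readers unfamiliar with the terms, here is a toy Python sketch of what "stochastic" and "recurrent" mean in a frame-by-frame rollout. It is not the paper's architecture: the `step` and `predict_future` functions, the blending rule, and the `noise_scale` parameter are all hypothetical and chosen only for illustration. Frames are represented as flat lists of numbers rather than images.

```python
import random


def step(frame, hidden, z):
    """Hypothetical toy update, not a trained model.

    Blends the hidden state with the input frame, then adds a noise
    value z to every entry to produce the next-frame guess.
    """
    new_hidden = [0.5 * h + 0.5 * x for h, x in zip(hidden, frame)]
    next_frame = [h + z for h in new_hidden]
    return next_frame, new_hidden


def predict_future(context_frames, num_future, noise_scale=0.0, seed=0):
    """Roll out num_future frames after observing context_frames.

    noise_scale=0.0 gives a deterministic rollout; a positive value
    samples a fresh random z at every step (the stochastic case).
    """
    rng = random.Random(seed)
    hidden = [0.0] * len(context_frames[0])
    frame = None

    # Warm up the recurrent state on the observed context frames.
    for observed in context_frames:
        z = rng.gauss(0.0, 1.0) * noise_scale
        frame, hidden = step(observed, hidden, z)

    # Roll out: each prediction is fed back in as the next input.
    future = []
    for _ in range(num_future):
        future.append(frame)
        z = rng.gauss(0.0, 1.0) * noise_scale
        frame, hidden = step(frame, hidden, z)
    return future


context = [[1.0, 1.0], [2.0, 2.0]]
deterministic = predict_future(context, num_future=3)
sample_a = predict_future(context, num_future=3, noise_scale=1.0, seed=1)
sample_b = predict_future(context, num_future=3, noise_scale=1.0, seed=2)
```

With `noise_scale=0.0` the rollout always produces the same future, while different seeds with a positive `noise_scale` give different plausible futures from the same context. That mirrors, in a very simplified way, why a stochastic model can represent several possible outcomes for one ball toss.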

The authors are calling for further studies to help discover an ideal combination of minimal inductive bias and maximal model capacity for optimizing video prediction.

The paper High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks is on arXiv.

Journalist: Yuan Yuan | Editor: Michael Sarazen

