CVPR’20: The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
In this post, I will present our CVPR’20 paper on multi-future trajectory prediction [1]. [Dataset/Code/Model]
Person future trajectory prediction: can you predict where the person is going to go?
In this paper, we study the problem of multi-future trajectory prediction. As shown in the following example, the person could plausibly walk in any of several directions.
The Forking Paths Dataset
In real-world videos, we only ever observe one of the many possible future trajectories for a given scenario. (We can only see and experience a single universe.)
To enable quantitative evaluation of multi-future trajectory prediction, we create a trajectory dataset in a realistic simulation environment, where human annotators control the agents to produce multiple semantically plausible future paths for the same scenario.
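With multiple annotated futures for the same scenario, evaluation can measure how well the best of a model’s K predictions matches each ground-truth future. Below is a minimal NumPy sketch in the spirit of the minADE_K / minFDE_K metrics used for this task; the array shapes and the exact averaging here are illustrative assumptions, not necessarily the paper’s exact protocol.

```python
import numpy as np

def min_ade_fde(preds: np.ndarray, gt_futures: np.ndarray):
    """Multi-future displacement errors (illustrative sketch).

    preds     : (K, T, 2) -- K predicted future trajectories for one person.
    gt_futures: (N, T, 2) -- N annotated ground-truth futures for the same scenario.

    For each ground-truth future, keep the best (minimum-error) prediction,
    then average over the ground-truth futures.
    """
    # Pairwise point-wise distances between every ground truth and every prediction: (N, K, T)
    dists = np.linalg.norm(gt_futures[:, None] - preds[None], axis=-1)
    ade = dists.mean(axis=-1).min(axis=1).mean()  # average over time, best prediction, average over futures
    fde = dists[..., -1].min(axis=1).mean()       # same, but on the final time step only
    return ade, fde
```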
First, we re-create the static scenes and dynamic trajectories from real-world videos in a 3D simulator.
Multiple human annotators observe the same scenario for a period of time and are then asked to control the agent to a destination, in either first-person or third-person view. The idea is that, by reconstructing real-world scenarios in 3D simulation and asking human annotators to navigate the agents, we can record human behaviors that resemble those in the real world.
Here is a visualization of the dataset:
After annotating the multi-future trajectories, we record the scenarios from different camera views, and even under different weather and lighting conditions.
We have released the dataset, all the code, and the 3D assets here, including a detailed tutorial on using the simulator and creating the dataset.
The Multiverse Model
We propose a multi-decoder framework that predicts both coarse and fine future locations of the person using scene semantic segmentation features, as sketched below:
- History Encoder computes representations from the observed trajectory and scene semantics
- Coarse Location Decoder predicts multiple future grid location sequences by using beam search
- Fine Location Decoder predicts exact future locations based on the grid predictions
Our model achieves state-of-the-art performance on the single-future trajectory prediction benchmark, as well as on the proposed multi-future trajectory prediction task on the Forking Paths dataset.
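To make the coarse-to-fine design concrete, here is a minimal, framework-free sketch of the decoding stage. The history encoder and the two decoders are stubbed out (in the actual model they are learned networks conditioned on scene-semantic and location features, and the grid size below is only illustrative); the point is to show how beam search over grid cells yields multiple candidate futures, which the fine decoder then refines into exact coordinates.

```python
import numpy as np

GRID_H, GRID_W = 18, 32       # coarse 2D grid over the scene (illustrative size)
N_CELLS = GRID_H * GRID_W

def coarse_step_logits(history_state, prev_cell):
    """Stub for the Coarse Location Decoder: scores over the next grid cell.
    The real model conditions on the encoded history (trajectory + scene
    semantics); here a seeded random draw stands in for those scores."""
    rng = np.random.default_rng(prev_cell)
    return rng.normal(size=N_CELLS)

def fine_offset(history_state, cell):
    """Stub for the Fine Location Decoder: a (dx, dy) offset inside `cell`
    that turns the coarse grid prediction into an exact coordinate."""
    return np.zeros(2)

def decode_multi_future(history_state, start_cell, t_pred=12, beam_width=5):
    """Beam search over grid-cell sequences (coarse), then refine each cell
    into an exact location (fine). Returns beam_width candidate futures."""
    beams = [([start_cell], 0.0)]                        # (cell sequence, accumulated log-prob)
    for _ in range(t_pred):
        candidates = []
        for cells, score in beams:
            logits = coarse_step_logits(history_state, cells[-1])
            logp = logits - np.logaddexp.reduce(logits)  # log-softmax
            for c in np.argsort(logp)[-beam_width:]:     # top next cells for this beam
                candidates.append((cells + [int(c)], score + logp[c]))
        beams = sorted(candidates, key=lambda b: b[1])[-beam_width:]

    futures = []
    for cells, _ in beams:
        traj = []
        for c in cells[1:]:                              # skip the observed start cell
            cy, cx = divmod(c, GRID_W)
            center = np.array([cx + 0.5, cy + 0.5])      # cell center in grid coordinates
            traj.append(center + fine_offset(history_state, c))
        futures.append(np.stack(traj))
    return futures                                       # beam_width arrays of shape (t_pred, 2)
```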
Qualitative comparison with the popular Social-GAN [2] model:
Now, back to the example at the beginning: did you get it right?
Check out our Social-Distancing-Early-Forecasting system!
References:
[1] Liang, Junwei, Lu Jiang, Kevin Murphy, Ting Yu, and Alexander Hauptmann. “The garden of forking paths: Towards multi-future trajectory prediction.” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. [Dataset/Code/Model]
[2] Gupta, Agrim, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. “Social GAN: Socially acceptable trajectories with generative adversarial networks.” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.