Crowd Density Forecasting by Modeling Patch-based Dynamics (IEEE RA-L)

Ryo Yonetani
OMRON SINIC X
Dec 21, 2020

We are excited to announce that our recent work on crowd density forecasting has been published in IEEE Robotics and Automation Letters!

Hiroaki Minoura, Ryo Yonetani, Mai Nishimura, and Yoshitaka Ushiku, “Crowd Density Forecasting by Modeling Patch-based Dynamics”, IEEE Robotics and Automation Letters, 2020 [IEEE Xplore] [YouTube]

This work was done in collaboration with Hiroaki Minoura, an intern from the Machine Perception and Robotics Group at Chubu University.

Background

Forecasting how people will move to control robots safely

Analyzing how people move in a physical environment is a fundamental task for various applications such as security and safe transportation. In the field of computer vision, researchers have been actively studying "trajectory forecasting", a technique that predicts how pedestrians will move based on their past trajectories observed in videos. Trajectory forecasting could help mobile robots and autonomous vehicles avoid collisions with surrounding people, and it will ultimately play a crucial role in a near future where people and robots live and work side by side.

From trajectory forecasting to crowd density forecasting

A typical approach to trajectory forecasting can be summarized as follows:

  1. Detect and track people in videos to obtain their trajectories from past frames up to the present frame.
  2. Feed those trajectories into a forecasting model (e.g., an LSTM) to predict each person's trajectory over the subsequent future frames.
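The two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular paper: step 1 (detection and tracking) is assumed to have already produced per-person tracks, and a simple constant-velocity extrapolation stands in for a learned forecaster such as an LSTM. All names here (`forecast_constant_velocity`, `forecast_all`, the toy `tracks`) are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def forecast_constant_velocity(track: List[Point], horizon: int) -> List[Point]:
    """Extrapolate one track assuming its last observed velocity stays constant.

    Hypothetical stand-in for a learned forecaster such as an LSTM.
    """
    if len(track) < 2:
        # With a single observation there is no velocity; assume the person stays put.
        return [track[-1]] * horizon
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def forecast_all(tracks: Dict[int, List[Point]], horizon: int) -> Dict[int, List[Point]]:
    # Step 2: forecast every tracked person independently.
    return {pid: forecast_constant_velocity(t, horizon) for pid, t in tracks.items()}


# Step 1 output (hypothetical): person id -> positions from past to present frames.
tracks = {
    0: [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0)],
    1: [(5.0, 5.0)],
}
future = forecast_all(tracks, horizon=3)
print(future[0])  # [(3.0, 1.5), (4.0, 2.0), (5.0, 2.5)]
```

Everything in step 2 depends on `tracks` being complete and correctly associated: if a person is occluded and their track is dropped or swapped with someone else's, that person's forecast is lost or wrong.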

Building on this approach, much recent work has focused on how people move while interacting with others nearby, or on how environmental factors such as roads and buildings affect their trajectories. Importantly, these methods require people detection and tracking to be accurate and stable. We argue that this requirement prevents existing trajectory forecasting methods from being applied to crowded environments, where people often heavily occlude one another in videos.

Crowd density forecasting. From the crowd density maps extracted from past frames up to the present frame, we predict which regions will be crowded in the subsequent frames.

We therefore address a different problem: forecasting which regions will be crowded in future video frames, rather than the location of every single person. Precise per-person locations are not necessarily required for many applications; for example, a mobile robot can navigate using a forecast of which spaces will be free. As shown in the figure above, we train a spatio-temporal convolutional neural network that receives a history of crowd density maps extracted from past frames up to the present frame and predicts the subsequent maps for future frames. This allows us to forecast future crowded regions without relying on accurate pedestrian detection and tracking.
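This post does not describe how the crowd density maps are extracted. As an assumption for illustration only, the sketch below uses a common crowd-counting convention: place a Gaussian at each head position and normalize it, so that the whole map sums to the number of people. In practice a density estimation network would typically produce such maps directly from frames; the function `density_map`, the `sigma` value, and the head coordinates are all hypothetical.

```python
import math
from typing import List, Tuple


def density_map(heads: List[Tuple[float, float]], height: int, width: int,
                sigma: float = 1.5) -> List[List[float]]:
    """Render head positions (x, y) in pixel coordinates into a density map.

    Each head contributes a Gaussian blob normalized to sum to 1 over the grid,
    so the whole map sums to the number of people.
    """
    dmap = [[0.0] * width for _ in range(height)]
    for hx, hy in heads:
        # Unnormalized Gaussian weights evaluated at every grid cell.
        blob = [[math.exp(-((c - hx) ** 2 + (r - hy) ** 2) / (2 * sigma ** 2))
                 for c in range(width)] for r in range(height)]
        total = sum(sum(row) for row in blob)
        # Renormalize so mass clipped at the image border is not lost.
        for r in range(height):
            for c in range(width):
                dmap[r][c] += blob[r][c] / total
    return dmap


heads = [(2.0, 3.0), (10.5, 4.0), (11.0, 5.0)]  # hypothetical head positions
dmap = density_map(heads, height=8, width=16)
print(round(sum(sum(row) for row in dmap), 6))  # 3.0
```

Because nearby heads add up, the densest cells appear where people cluster (here around the two heads at x ≈ 11), which is the kind of "crowded region" the forecaster reasons about instead of individual identities.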

Proposed network architecture. Please refer to the paper for more details.
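The input/output arrangement described above (a history of density maps in, subsequent maps out) can be sketched as a data-preparation step. This is a hypothetical illustration, not the paper's pipeline: the network itself is not shown, `pred` is a made-up placeholder for its output, and the window lengths, the `make_windows` and `free_space` helpers, and the free-space threshold are all assumptions. The thresholding step is one simple way to read the navigation example as "cells with low forecast density are free".

```python
from typing import List, Tuple

Map = List[List[float]]


def make_windows(maps: List[Map], n_past: int,
                 n_future: int) -> List[Tuple[List[Map], List[Map]]]:
    """Split a time-ordered sequence of density maps into (history, target) pairs.

    The history holds n_past consecutive maps up to the "present" frame;
    the target holds the n_future maps that immediately follow it.
    """
    pairs = []
    for t in range(n_past, len(maps) - n_future + 1):
        pairs.append((maps[t - n_past:t], maps[t:t + n_future]))
    return pairs


def free_space(pred: Map, threshold: float) -> List[List[bool]]:
    # A cell is treated as free if its forecast density is below the threshold.
    return [[v < threshold for v in row] for row in pred]


# Toy sequence of 10 maps (2x3 each) whose values equal the frame index.
seq = [[[float(t)] * 3 for _ in range(2)] for t in range(10)]
pairs = make_windows(seq, n_past=4, n_future=2)
print(len(pairs))  # 5

pred = [[0.01, 0.30, 0.02], [0.00, 0.12, 0.50]]  # placeholder for a model forecast
print(free_space(pred, threshold=0.05))
# [[True, False, True], [True, False, False]]
```

Each history would be stacked along a time axis and fed to the spatio-temporal network, which is trained to output the corresponding target maps.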

Some visual results are available on YouTube.

What’s next

At OMRON SINIC X, we will continue fundamental research on computer vision, machine learning, and robotics. If you are interested in working with us as an intern, send your application to internships@sinicx.com and get in touch!
