A crowd is a gathering of a large number of people in one place. It is not feasible to count or monitor everyone at places such as universities, shopping malls, railway stations, or airports simply by looking at them. To solve this, we propose a novel Fully Convolutional Network (FCN) model for crowd counting. Our model is learned end-to-end on image patches and performs pixel-wise regression of the crowd density distribution. The output density map has the same size as the input, which greatly improves precision compared with previous work.
An unbiased density ground truth generation method is used to handle the problem of scene perspective distortion. Once the model is trained, it can output a density map for a given image patch. The position of each pedestrian is labeled by a dot at the center of the head, and the density ground truth D is obtained by summing Gaussian kernels placed at the center of every head. The crowd count is calculated by integrating the estimated density map.
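
The ground truth construction and the counting step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `density_map` and `crowd_count`, the fixed kernel width `sigma`, and the sample head coordinates are all assumptions made for illustration. In particular, a single fixed `sigma` ignores perspective, so this sketch does not reproduce the unbiased, perspective-aware generation method mentioned in the text. Head positions are assumed to lie inside the image.

```python
import numpy as np


def density_map(head_points, height, width, sigma=4.0):
    """Build a density map as a sum of one Gaussian kernel per head.

    head_points: iterable of (x, y) pixel coordinates of head centers
    (hypothetical annotation format). sigma is a fixed, assumed width.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width), dtype=np.float64)
    for x, y in head_points:
        kernel = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        # Normalize over the image so each head contributes exactly 1,
        # even when the kernel is clipped by the image border.
        density += kernel / kernel.sum()
    return density


def crowd_count(density):
    """Integrate (sum) the density map to obtain the crowd count."""
    return float(density.sum())


# Hypothetical annotations: four heads in a 64 x 80 patch.
heads = [(12, 20), (40, 33), (41, 35), (60, 10)]
D = density_map(heads, height=64, width=80)
print("map shape:", D.shape)
print("estimated count:", crowd_count(D))
```

Because every kernel is normalized to sum to one, integrating the map recovers the number of annotated heads, while the map itself still shows where the crowd is concentrated. A trained model would output an estimate of `D`, and the same summation would give the predicted count.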

Density-estimation-based methods have two prominent advantages. First, pixel-wise regression lets them use more spatial information. Second, they provide the crowd distribution within a given image, not only the total count. The experimental results also demonstrate that our crowd counting method achieved the best accuracy. In addition, our method can be extended to other object counting tasks or other density-estimation-based tasks.

Author: Anaswara Asokan, Member, AI Club Vast