Creating a Direction-Sensitive People Counter with OpenCV and MobileNetSSD

Pooja Vinod
Secure and Private AI Writing Challenge
5 min read · Aug 19, 2019

Measuring the net flow of visitors into an area and distinguishing the direction they are walking in, with Computer Vision and Deep Learning

This article is part of a series of research-based blogs we are creating for The Crowd Density Project, a project initiative at #sg_wonder_vision at SPAIC 2019.

Firstly, let us understand the difference between Object Detection and Object Tracking:

Object Detection:

  • Determining where in a frame (every video is a succession of frames) an object is.
  • More computationally expensive than object tracking
  • Examples: Haar cascades, HOG + Linear SVM, and deep learning-based object detectors such as Faster R-CNNs, YOLO, and Single Shot Detectors (SSDs)
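
Since this project's detector is a MobileNet SSD, here is a minimal sketch of what running it through OpenCV's dnn module looks like. The model file names, the confidence threshold, and the detect_people helper are illustrative assumptions, not the project's exact code:

```python
import cv2
import numpy as np

# Paths to the pre-trained Caffe model files (assumed file names; point
# these at your own copy of the MobileNet SSD weights)
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

def detect_people(frame, conf_threshold=0.4):
    """Return (startX, startY, endX, endY) boxes for people in the frame."""
    (H, W) = frame.shape[:2]
    # MobileNet SSD expects a 300x300, mean-subtracted, scaled blob
    blob = cv2.dnn.blobFromImage(frame, 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()

    boxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        class_id = int(detections[0, 0, i, 1])
        # class 15 is "person" in the 20-class PASCAL VOC label list
        if confidence > conf_threshold and class_id == 15:
            box = detections[0, 0, i, 3:7] * np.array([W, H, W, H])
            boxes.append(tuple(box.astype("int")))
    return boxes
```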

Object Tracking:

  • Accepts the input (x, y)-coordinates of where an object is in an image and will:
  1. Assign a unique ID to that particular object
  2. Track the object as it moves around a video stream, predicting the new object location in the next frame based on various attributes of the frame (gradient, optical flow, etc.)
  • Examples: MedianFlow, MOSSE, GOTURN, kernelized correlation filters, and discriminative correlation filters.

The code I used for implementing this project applies a hybrid approach, combining object detection and tracking to achieve the required results. Such hybrid people counters normally do detection and tracking in phases. Since detection is computationally heavy, it is done only once every N frames.

The object detection phase that is run every N frames does the following:

(1) detect if new objects have entered our view, and

(2) see if we can find objects that were “lost” during the tracking phase.

For each detected object we create or update an object tracker (which is much faster) with the new bounding box coordinates.

This alternation of detection and tracking phases continues.
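
Here is a rough sketch of that alternation, assuming the hypothetical detect_people() helper from the earlier snippet and dlib's correlation tracker (dlib is one of this project's dependencies). This is an illustrative outline, not the repository's exact loop:

```python
import cv2
import dlib

SKIP_FRAMES = 30      # run the expensive detector once every N frames
trackers = []
total_frames = 0

vs = cv2.VideoCapture("input.mp4")   # hypothetical input video
while True:
    ok, frame = vs.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)   # dlib trackers expect RGB
    rects = []   # bounding boxes collected for this frame

    if total_frames % SKIP_FRAMES == 0:
        # Detection phase: run the detector and (re)start a tracker per box
        trackers = []
        for (startX, startY, endX, endY) in detect_people(frame):
            t = dlib.correlation_tracker()
            t.start_track(rgb, dlib.rectangle(int(startX), int(startY),
                                              int(endX), int(endY)))
            trackers.append(t)
            rects.append((startX, startY, endX, endY))
    else:
        # Tracking phase: cheaply update each existing tracker
        for t in trackers:
            t.update(rgb)
            pos = t.get_position()
            rects.append((int(pos.left()), int(pos.top()),
                          int(pos.right()), int(pos.bottom())))

    # rects now feeds the centroid tracker described below
    total_frames += 1
vs.release()
```

Because the in-between frames only update the correlation trackers rather than re-running the detector, they are far cheaper, which is what makes the hybrid approach fast.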

This implementation uses the centroid tracking algorithm to track the people across frames.

A brief overview of the Centroid Tracking Algorithm:

In digital image processing, a bounding box is simply the set of coordinates of the rectangular border that fully encloses an object within the frame.

At Step #1 we accept a set of bounding boxes and compute their corresponding centroids (i.e., the centers of the bounding boxes).
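
A minimal sketch of Step #1, assuming boxes arrive as (startX, startY, endX, endY) tuples:

```python
import numpy as np

def compute_centroids(rects):
    """Map (startX, startY, endX, endY) boxes to (cX, cY) centers."""
    centroids = np.zeros((len(rects), 2), dtype="int")
    for (i, (startX, startY, endX, endY)) in enumerate(rects):
        centroids[i] = ((startX + endX) // 2, (startY + endY) // 2)
    return centroids
```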

During Step #2 we compute the Euclidean distance between each new centroid (yellow) and each existing centroid (purple).

The centroid tracking algorithm makes the assumption that pairs of centroids with the minimum Euclidean distance between them must belong to the same object, and should therefore keep the same object ID.

Suppose, for example, that we have two existing centroids (purple) and three new centroids (yellow), implying that a new object has been detected (since there is one more new centroid than old).

We then compute the Euclidean distances between every purple centroid and every yellow centroid.

Once we have the Euclidean distances, we attempt to associate object IDs in Step #3: the centroid tracker associates the pairs of centroids that minimize their respective Euclidean distances.
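
A sketch of Steps #2 and #3 together, assuming an objects dictionary mapping object IDs to centroids and that SciPy is available; the greedy smallest-distance-first matching below is one standard way to do this association:

```python
import numpy as np
from scipy.spatial import distance as dist

def associate(objects, new_centroids):
    """Greedily match existing centroids to new ones by minimum distance.

    `objects` maps objectID -> centroid; both sets are assumed non-empty
    (a full tracker also handles the empty cases via register/deregister).
    """
    object_ids = list(objects.keys())
    old_centroids = np.array(list(objects.values()))

    # D[i, j] = distance between existing centroid i and new centroid j
    D = dist.cdist(old_centroids, new_centroids)

    # Visit rows in order of their smallest distance, pairing each with its
    # nearest column, and skip any row/column that has already been claimed
    rows = D.min(axis=1).argsort()
    cols = D.argmin(axis=1)[rows]

    used_rows, used_cols = set(), set()
    for (row, col) in zip(rows, cols):
        if row in used_rows or col in used_cols:
            continue
        objects[object_ids[row]] = new_centroids[col]
        used_rows.add(row)
        used_cols.add(col)

    # Any new centroid left unmatched becomes a candidate for Step #4
    return set(range(len(new_centroids))) - used_cols
```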

But what about the leftover centroid in the bottom-left?

It didn't get associated with anything, so what do we do?

To answer that question we need to perform Step #4, registering new objects:

In our object tracking example, we have a new object that wasn’t matched with an existing object, so it is registered as object ID #3.

Registering simply means that we are adding the new object to our list of tracked objects by:

  1. Assigning it a new object ID
  2. Storing the centroid computed from its bounding box coordinates

In the event that an object has been lost or has left the field of view, we can simply deregister the object (Step #5).
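
A sketch of what registering (Step #4) and deregistering (Step #5) can look like, assuming a simple CentroidTracker class that also counts how many consecutive frames an object has gone unseen (the max_disappeared threshold is an illustrative default):

```python
class CentroidTracker:
    def __init__(self, max_disappeared=50):
        self.next_object_id = 0
        self.objects = {}        # objectID -> current centroid
        self.disappeared = {}    # objectID -> consecutive frames unseen
        # frames an object may go unseen before we give up on it
        self.max_disappeared = max_disappeared

    def register(self, centroid):
        # Step #4: assign a fresh ID and store the centroid
        self.objects[self.next_object_id] = centroid
        self.disappeared[self.next_object_id] = 0
        self.next_object_id += 1

    def deregister(self, object_id):
        # Step #5: drop an object that has been lost or left the view
        del self.objects[object_id]
        del self.disappeared[object_id]
```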

In order to track and count an object in a video stream, we need an easy way to store information regarding the object itself, including:

  • It’s object ID
  • It’s previous centroids (so we can easily to compute the direction the object is moving)
  • Whether or not the object has already been counted

To accomplish all of these goals we can define a TrackableObject class, whose constructor accepts an objectID and a centroid and stores them. The centroids variable is a list because it will contain the object's centroid location history.

The constructor also initializes counted as False, indicating that the object has not been counted yet.
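
Based on that description, the class can be as small as this:

```python
class TrackableObject:
    def __init__(self, objectID, centroid):
        # the object's unique ID
        self.objectID = objectID
        # the history of centroids, starting with the current one
        self.centroids = [centroid]
        # the object has not been counted yet
        self.counted = False
```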

These are some of the key concepts in this implementation.

Output video: https://drive.google.com/open?id=1xB9FA0hkU-ns3o5Zs4uCwAwZOY8uCWvA

This is the output I obtained after implementing this project. The yellow line is drawn by our code across the middle of the frame; it is our reference line. Based on the movement of each centroid, the following steps are performed:

  1. Checking if the direction is negative (indicating the object is moving Up) AND the centroid is Above the centerline. In this case we increment totalUp.
  2. Or checking if the direction is positive (indicating the object is moving Down) AND the centroid is Below the centerline. If this is true, we increment totalDown.

Thus we get the count of people walking up and down with respect to the reference line.
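
A sketch of that counting logic, assuming to is the TrackableObject for the current ID, centroid is its newest (cX, cY) position, and H is the frame height (so H // 2 is the yellow reference line):

```python
import numpy as np

def update_counts(to, centroid, H, totalUp, totalDown):
    """Update the up/down counts for one tracked object."""
    # direction: current y minus the mean of all previous y positions;
    # negative means the object is moving up the frame (y shrinks upward)
    y_history = [c[1] for c in to.centroids]
    direction = centroid[1] - np.mean(y_history)
    to.centroids.append(centroid)

    if not to.counted:
        if direction < 0 and centroid[1] < H // 2:
            totalUp += 1       # moving up AND above the reference line
            to.counted = True
        elif direction > 0 and centroid[1] > H // 2:
            totalDown += 1     # moving down AND below the reference line
            to.counted = True
    return totalUp, totalDown
```

Averaging over the whole centroid history, instead of comparing only the last two positions, makes the direction estimate robust to frame-to-frame jitter.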

We obtain a throughput rate of 34 FPS (which is quite good) through our two-phase process of:

  1. Detecting people once every 30 frames
  2. And then applying a faster, more efficient object tracking algorithm to all the frames in between.

This kind of people counter can be used to measure the net flow of visitors to an event, running on overhead camera footage captured above doors, entrances, and exits. It can determine which sub-events were popular at conferences and festivals, which helps plan ahead for future editions of the event.

Also, if this kind of footage were to come in from multiple areas of a big venue, security forces could keep track of which areas are crowded, helping them judge the risk of stampedes and take precautionary measures to prevent them. A federated or decentralized approach could thus be applied to this implementation.

You can see the code at our GitHub repository for The Crowd Density Project: https://github.com/poojavinod100/People-Counting-Crowd-Density-Detection

This implementation needs OpenCV, NumPy, dlib and imutils. Installing dlib can be a real challenge, and I spent a lot of time struggling with this step (several varieties of possible dependency issues). I would advise that you follow this tutorial step-by-step to install dlib easily: https://www.youtube.com/watch?v=HqjcqpCNiZg

Reference:

Adrian Rosebrock, "OpenCV People Counter", pyimagesearch.com
