Track Social Distancing Using Computer Vision

Michael Gorkow
3 min read · Apr 29, 2020


Due to the current coronavirus crisis, many governments have introduced social distancing restrictions. While most people follow these rules, some still ignore them for various reasons.

In this article I would like to show how I track compliance with these social distancing rules using computer vision on camera images. The whole demo is available in my GitHub repository, where you'll also find extensive comments in the code. Please note that the demo was built using SAS Deep Learning for object detection and SAS Event Stream Processing for analyzing the video with the deep learning model, together with various open-source Python libraries, e.g. OpenCV and SciPy.

The first step is, of course, to detect people in images. I used a Tiny YOLO V2 model for this, but any other object detection model would work; I chose Tiny YOLO V2 mainly because it is easy to use in SAS. If you're interested in the training process, have a look at this Jupyter Notebook.
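My demo runs this step with SAS Deep Learning, but to illustrate roughly what the detection step does, here is a minimal sketch using OpenCV's DNN module with a Darknet Tiny YOLO v2 config/weights pair. The file names are placeholders, and the person class index depends on the data set the model was trained on:

```python
import cv2
import numpy as np

# Placeholder file names -- any Tiny YOLO v2 Darknet config/weights pair works.
net = cv2.dnn.readNetFromDarknet("tiny-yolov2.cfg", "tiny-yolov2.weights")
PERSON_CLASS = 14  # 'person' index for Pascal VOC; COCO-trained models use 0

def detect_people(frame, conf_threshold=0.5):
    """Return pixel bounding boxes [x, y, w, h] for detected persons."""
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    # YOLO v2's region layer yields rows of
    # [cx, cy, bw, bh, objectness, class scores...], normalized to 0..1.
    boxes = []
    for row in net.forward():
        scores = row[5:]
        if np.argmax(scores) == PERSON_CLASS and scores[PERSON_CLASS] > conf_threshold:
            cx, cy, bw, bh = row[0] * w, row[1] * h, row[2] * w, row[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
    return boxes
```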

In the second step we calculate the distances between all detected people. This sounds easy but becomes more complicated once you think it through. A camera usually does not provide a top view; it films at an angle, which introduces perspective. This perspective matters a lot when you want to derive real-world distances from images. Have a look at the following three example pictures showing the transformation process.

Transformation process: from camera image to 2D representation, using homography to remove the camera perspective and allow real-world distance calculation.

You can clearly see that the transformation worked: the tiles form a proper rectangle in the second image, as expected. Distances measured in the original camera image, however, are misleading because of the perspective. In the third picture I re-measured the distances from the original image using my transformation matrix: a distance of 356 pixels in the foreground of the camera image shrinks down to only 149 pixels, while a distance of 62 pixels in the background corresponds to 83 pixels in reality.

This article is not going to cover the details of this process, but if you're curious about what happens in the background, I recommend having a look at the basic concepts of homography.
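For readers who want to experiment, the idea can be sketched in a few lines of OpenCV: pick four reference points in the camera image whose positions on the flat ground plane you know, compute the homography from them, and map detected positions through it. All coordinate values below are invented for illustration:

```python
import cv2
import numpy as np

# Four manually picked points in the camera image (pixels) ...
src = np.float32([[230, 460], [610, 450], [720, 700], [120, 710]])
# ... and where those points lie on the ground plane (e.g. the tile
# corners from the example pictures). Values here are made up.
dst = np.float32([[0, 0], [400, 0], [400, 400], [0, 400]])

H = cv2.getPerspectiveTransform(src, dst)  # 3x3 homography matrix

def to_ground_plane(points):
    """Map (N, 2) image points into the top-view coordinate system."""
    pts = np.float32(points).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)

# Euclidean distances between transformed points now approximate real
# ground distances, up to the scale implied by the reference points.
```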

Now that we have detected people and transformed their coordinates so that distances to others are meaningful, we want to find out whether there are crowds in the image. In my demo I use KD-trees from SciPy to look up the nearest neighbours of a detected person within a maximum radius. The following functions were used (a short sketch follows the list):

cKDTree to efficiently calculate nearest neighbours.

query_ball_tree to query the KD-tree with a person and a given maximum radius.
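Assuming the transformed ground-plane coordinates are already available, the crowd lookup can be sketched like this; the radius and minimum group size are made-up parameters, not the values from my demo:

```python
from scipy.spatial import cKDTree

def find_crowds(points, max_dist=150.0, min_size=3):
    """Return groups of people standing within max_dist of each other.

    points: (N, 2) array of transformed ground-plane coordinates;
    max_dist is in the units of that coordinate system."""
    tree = cKDTree(points)
    # For every person, the indices of all persons within max_dist
    # (each list includes the person itself).
    neighbours = tree.query_ball_tree(tree, r=max_dist)
    return [group for group in neighbours if len(group) >= min_size]
```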

Now I have all the parts together to assemble a streaming process that can connect to any kind of video data source. I use SAS Event Stream Processing for this, which lets you define the process either graphically or programmatically via a Python interface.

The process can be visualized and looks as follows:

SAS Event Stream Processing Process

The first two boxes load the trained Tiny YOLO V2 model and provide it to the scoring window. The scoring window receives images that have been resized to the appropriate dimensions (416x416 pixels in this case) and outputs the detected persons with their corresponding x, y, width and height values.
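The article doesn't state which point of each bounding box is fed into the homography; a natural choice, and the assumption in this small sketch, is the bottom center of the box, i.e. roughly where the person touches the ground:

```python
def ground_point(x, y, w, h):
    """Bottom center of a bounding box, assuming (x, y) is the top-left
    corner in pixels -- for center-based coordinates use (x, y + h / 2)."""
    return (x + w / 2.0, y + h)
```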

The last window uses Python inside SAS Event Stream Processing to transform the coordinates with the homography matrix. Additionally, it uses SciPy to perform the crowd detection.

The final result looks as follows:

Final result: detected people and crowds highlighted in the video stream.

The whole project can be found in my GitHub repository.
