Social Distancing Analyzer using OpenCV and YOLO

Sherwyndsouza · Published in Analytics Vidhya · Jun 22, 2020 · 7 min read

Introduction

Social distancing is deliberately increasing the physical space between people to avoid spreading illness. Staying at least six feet away from other people lessens your chances of catching COVID-19. We can use OpenCV and YOLO to monitor/analyze whether people are maintaining social distancing or not.

Techniques and tools used

I used Python for this project. Some other tools I used were OpenCV and NumPy.

Theory

A little theory won’t hurt :)

OpenCV

So, if you don’t know what OpenCV is: OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library of programming functions mainly aimed at real-time computer vision. It was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. Being a BSD-licensed product, OpenCV makes it easy for businesses to utilize and modify the code.

The library has more than 2500 optimized algorithms, which include a comprehensive set of both classic and state-of-the-art computer vision and machine learning algorithms.

For more info, click here.

YOLO

YOLO (You Only Look Once) is a clever convolutional neural network (CNN) for doing object detection in real time. The algorithm applies a single neural network to the full image: it divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.

YOLO is popular because it achieves high accuracy while also being able to run in real-time. The algorithm “only looks once” at the image in the sense that it requires only one forward propagation pass through the neural network to make predictions. After non-max suppression (which makes sure the object detection algorithm only detects each object once), it then outputs recognized objects together with the bounding boxes.

For more info, click here.

We will use both of the above extensively in our project.

Overview

  • We will use YOLO for object detection.
  • Once the objects (people) are detected, we will draw a bounding box around each of them.
  • Using the centroids of the boxes, we then measure the distances between them.
  • As the distance measure, we use the Euclidean distance.
  • A box is colored RED if the person is unsafe and GREEN if safe.
  • We will also count the number of people who are unsafe because they are not maintaining social distancing.

Already interested? Let’s get started with the fun part…

Project

1. First, let’s take a look at the project structure, sketched below.
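The original post shows the structure as an image. Here is a plausible sketch of the layout, assuming the file names used throughout this article; the exact folder names are an assumption.

social-distancing-analyzer/
├── main.py          # detection and distance-measurement logic
├── constants.py     # paths and the SAFE_DISTANCE constant
├── yolo/            # yolov3.weights, yolov3.cfg, coco.names
├── video.mp4        # input video
└── output/          # output.avi is written here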

2. Now, for the video.mp4 file (input), click here. You can also download the YOLOv3 weights, configuration file and COCO names from here:

3. Now, after that is done, open up constants.py and copy the following lines of code.
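The code was embedded as a gist in the original post; below is a minimal sketch of what constants.py contains. The paths are placeholders, and the SAFE_DISTANCE value is an assumption you should tune for your own footage.

# constants.py -- paths and settings for the analyzer.
YOLOV3_WEIGHTS_PATH = "/path/to/yolov3.weights"
YOLOV3_CFG_PATH = "/path/to/yolov3.cfg"
COCO_NAMES_PATH = "/path/to/coco.names"
INPUT_VIDEO_PATH = "/path/to/video.mp4"
OUTPUT_PATH = "/path/to/output/output.avi"
SAFE_DISTANCE = 90  # in pixels; assumed value, tune for your footage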

Wait… What did I just copy?

Don’t worry! This file just contains the absolute paths of the YOLO weights, the cfg file, the COCO names, the input video, the output video, and the SAFE_DISTANCE to be maintained.

4. Now onto the main part. Open up the main.py file. First, let’s make the necessary imports. We also define 2 more constants, LABELS and COLORS, which we will be using later.
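A minimal sketch of this step; the exact form of COLORS is an assumption, and any red/green pair in BGR order works.

# main.py -- imports and the two extra constants.
import numpy as np
import cv2
from constants import (YOLOV3_WEIGHTS_PATH, YOLOV3_CFG_PATH,
                       COCO_NAMES_PATH, INPUT_VIDEO_PATH,
                       OUTPUT_PATH, SAFE_DISTANCE)

# LABELS: the 80 COCO class names, one per line in coco.names.
LABELS = open(COCO_NAMES_PATH).read().strip().split("\n")

# COLORS: BGR tuples for the two box states.
COLORS = {"safe": (0, 255, 0), "unsafe": (0, 0, 255)}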

5. Next, we load the YOLO model using the configuration and weights we downloaded before. The readNetFromDarknet function helps us do so.

layer_names consists of all the output layer names we need from YOLO.
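A sketch of this step; the index juggling accounts for OpenCV versions in which getUnconnectedOutLayers returns nested arrays.

# Load YOLOv3 from the Darknet cfg and weights files.
net = cv2.dnn.readNetFromDarknet(YOLOV3_CFG_PATH, YOLOV3_WEIGHTS_PATH)

# Keep only the names of YOLO's output layers (indices are 1-based).
all_names = net.getLayerNames()
out_idxs = np.array(net.getUnconnectedOutLayers()).flatten()
layer_names = [all_names[i - 1] for i in out_idxs]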

6. Now, we use OpenCV’s VideoCapture function to read the input video stream.

We also set the dimensions of the video frame (W, H) to (None, None) initially. After this, we use OpenCV’s CAP_PROP_FRAME_COUNT property to count the number of frames in the given input video stream. We wrap this in a try/except block in order to catch any exceptions.
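A sketch of this step; frame counting can fail for some codecs, hence the try/except.

vs = cv2.VideoCapture(INPUT_VIDEO_PATH)
writer = None            # created later, once we know the frame size
(W, H) = (None, None)

try:
    total = int(vs.get(cv2.CAP_PROP_FRAME_COUNT))
    print(f"{total} total frames in input video")
except Exception:
    print("Could not determine the number of frames")
    total = -1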

7. We then read each frame of the input video stream.

OpenCV’s read function helps us do that easily. What is a frame, you ask? It is simple! As the name suggests, a frame is basically one shot of the video; all these frames stitched together make up the video. A frame is an array consisting of 3 arrays, one per color channel, i.e. Blue, Green, Red (BGR). Each array consists of numbers between 0 and 255, which are called pixel values. Each image is made up of pixels, so a 4 × 4 image has 16 pixels.

We use a while loop to loop over all the frames of the input video. If a frame is not grabbed, we break out of the loop, as it may be the end of the video. We also update our H and W variables from (None, None) to (height_of_frame, width_of_frame). Next, we create a blob from the image frame. As OpenCV uses the ‘traditional’ representation of colors, the channels are in BGR (Blue, Green, Red) order, so we pass the argument swapRB=True to swap the R and B channels, giving us an RGB array. We also rescale the image by dividing the array elements by 255, so that each element lies between 0 and 1, which helps the model perform better.

Although “BLOB” traditionally stands for Binary Large OBject (a group of connected pixels in a binary image), in OpenCV’s dnn module a blob is simply the preprocessed image packed into a 4-D array of shape (batch, channels, height, width). We give this blob as input to the model and then perform a forward pass of YOLO.
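A sketch of the loop body described above; the 416 × 416 blob size is an assumption, being YOLOv3’s usual input resolution.

while True:
    (grabbed, frame) = vs.read()
    if not grabbed:                 # no frame left -- end of the video
        break

    if W is None or H is None:      # grab the frame dimensions once
        (H, W) = frame.shape[:2]

    # Scale pixels to [0, 1], resize to 416x416, and swap BGR -> RGB.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    layer_outputs = net.forward(layer_names)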

8. The output from YOLO consists of a set of values. These values help us define which class the object is of and it also gives us the detected object’s bounding box values.

We loop over every output in layer_outputs and every detection in each output. From the detection array, we get the scores of each class (the 80 classes from the COCO names) and, with them, the confidence of the detected class. We keep a confidence threshold of 0.5 and, as we are only interested in detecting people, we keep only detections whose classID is 0. From each detection we also get a bounding box: the first 4 elements of the detection array give us [X_center_of_box, Y_center_of_box, Width_of_box, Height_of_box], which we then scale to our image frame dimensions.
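Continuing the sketch, still inside the while loop:

    # Parse YOLO's raw output into boxes, confidences and centroids.
    boxes, confidences, centroids = [], [], []

    for output in layer_outputs:
        for detection in output:
            scores = detection[5:]            # per-class scores (80 classes)
            class_id = int(np.argmax(scores))
            confidence = scores[class_id]

            # Keep confident detections of class 0 ("person" in COCO).
            if class_id == 0 and confidence > 0.5:
                # detection[0:4] is relative to the blob; scale to the frame.
                (cx, cy, w, h) = detection[0:4] * np.array([W, H, W, H])
                x = int(cx - w / 2)           # top-left corner
                y = int(cy - h / 2)
                boxes.append([x, y, int(w), int(h)])
                confidences.append(float(confidence))
                centroids.append((int(cx), int(cy)))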

9. Then we start drawing the bounding boxes.

We use Non-Max Suppression in order to avoid weak and overlapping bounding boxes. Then we calculate the distance between the centroid of the current box and all the other detected bounding box centroids, using the Euclidean distance as our measure. Below is the formula for the Euclidean distance.
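For two centroids (x₁, y₁) and (x₂, y₂):

d = √((x₂ − x₁)² + (y₂ − y₁)²)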

We compare each distance with the SAFE_DISTANCE constant we defined earlier in the constants.py file. Next, we use the rectangle function of OpenCV to create a rectangle with the box dimensions we received from the model. We check if the box is safe or unsafe: if unsafe, the box will be colored red, else it will be colored green. We also display a text showing the number of people who are unsafe, using OpenCV’s putText function.
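A sketch of this step; the exact counting logic is an assumption, and the original gist may differ in detail.

    # Still inside the while loop: prune boxes, measure distances, draw.
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.3)

    unsafe = set()
    if len(idxs) > 0:
        idxs = np.array(idxs).flatten()

        # Mark both people in any pair closer than SAFE_DISTANCE.
        for i in idxs:
            for j in idxs:
                if i < j:
                    d = np.linalg.norm(np.array(centroids[i]) -
                                       np.array(centroids[j]))
                    if d < SAFE_DISTANCE:
                        unsafe.update([i, j])

        for i in idxs:
            (x, y, w, h) = boxes[i]
            color = COLORS["unsafe"] if i in unsafe else COLORS["safe"]
            cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)

        cv2.putText(frame, f"People unsafe: {len(unsafe)}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, COLORS["unsafe"], 2)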

10. Now we create a video by joining each frame back

The VideoWriter function of OpenCV helps us do that. It will store the output video at the location specified by OUTPUT_PATH, which we defined in the constants.py file earlier. The release function then releases the file pointers.
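A sketch of the final step; MJPG at 30 fps is an assumption, so pick a codec and frame rate that match your input.

    # Still inside the while loop: create the writer on the first frame.
    if writer is None:
        fourcc = cv2.VideoWriter_fourcc(*"MJPG")
        writer = cv2.VideoWriter(OUTPUT_PATH, fourcc, 30, (W, H), True)
    writer.write(frame)

# After the loop: release the file pointers.
writer.release()
vs.release()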

Output

Phew! Now that the coding part is over, it’s time to see the fruits of our effort.

Go ahead and run the main.py file as follows.

python main.py

Once the program has finished executing, check your output folder and open the output.avi file.

It should look something like this…

output.avi

Impressive, right?

Limitations and Future Scope

Although this project is cool, it has a few limitations:

  • This project does not take the camera perspective into account.
  • It does not leverage proper camera calibration (distances are not measured accurately).

We can work on these limitations in the future.

And that’s it. With just OpenCV, Python and a few lines of code, we can create something this cool!

Code

The entire code for this article can be found at this GitHub link. Please leave a ⭐️ on my GitHub repo if you liked the project.

End Notes

Please give this article a clap if you feel it was useful. Thank You!
