Person / Pedestrian Detection in Real-Time and Recorded Videos in Python — Windows and macOS

Published in Analytics Vidhya · 4 min read · Jan 2, 2020

Video is sourced from first 10 seconds of Bollywood song Birju

Person detection is widely used by companies and organizations these days. The technology uses computer vision to detect people, usually pedestrians crossing the street, or to flag any movement around a premises.

In this project, we will create a person detection system in Python in a few easy steps.

We begin by installing the OpenCV (Open Source Computer Vision) library, which is built to help developers carry out computer vision tasks. Install it with pip:

pip install opencv-python

Let us now build the system in Python

We have the required library installed. The way the technology works is that a model is trained on image features of the object to be detected (a person in this case), and the trained model is then used to find that object in our target video.

Think of it as the train and test datasets of any machine learning model.

In this case:

Train dataset: .xml files which capture the image details of the target object

Test dataset: Live stream video/ Recorded video

The link to the full code can be found at the end of this article. I will explain the code in steps and blocks to help you understand how it works:

Step 1: Open Spyder

Step 2: Import the library

import cv2

Step 3: Reference the input to your webcam or to the video file saved on your hard drive (mp4 format)

Webcam: cap = cv2.VideoCapture(0)
Video: cap = cv2.VideoCapture('<enter file path>.mp4')

Step 4: We will use a pre-trained .xml file which has data on people (full body) built using individual images. You can download the file here

pedestrian_cascade = cv2.CascadeClassifier('<enter file path>/haarcascade_fullbody.xml')

Step 5: The video is divided into frames and the code reads one frame at a time. In each frame, we detect the location of each person using the classifier we loaded above. For each person detected, we take the coordinates, draw a rectangle around the person, and display the frame to the viewer.

The full loop is shown below, followed by a block-by-block explanation.

while True:
    # Read the next frame from the video
    ret, frames = cap.read()
    # Stop when no frame is returned (end of file or camera error)
    if not ret:
        break
    # Convert the frame to grayscale for the detector
    gray = cv2.cvtColor(frames, cv2.COLOR_BGR2GRAY)
    # Detect pedestrians of different sizes (scale factor 1.1, min neighbors 1)
    pedestrians = pedestrian_cascade.detectMultiScale(gray, 1.1, 1)
    # Draw a labelled rectangle around each pedestrian
    for (x, y, w, h) in pedestrians:
        cv2.rectangle(frames, (x, y), (x + w, y + h), (0, 255, 0), 2)
        font = cv2.FONT_HERSHEY_DUPLEX
        cv2.putText(frames, 'Person', (x + 6, y - 6), font, 0.5, (0, 255, 0), 1)
    # Display the frame in a window, whether or not anyone was detected
    cv2.imshow('Pedestrian detection', frames)
    # Wait about 33 ms; stop if the Enter key (code 13) is pressed
    if cv2.waitKey(33) == 13:
        break

Block 1:

# reads frames from a video
ret, frames = cap.read()
# convert each frame to grayscale
gray = cv2.cvtColor(frames, cv2.COLOR_BGR2GRAY)

The video is read one frame at a time. Each frame is then converted to grayscale, which helps in detecting humans quickly: Haar cascade classifiers work on pixel intensity rather than color, and a single-channel image is cheaper to process than a three-channel one.

Block 2:

# Detects pedestrians of different sizes in the input image
pedestrians = pedestrian_cascade.detectMultiScale(gray, 1.1, 1)
# To draw a rectangle around each pedestrian
for (x, y, w, h) in pedestrians:
    cv2.rectangle(frames, (x, y), (x + w, y + h), (0, 255, 0), 2)
    font = cv2.FONT_HERSHEY_DUPLEX
    cv2.putText(frames, 'Person', (x + 6, y - 6), font, 0.5, (0, 255, 0), 1)

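The first section of the code detects the person(s) in the frame and stores a bounding box for each one: the x and y coordinates of its top-left corner, plus its width and height. The arguments 1.1 and 1 are the scale factor (how much the image is shrunk at each pass, so that people of different sizes are found) and the minimum number of neighbors (how many overlapping candidate detections are needed to keep a box). The second section draws a rectangle around the area where each person is detected and displays the text ‘Person’ above the rectangle. You can change the font of the text, and (0, 255, 0) is the color of the rectangle and the text in B-G-R sequence, which is green.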

Block 3:

# Display frames in a window
cv2.imshow('Pedestrian detection', frames)
# Wait for Enter key to stop
if cv2.waitKey(33) == 13:
    break

The resulting image (frame) is shown to the viewer, and the loop continues to run until the user hits the Enter key on the keyboard. Here, cv2.waitKey(33) waits up to 33 milliseconds for a key press, and 13 is the key code for Enter.
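
The 33 in the wait call corresponds to roughly 30 frames per second (1000 ms / 30). If your video has a different frame rate, the delay can be derived from it. This is a minimal sketch with a hypothetical helper, frame_delay_ms; in practice the frame rate could be read with cap.get(cv2.CAP_PROP_FPS), which may report 0 for some webcams, hence the fallback:

```python
ENTER_KEY = 13  # key code reported by cv2.waitKey for the Enter key

def frame_delay_ms(fps, default_fps=30.0):
    """Return the per-frame wait in milliseconds for a given frame rate."""
    if not fps or fps <= 0:      # unknown frame rate: fall back to the default
        fps = default_fps
    # Never return 0, because cv2.waitKey(0) waits indefinitely for a key press
    return max(1, round(1000 / fps))

print(frame_delay_ms(30))  # 33
print(frame_delay_ms(25))  # 40
print(frame_delay_ms(0))   # 33
```

The loop condition would then read cv2.waitKey(frame_delay_ms(fps)) == ENTER_KEY.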

Step 6: Once the loop ends, release the video capture and close the display windows.

cap.release()
cv2.destroyAllWindows()

Run the program in the command line

The next step is to save the file in .py format and run it in command line/Anaconda prompt.

I ran it in Anaconda prompt by first navigating to the folder using the command cd.

cd <folder path>

Run the Python file

python filename.py

You will see a pop-up window with the video playing. The playback might be slow because the detector runs on every frame, and detection can take longer than the gap between frames. However, if you write the processed frames to a video file on your hard drive at the fps (frames per second) of the input video, the saved video is not slow and plays back at the normal speed.

Displaying confidence of detection around the boxes often helps to reduce misclassifications
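
The detection call used in this article returns only boxes, not confidence values. If you do have a score for each box, a sketch like the following could build the labels and drop weak detections. The helper name, the scores, and the 0.5 cut-off are all made up for illustration. OpenCV's detectMultiScale3 with outputRejectLevels=True is one possible source of per-box weights, though those weights are not on a 0 to 1 scale, so the cut-off would need tuning:

```python
def label_confident_detections(boxes, scores, min_score=0.5):
    """Pair each box with a 'Person 0.91' style label, dropping low-score boxes."""
    kept = []
    for box, score in zip(boxes, scores):
        if score >= min_score:
            kept.append((tuple(box), f"Person {score:.2f}"))
    return kept

boxes = [(10, 20, 50, 100), (200, 40, 45, 90)]
scores = [0.91, 0.32]  # made-up values for illustration

print(label_confident_detections(boxes, scores))
# [((10, 20, 50, 100), 'Person 0.91')]
```

Each label can then be passed to cv2.putText in place of the fixed text 'Person'.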

You can download more royalty-free videos here.

Ta-Da! You can now easily detect burglars and be safe. Integrate this with an alarm system to stay secure.

Play around with bipeds and robots which mimic humans in their walking style and let me know how it goes in the comments section below.

Facing issues? Post your query.

Use Cases

Instructors can use this feature to take attendance
Estimate the number of people in a crowd — avoid riots etc.

Codes

chandravenky/Computer-Vision---Object-Detection-in-Python

github.com

Where to find me 🤓

Connect with me on LinkedIn/ GitHub / My website
Feeling generous? Buy me a coffee here ☕️