Counting People on an Escalator Using YOLOv8 and OpenCV from Scratch

Sep 16, 2023

Object detection is one of the core tasks in computer vision, and the field keeps advancing with the release of YOLOv8, a model that sets a new state of the art for object detection and instance segmentation.

One of the first exercises for getting familiar with the YOLO model is using it to count objects such as cars on a street or people. Here, in particular, we will walk step by step, from the beginning, through counting people going up and down an escalator. If you are interested, I coded everything in this repo.

▴Step1

🌎 Set up the environment

To begin, we need a Python environment with OpenCV, a popular computer vision library, and the Ultralytics YOLO package installed. All of the code was written in PyCharm.

Install the necessary dependencies (for example, the numpy, opencv-python, and ultralytics packages from pip), then import them as shown below.

import numpy as np
from ultralytics import YOLO
import cv2
import math
from sort import Sort
from helper import create_video_writer

Note: the ‘sort.py’ file provides the tracker, and the ‘helper.py’ file provides the function that saves the output video.

▴Step2

🗹 Download a sample video. I suggest downloading one from Shutterstock or Pexels. You can also find two sample videos in my repository.

▴Step3

We have to specify which part of the video object detection should run on, so that the rest of the frame is not processed. To do this, we create a mask. You can create a mask for your own video using Photoshop or Canva; it must have the same dimensions as the video frames. Here you can see the sample mask for my video: the white part marks the area of the escalator where people should be counted.
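
To see why a black-and-white mask does the job, recall that a bitwise AND keeps a pixel value where the mask is white (255) and zeroes it where the mask is black (0). The snippet below is a minimal pure-Python sketch of that idea on a single row of grayscale values. The `apply_mask_row` name and the sample values are made up for illustration; the real pipeline applies the same operation to whole frames with OpenCV.

```python
def apply_mask_row(pixels, mask):
    """AND each pixel value with the matching mask value (both 0..255)."""
    if len(pixels) != len(mask):
        raise ValueError("frame and mask must have the same size")
    return [p & m for p, m in zip(pixels, mask)]


row = [200, 17, 64, 255]       # hypothetical grayscale pixel values
mask_row = [255, 255, 0, 0]    # white = keep, black = ignore

masked = apply_mask_row(row, mask_row)
print(masked)  # [200, 17, 0, 0]
```

Pixels under the white part of the mask pass through unchanged, so the detector only ever sees the escalator region.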

▴Step4

The first line reads the input video, and the second creates the writer that saves the output. The third line loads the YOLO model from the given weights file.

cap = cv2.VideoCapture("videos/sample.mp4")  # For Video
writer = create_video_writer(cap, "Output.mp4")
model = YOLO("yolov8n.pt")

▴Step5

We define a classNames variable containing the list of object classes that the YOLO model is trained to recognize, although we only need the “person” class here.

classNames = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck", "boat",
              "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
              "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella",
              "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat",
              "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup",
              "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli",
              "carrot", "hot dog", "pizza", "donut", "cake", "chair", "sofa", "pottedplant", "bed",
              "diningtable", "toilet", "tvmonitor", "laptop", "mouse", "remote", "keyboard", "cell phone",
              "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors",
              "teddy bear", "hair drier", "toothbrush"
              ]

We read the mask image and initialize the tracker. With max_age=20, a track is kept alive for up to 20 frames without a matching detection.

mask = cv2.imread("Images/mask.png")
# Tracking
tracker = Sort(max_age=20)
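
SORT matches each new detection to an existing track largely by how much their bounding boxes overlap, measured as intersection over union (IoU). The function below is a small, illustrative sketch of that measure in plain Python, not the code inside ‘sort.py’. The `iou` name and the sample boxes are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


previous = (100, 100, 200, 300)   # hypothetical box from the last frame
current = (110, 100, 210, 300)    # same person, shifted 10 px to the right
print(round(iou(previous, current), 3))  # 0.818
```

A person who moves only a little between frames keeps a high IoU with their previous box, which is what lets the tracker hand back the same ID.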

▴Step6

Here is the important part. My idea for counting people is to use two lines: one to count people going up the escalator and another to count those going down. Each person who crosses a line adds one to the counter associated with that line.
We define the coordinates for drawing each line as follows.

# Each line is given as [x1, y1, x2, y2]
limitsDown = [150, 220, 250, 220]
limitsUp = [70, 170, 160, 170]
# IDs of the tracked people who have crossed each line
totalCountUp = []
totalCountDown = []
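
The crossing test used later in the loop treats each line as a thin horizontal band: a person counts when the center of their box lies between the line's two x coordinates and within 15 pixels of its y coordinate. A minimal sketch of that check as a standalone function follows. The `in_count_zone` name and the `tolerance` parameter are introduced here for illustration only.

```python
def in_count_zone(cx, cy, limits, tolerance=15):
    """Return True if point (cx, cy) lies in the band around a horizontal line.

    limits is [x1, y1, x2, y2]; the line is assumed horizontal (y1 == y2).
    """
    x1, y1, x2, _ = limits
    return x1 < cx < x2 and y1 - tolerance < cy < y1 + tolerance


limitsDown = [150, 220, 250, 220]
limitsUp = [70, 170, 160, 170]

print(in_count_zone(200, 225, limitsDown))  # True: inside the "down" band
print(in_count_zone(200, 225, limitsUp))    # False: outside the "up" band
print(in_count_zone(100, 190, limitsUp))    # False: 20 px below the line
```

The band exists because a center point rarely lands exactly on the line in any single frame; a wider tolerance is more forgiving for fast movement but can overlap a neighboring line.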

▴Step7

The while loop starts and reads each frame from the video using cap.read(). It then applies the mask to the frame and passes the masked frame to the YOLO model for object detection.

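while True:
    success, img = cap.read()
    if not success:  # end of the video or a read error
        break
    # Keep only the masked region; the mask must match the frame size
    imgRegion = cv2.bitwise_and(img, mask)
    results = model(imgRegion, stream=True)

    # Person detections for this frame, one row per box: [x1, y1, x2, y2, conf]
    detections = np.empty((0, 5))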

▴Step8

For each result, we check whether the detected class is “person” with a confidence above 0.3. If so, we draw its bounding box using the rectangle function and add it to the detections that are passed to the tracker.

    for r in results:
        boxes = r.boxes
        for box in boxes:
            # Bounding Box
            x1, y1, x2, y2 = box.xyxy[0]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

            # Confidence, rounded up to two decimals
            conf = math.ceil(float(box.conf[0]) * 100) / 100

            # Class Name
            cls = int(box.cls[0])
            currentClass = classNames[cls]

            if currentClass == "person" and conf > 0.3:
                cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 255), 6)
                currentArray = np.array([x1, y1, x2, y2, conf])
                detections = np.vstack((detections, currentArray))

    # Update the tracker with this frame's person detections
    resultsTracker = tracker.update(detections)
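
The filtering rule in this step can be tried out without a video. The sketch below applies the same rule (class “person”, confidence above 0.3, confidence rounded up to two decimals) to a few made-up detections. The `filter_people` name, the tuple layout, and the sample values are assumptions for illustration, not part of the original code.

```python
import math

PERSON_CLASS_ID = 0  # "person" is the first entry of classNames


def filter_people(raw_detections, min_conf=0.3):
    """Keep person detections above min_conf as [x1, y1, x2, y2, conf] rows."""
    kept = []
    for x1, y1, x2, y2, conf, cls in raw_detections:
        conf = math.ceil(conf * 100) / 100  # round up to two decimals
        if cls == PERSON_CLASS_ID and conf > min_conf:
            kept.append([int(x1), int(y1), int(x2), int(y2), conf])
    return kept


sample = [
    (10.4, 20.9, 50.0, 120.2, 0.874, 0),  # confident person: kept
    (60.0, 30.0, 90.0, 110.0, 0.21, 0),   # low-confidence person: dropped
    (5.0, 5.0, 40.0, 40.0, 0.95, 24),     # class 24 ("backpack"): dropped
]
print(filter_people(sample))  # [[10, 20, 50, 120, 0.88]]
```

Raising `min_conf` reduces false detections at the cost of missing people who are partly hidden, so the 0.3 threshold is worth tuning for each video.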

These two lines of code draw the two counting lines that I explained in Step 6.

    cv2.line(img, (limitsDown[0], limitsDown[1]), (limitsDown[2], limitsDown[3]), (0, 0, 255), 5)
    cv2.line(img, (limitsUp[0], limitsUp[1]), (limitsUp[2], limitsUp[3]), (0, 0, 255), 5)

▴Step9

Here, we take the center of each tracked bounding box, drawn as a small circle, and use it to decide whether a person has crossed one of the counting lines on the escalator and should be added to the corresponding counter.

    for result in resultsTracker:
        x1, y1, x2, y2, track_id = result
        x1, y1, x2, y2, track_id = int(x1), int(y1), int(x2), int(y2), int(track_id)
        w, h = x2 - x1, y2 - y1

        cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 255), 3)

        # Label the box; only "person" detections are passed to the tracker
        cv2.putText(img, "person", (x1, y1), cv2.FONT_HERSHEY_SIMPLEX,
                    1, (255, 255, 0), 1, cv2.LINE_AA)

        # Center of the box, used to test whether the person is on a counting line
        cx, cy = x1 + w // 2, y1 + h // 2
        cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)
        if limitsDown[0] < cx < limitsDown[2] and limitsDown[1] - 15 < cy < limitsDown[1] + 15:
            if track_id not in totalCountDown:
                totalCountDown.append(track_id)
                cv2.line(img, (limitsDown[0], limitsDown[1]), (limitsDown[2], limitsDown[3]), (0, 255, 0), 5)
        if limitsUp[0] < cx < limitsUp[2] and limitsUp[1] - 15 < cy < limitsUp[1] + 15:
            if track_id not in totalCountUp:
                totalCountUp.append(track_id)
                cv2.line(img, (limitsUp[0], limitsUp[1]), (limitsUp[2], limitsUp[3]), (0, 255, 0), 5)

    cv2.putText(img, f'Up: {len(totalCountUp)}', (480, 40), cv2.FONT_HERSHEY_PLAIN, 2, (139, 195, 75), 3)
    cv2.putText(img, f'Down: {len(totalCountDown)}', (480, 70), cv2.FONT_HERSHEY_PLAIN, 2, (50, 50, 230), 3)
    cv2.imshow("Video", img)
    writer.write(img)
    if cv2.waitKey(1) == ord("q"):
        break

# Release the video resources and close the display window
cap.release()
writer.release()
cv2.destroyAllWindows()
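
Because the tracker gives each person a stable ID, the same person can sit inside a counting band for several frames and must still be counted only once. The loop above handles this by checking the list before appending; a set expresses the same idea more directly. The sketch below replays a few made-up tracker outputs to show the counting logic in isolation. `box_center`, `count_crossings`, and the sample tracks are hypothetical, and this is an alternative illustration rather than the code from the repository.

```python
def box_center(x1, y1, x2, y2):
    """Center of a bounding box, using integer division as in the loop."""
    return x1 + (x2 - x1) // 2, y1 + (y2 - y1) // 2


def count_crossings(frames, limits, tolerance=15):
    """Return the set of track IDs whose center entered the band around a line.

    frames is a list of per-frame lists of (x1, y1, x2, y2, track_id).
    """
    lx1, ly, lx2, _ = limits
    seen = set()
    for tracks in frames:
        for x1, y1, x2, y2, track_id in tracks:
            cx, cy = box_center(x1, y1, x2, y2)
            if lx1 < cx < lx2 and ly - tolerance < cy < ly + tolerance:
                seen.add(track_id)  # adding the same ID twice has no effect
    return seen


limitsDown = [150, 220, 250, 220]
frames = [
    [(180, 150, 220, 250, 1)],                       # ID 1, center (200, 200)
    [(180, 170, 220, 270, 1), (10, 10, 50, 90, 2)],  # ID 1, center (200, 220)
    [(180, 175, 220, 275, 1)],                       # ID 1 still in the band
]
crossed = count_crossings(frames, limitsDown)
print(len(crossed))  # 1
```

Person 1 is inside the band in two consecutive frames but contributes a single count, and person 2 never reaches the line.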

Result

The project is complete, and you can see the result in the video below.

Conclusion

In this article, we walked through the steps for counting people on an escalator using Python, YOLO, and OpenCV. By following these steps, you can build your own project and customize it to suit your specific needs.

Thank you for reading. 😊