Object Tracking with YOLOv5 and SORT

Jarrod Mccarthy
3 min read · Jul 27, 2021


Maintaining a unique identifier across the temporal dimension…

Some contents:

  1. Object tracking vs object detection
  2. YOLOv5 in PyTorch
  3. SORT
  4. Track stuff in front of your webcam…

Although the terms are often used interchangeably, object tracking and object detection are not quite the same. Object detection is the task of taking an image and determining which object(s) it contains. Object tracking extends object detection: not only must you detect the objects in each frame, you must also work out whether an object detected in a subsequent frame is the same as an object in the previous frame.

Object detection answers the question: “Is there a person in this picture?”

Object tracking answers the question: “Is there a person in this picture, and is that person the same as the person in the previous picture?”

In this article I’m discussing one approach to object tracking, specifically multi-object tracking (MOT).

The gods at Ultralytics have written a PyTorch implementation of YOLOv5 and trained it on the COCO dataset! Loading it through torch.hub pulls the model straight from the GitHub repo and (likely) into your cache…
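For reference, a minimal sketch of what that load call typically looks like. It assumes `torch` is installed and that you have internet access the first time it runs; the helper name `load_yolov5s` is just for illustration and is not part of any library.

```python
def load_yolov5s():
    """Fetch the COCO-pretrained YOLOv5s model through torch.hub."""
    # Imported inside the function so that defining the helper does not
    # itself require torch to be installed.
    import torch

    # The first call downloads the ultralytics/yolov5 repo and the weights
    # into the torch hub cache; later calls reuse the cached copy.
    return torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
```

Once loaded, `model = load_yolov5s()` gives you an object you can call directly on an image, and `len(model.names)` should report how many class names the model knows about.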

YOLOv5 is a single-stage object detector; you can read some details about it here:

YOLO is complicated to read. The neural net is defined by the (in my case) yolov5s.yaml file, buried in the models folder that will likely be sitting at this path:

C:\Users\username\.cache\torch\hub\ultralytics_yolov5_master\models …

In any case, it’s been trained on the COCO dataset with 80 classes.

The number of classes (nc) defined in the yolov5s.yaml file is 80.

There is a list of 91 classes for COCO here:

Our model is trained to recognise an 80-class subset of those 91.

SORT is an acronym for Simple Online and Realtime Tracking. A quick little summary can be taken directly from the paper where it was first presented:

A very brief semi-summary of how SORT works…

The SORT algorithm does not use a neural net of any kind, just some good old Kalman filtering (https://en.wikipedia.org/wiki/Kalman_filter) to predict where each tracked box will be in the next frame, Intersection over Union (IoU: https://en.wikipedia.org/wiki/Jaccard_index) to build a cost matrix between predicted boxes and new detections, and a bit of the Hungarian algorithm (https://www.youtube.com/watch?v=FQp9HJSg1zs) to solve that cost matrix optimally. Here is the paper where it was presented:
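To make the matching step a bit more concrete, here is a small pure-Python sketch of IoU plus assignment. It is not the code from the SORT repo: the function names are made up for this example, the brute-force search over permutations is a stand-in for the Hungarian algorithm (same optimal answer, but only practical for a handful of boxes), and the 0.3 threshold is an assumed value.

```python
from itertools import permutations


def iou(box_a, box_b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(predicted, detected, iou_threshold=0.3):
    """Pair predicted track boxes with detections by maximising total IoU.

    Brute force over permutations, standing in for the Hungarian algorithm.
    Assumes len(predicted) <= len(detected).
    Returns a list of (track_index, detection_index) pairs.
    """
    best_pairs, best_score = [], -1.0
    for perm in permutations(range(len(detected)), len(predicted)):
        pairs = list(enumerate(perm))
        score = sum(iou(predicted[t], detected[d]) for t, d in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    # Drop pairs whose overlap is too small to plausibly be the same object.
    return [(t, d) for t, d in best_pairs
            if iou(predicted[t], detected[d]) >= iou_threshold]


predicted = [[0, 0, 10, 10], [20, 20, 30, 30]]            # where the tracks should be
detected = [[21, 21, 31, 31], [1, 1, 11, 11], [50, 50, 60, 60]]  # what the detector saw
pairs = match(predicted, detected)
```

Here track 0 ends up paired with detection 1 and track 1 with detection 0, while the third detection is left unmatched and would start a new track.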

Here is an implementation of SORT in Python: https://github.com/abewley/sort

Abewley even tells us how to use the beautiful creation
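Roughly, the usage pattern is: create one tracker, then call `update` once per frame with that frame's detections. A minimal sketch, assuming `sort.py` from that repo (plus its dependencies) is on your import path; the wrapper `run_sort` is a hypothetical helper for this example.

```python
def run_sort(frames_of_detections):
    """Yield tracked boxes for each frame's detections.

    Each frame is a list of [x1, y1, x2, y2, score] rows. Each yielded array
    has rows of [x1, y1, x2, y2, track_id].
    """
    # Imported lazily: assumes numpy is installed and sort.py is importable.
    import numpy as np
    from sort import Sort

    tracker = Sort()
    for dets in frames_of_detections:
        # update() must be called for every frame, even with no detections,
        # in which case it expects an empty (0, 5) array.
        dets = np.array(dets, dtype=float) if len(dets) else np.empty((0, 5))
        yield tracker.update(dets)
```

The number of rows that come back can differ from the number of detections you passed in, so don't rely on the two lining up one-to-one.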

Thank you abewley

Combining some stuff and running object detection on your webcam

A little script to get a video feed and track objects across the video feed’s frames…
Count those People!
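Something along these lines should do the job. Treat it as a sketch rather than a definitive script: it assumes `opencv-python`, `numpy` and `torch` are installed, that `sort.py` from the abewley repo is importable, and that your webcam is device 0. The helper `person_boxes`, the `min_conf` value of 0.4 and the on-screen text are all choices made for this example.

```python
PERSON_CLASS = 0  # index of "person" in the 80-class COCO list used by YOLOv5


def person_boxes(rows, min_conf=0.4):
    """Keep [x1, y1, x2, y2, conf] for rows classed as a person.

    rows: list of [x1, y1, x2, y2, conf, cls], e.g. results.xyxy[0].tolist().
    """
    return [r[:5] for r in rows
            if int(r[5]) == PERSON_CLASS and r[4] >= min_conf]


def main():
    # Heavy dependencies are imported here so person_boxes works without them.
    import cv2
    import numpy as np
    import torch
    from sort import Sort  # assumes sort.py from abewley/sort is importable

    model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
    tracker = Sort()
    seen_ids = set()           # every track id handed out so far
    cap = cv2.VideoCapture(0)  # 0 = default webcam

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV gives BGR frames; the hub model expects RGB arrays.
        results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        dets = person_boxes(results.xyxy[0].tolist())
        dets = np.array(dets) if dets else np.empty((0, 5))
        for x1, y1, x2, y2, track_id in tracker.update(dets):
            seen_ids.add(int(track_id))
            cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)),
                          (0, 255, 0), 2)
            cv2.putText(frame, f"id {int(track_id)}", (int(x1), int(y1) - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.putText(frame, f"people seen: {len(seen_ids)}", (10, 25),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)
        cv2.imshow("YOLOv5 + SORT", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
            break

    cap.release()
    cv2.destroyAllWindows()
```

Call `main()` to start the loop. The count is only as good as the tracker: if someone leaves the frame and comes back, they will most likely get a new id and be counted twice.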

Catch ya!
