A tour of Video Object Tracking — Part III: Multiple Object Tracking

Sep 22, 2019

This article is the third in a series about Video Object Tracking that I have been writing during my internship at Wintics, with the great help of Levi Viana (CTO at Wintics) and Emeline Fay.

- Part I: Presentation of Video Object Tracking
- Part II: Single Object Tracking
- Part III: Multiple Object Tracking

In Multiple Object Tracking (MOT), as its name indicates, there are multiple objects to track. The tracking algorithm is expected first to determine the number of objects in each frame, and second, to keep track of each object’s identity from one frame to the next.

MOT is a challenging problem. ID switches are hard to avoid, especially in crowded videos. In addition, neither the nature nor the number of objects in each frame is known in advance, so MOT algorithms rely strongly on detection algorithms, which are themselves imperfect.

1. Datasets and benchmarks

Evaluating MOT is itself problematic. Firstly, datasets are difficult to build. Secondly, MOT algorithms depend on the detector, and the community does not always agree on the protocol to follow: should every tracker be evaluated with the same detector, or should the whole system (detector+tracker) be evaluated together? Lastly, although some performance measures such as MOTA are commonly used, not all researchers report them.

a. Benchmarks

Before 2015, there was rather limited work on standardizing MOT evaluation, and most of the published datasets, even today, are application-specific.
Since 2015, the most used benchmark for MOT has been the MOT challenge, which focuses on pedestrian tracking. Its website hosts a leaderboard and is always open to new tracker submissions.

In parallel, the UA-DETRAC Challenges were launched in 2015. Almost every year the DETRAC challenge analyzes state-of-the-art algorithms for detection and tracking of vehicles. Participants can submit both detectors and trackers, and each tracker is evaluated with all detectors.

Today, the MOT challenge is the main reference for benchmarking MOT algorithms. The challenge provides ready-to-use detections, but also allows participants to use their own detector (in which case this must be specified). The whole system (detector+tracker) is then evaluated.

b. Performance measure

A wide range of metrics has been proposed in the literature. In this section, I present the main ones.

Mostly Tracked (MT), Partially Tracked (PT), and Mostly Lost (ML)

Each trajectory can be classified as mostly tracked (MT), partially tracked (PT), or mostly lost (ML).
A target is mostly tracked if it is successfully tracked for at least 80% of its life span, and mostly lost if it is successfully tracked for at most 20% of it. All other targets are partially tracked.
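The 80% / 20% rule above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_trajectory` and the example counts are hypothetical, and I assume we already know, for each ground-truth target, how many frames of its life span were successfully tracked.

```python
def classify_trajectory(tracked_frames: int, total_frames: int) -> str:
    """Return 'MT', 'PT' or 'ML' for one ground-truth trajectory.

    tracked_frames: frames in which the target was successfully tracked.
    total_frames:   length of the target's life span, in frames.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    ratio = tracked_frames / total_frames
    if ratio >= 0.8:   # tracked for at least 80% of its life span
        return "MT"
    if ratio <= 0.2:   # tracked for at most 20% of its life span
        return "ML"
    return "PT"        # everything in between


# Hypothetical targets: (frames tracked, frames in life span).
targets = {"a": (90, 100), "b": (50, 100), "c": (10, 100)}
labels = {name: classify_trajectory(*counts) for name, counts in targets.items()}
print(labels)
```

Benchmarks typically report MT and ML as the number or percentage of ground-truth trajectories falling in each class.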

MOTA (Multiple Object Tracking Accuracy)

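The MOTA is the most used summary metric for MOT. It is defined as follows:

$$\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDS}_t\right)}{\sum_t \mathrm{GT}_t}$$

where FN_t is the number of false negatives (missed targets), FP_t the number of false positives (ghost trajectories), IDS_t the number of identity switches, and GT_t the number of ground-truth objects, all at time t. A target is considered missed if no prediction overlaps its ground-truth box with an IoU above a given threshold. (Note that the MOTA can be negative, since the number of errors can exceed the number of ground-truth objects.)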
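As a rough illustration, here is a minimal sketch of the MOTA computation in its standard form: one minus the total number of errors (false negatives, false positives and identity switches), divided by the total number of ground-truth objects over all frames. The function name `mota` and the per-frame counts are hypothetical; in practice these counts come from matching predictions to the ground truth frame by frame.

```python
from typing import Sequence


def mota(fn: Sequence[int], fp: Sequence[int],
         ids: Sequence[int], gt: Sequence[int]) -> float:
    """Compute MOTA from per-frame counts.

    fn[t], fp[t], ids[t]: false negatives, false positives and identity
    switches in frame t; gt[t]: number of ground-truth objects in frame t.
    """
    if not (len(fn) == len(fp) == len(ids) == len(gt)):
        raise ValueError("all sequences must have the same length")
    total_gt = sum(gt)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(fn) + sum(fp) + sum(ids)
    return 1.0 - errors / total_gt


# Hypothetical 3-frame sequence with 5 ground-truth objects per frame.
score = mota(fn=[1, 0, 2], fp=[0, 1, 0], ids=[0, 0, 1], gt=[5, 5, 5])
print(f"MOTA = {score:.3f}")

# More errors than ground-truth objects gives a negative MOTA.
negative = mota(fn=[0], fp=[5], ids=[0], gt=[2])
print(f"MOTA = {negative:.3f}")
```

The second call shows why the score is not bounded below by zero: a tracker that produces many ghost trajectories can accumulate more errors than there are objects to track.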

The community usually reports MOTA and Mostly Tracked to evaluate performance.
In “Tracking the trackers”, the authors conducted an experiment in which people judged which tracker was the best. It turns out that people usually agree with these metrics: their judgments were positively correlated with the MOTA.

Note that in the MOTA metric, poor performance is usually dominated by false negatives: a tracker will usually help remove the detector's false positives, but when the detector misses a target, current multiple object trackers are not able to recover from this failure.

c. State-of-the-art algorithms

For MOT, it is difficult to single out one class of algorithms that outperforms the others, the way Siamese trackers and correlation filter (CF) trackers do for SOT (see the previous article of the series).
Current state-of-the-art algorithms, i.e. the algorithms that perform best on the MOT challenge, are all quite different. It is interesting to see that several researchers employ simple methods, and that their algorithms do not seem to be outperformed by more complex ones (e.g. reinforcement learning, RNNs).

For example, TracktorCV, one of the top-performing MOT algorithms in 2019, exploits the bounding box regressor of a Faster-RCNN detector and, without any training on tracking data, achieves results comparable to those of state-of-the-art algorithms. Similarly, simple IoU trackers using efficient detectors also outperform much more complex algorithms. This underlines the fact that there is plenty of room for improvement in tracking algorithms.
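To give an idea of how simple an IoU-based tracker can be, here is a minimal sketch. It is not the exact algorithm of any published tracker: it is a simplified greedy association that I wrote for illustration, and the names `iou`, `track_by_iou` and the `min_iou` threshold are hypothetical. Each existing track is extended with the detection that overlaps its last box the most, provided the IoU is high enough; unmatched detections start new tracks, and unmatched tracks are dropped.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track_by_iou(frames: List[List[Box]], min_iou: float = 0.5) -> List[Dict[int, Box]]:
    """Assign a track ID to every detection, frame by frame.

    frames[t] is the list of detected boxes in frame t. The result maps,
    for each frame, track IDs to boxes.
    """
    next_id = 0
    previous: Dict[int, Box] = {}
    results: List[Dict[int, Box]] = []
    for detections in frames:
        current: Dict[int, Box] = {}
        unmatched = list(detections)
        # Greedily extend each track from the previous frame.
        for track_id, last_box in previous.items():
            if not unmatched:
                break
            best = max(unmatched, key=lambda d: iou(last_box, d))
            if iou(last_box, best) >= min_iou:
                current[track_id] = best
                unmatched.remove(best)
        # Every detection left over starts a new track.
        for det in unmatched:
            current[next_id] = det
            next_id += 1
        results.append(current)
        previous = current
    return results


# Hypothetical detections: two objects drifting right, one vanishing in frame 2.
frames = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(1, 0, 11, 10), (51, 50, 61, 60)],
    [(2, 0, 12, 10)],
]
tracks = track_by_iou(frames)
for t, assigned in enumerate(tracks):
    print(t, assigned)
```

Such a tracker relies entirely on the detector: a single missed detection ends a track, which is exactly the kind of gap that the extensions discussed in this section try to bridge.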

These simple methods, when complemented with other techniques, achieve state-of-the-art results in 2019. By extending their Tracktor algorithm with a Siamese re-identification network and a motion model, Bergmann et al. reach state-of-the-art performance. By extending the IoU tracker with a single object tracker to bridge the gaps left by missed detections, Bochinski et al. also achieve top performance.

It is also interesting to notice the trend of incorporating SOT algorithms into MOT, which seems quite promising. For example, LSST, which achieves top performance on MOT 2017, includes a SiamRPN sub-net to capture short-term cues and a re-identification (ReID) GoogleNet Inception network to extract long-term cues. It is complemented with a switcher-aware classifier whose role is to combine the multiple cues appropriately.

Conclusion

MOT is a difficult problem, and the field is still growing.
Most of the algorithms are benchmarked on the MOT challenge. However, the existing datasets are still application-dependent (human tracking for the MOT challenge), and some points of the evaluation methods are still debated.

Additionally, it is hard to pick a class of methods which really stands out, and existing algorithms (even top performers) still have room for improvement. Many complex algorithms have been tried, only to be outperformed by simpler methods that exploit the detector (Tracktor, IoU tracker) and extend it with other techniques. Inspired by the progress of Single Object Tracking, some researchers have also successfully incorporated Single Object Trackers into MOT.