High-Performance SORT Tracker In Rust

7 min readJul 28, 2022

SORT (Simple, Online, and Realtime Tracking) is a well-known, frequently used MOT (multi-object tracker) algorithm that fits well when the objects are not occluded by obstacles or overlapped by each other. There are many such cases, e.g., a factory line with parts moving on the conveyor. The advantage of SORT is low resource consumption — it can be deployed on powerful core or low-resource edge devices.

In the previous article, I demonstrated an implementation for the simple IoU tracker that builds object tracks based on previous and current observations overlapping (IoU metric). The SORT algorithm also uses the IoU metric to associate objects, but instead of historical data, it uses extrapolated future data representing probable object trajectory generated by the Kalman filter. Let’s briefly observe how it works.

Kalman Filter Overview

The Kalman filter implementation is an out-of-scope topic, but you need to understand how it works anyway, so I try to illustrate it. Imagine that you have a noisy observation of the object bounding box. You pass that observation to the Kalman filter; the filter applies the object's learned motion characteristics and returns the object's next predicted position.

The following image demonstrates it for a single-dimensional variable (blue dots) — the red line represents the Kalman filter historical results, and the blue line represents predicted variable values:

Kalman filter results example from https://simdkalman.readthedocs.io/en/latest/

Initially, the filter knows nothing about the object’s motion characteristics, but the more it is called for the object, the better it predicts the position of the object. Aside from the prediction, the Kalman filter smoothens the object trajectory. In the case of bounding boxes, it leads to more stable boxes with less wobbling. in the SORT algorithm Kalman filter simultaneously predicts 4 variables: [x, y, aspect, height] — every variable evolves according to its motion law learned by the filter through observations.

How the IoU method discussed in the previous article differs from SORT? The following picture explains that:

You may see that IoU with the Kalman filter leads to larger I and smaller U because the predicted box is compared against a newly detected box, so the overlap is greater.

If you want to investigate the Kalman filter implementation for bounding box prediction in Similari, look at the repository file.

Track Matching in The SORT

Another distinctive feature of the SORT is that it uses a special algorithm to assign current detections to tracks. In our simple IoU tracker, we have used the candidate with the best IoU characteristic that intuitively must give the best result. Meanwhile, the SORT uses another technique called the Hungarian algorithm that maximizes summarized IoU over all tracked objects.

It doesn’t automatically mean that the Hungarian method gives better results in all possible scenarios, but the SORT participates in MOT benchmarks where the Hungarian method shows better results. I’m not planning to explain how it works because it is not critical to dive into it to understand the SORT — who is interested, take a look at the algorithm description or the implementation in Similari.

Rust IoU SORT Tracker

Upd (Aug 09, 2022): the code in the article was updated to meet Similari 0.20 API. Initially, it was presented for Similari 0.18.

Upd (Aug 14, 2022): the code in the article was updated to meet Similari 0.22 API.

Recently the Similari framework received a full-featured SORT implementation and flexible SORT middleware implementation. I will demonstrate how you can use the SORT in your applications.

Let’s start with the performance numbers of the Similari SORT tracker:

10 objects   :     101,962 ns/iter (+/- 7,258)       [ 9803 FPS]
100 objects  :   1,716,538 ns/iter (+/- 309,999)     [  580 FPS]
500 objects  :  17,305,018 ns/iter (+/- 2,704,988)   [   57 FPS]
1000 objects :  52,803,677 ns/iter (+/- 3,804,433)   [   18 FPS]

The benchmark is located at benches/simple_sort_iou_tracker.rs. Those numbers are received with an 5-year-old Intel(R) Core(TM) i5–7440HQ CPU @ 2.80GHz. The state-of-art CPU will give a significant boost versus those numbers.

You can find that more than 500 simultaneously observed objects can be tracked in real-time. I also benchmarked the conventional Python’s SORT implementation from the original repository:

The results are below:

Run for 10 took 2.157175960019231 ms       [465 FPS]
Run for 100 took 12.26064283051528 ms      [ 83 FPS]
Run for 200 took 26.585440589115024 ms     [ 38 FPS]
Run for 300 took 40.949504560558125 ms     [ 24 FPS]
Run for 500 took 72.7029528899584 ms       [ 14 FPS]
Run for 1000 took 159.64975245995447 ms    [  6 FPS]

As you can see, on the same hardware Rust version can track 500 and more objects simultaneously in Real-time. For the Python version, 300 objects is a limit. Also, the Rust tracker requires less CPU to work on fewer objects, making it suitable for low-resource edge devices.

Why FPS Metric is Important?

When you implement the real-time tracking pipeline, you have to run various operations like receiving the frames from the source, decoding them, detecting the objects, tracking them, and sending the results. All those operations take time and must fit a 1/30 of the second (33 ms). Imagine that the scene includes 100 objects, so the Python SORT tracker steals 12 ms, and you have 21 ms to complete the rest of the operations.

With Similari Sort, you have more than 30 ms to complete the rest of the operations — you can do more computations and still fit the desired FPS.

The Usage

First, let’s create a new Rust project:

cargo new similari-medium-sort
cd similari-medium-sort

Add Similari dependency to your Cargo.toml :

[package]
name = "similari-medium-sort"
version = "0.1.0"
edition = "2021"

[dependencies]
similari = "0.22.0"

Create the tracker in main.rs :

The code should be clear but let’s still observe essential parts. First is how the tracker is constructed:

The arguments are:

shardsdefines how many background threads will be created; for small trackers with up to 100 simultaneous objects, it is wise to set as 1 .
bbox_history defines how many of the lastly observed bounding boxes are kept in track history. If you use the tracker as an online one, keep it as low as 1 , if you consume tracks after they become terminal, keep it as high as necessary to record the track in your conditions;
max_idle_epochs defines when a stored track becomes terminal — when no updates to the track during max_idle_epochs , every update increases the epoch counter;
method — the tracker may estimate tracks by either IoU(threshold) or Mahalanobis distance. When the IoU is used, the value lower than the threshold makes the new track creation instead of merging the bounding box to a current track;
spatio_temporal_constraints — and optional configuration parameter that allows ignoring the tracks situated far from the objects to reduce the number of computations.

Now it is time to get tracks:

The resulting _tracks is a Vec<SortTrack> that contains the online tracking information:

id is a track id — it is new when the new track is created or an existing one when the box was assigned to a current track;
epoch — the current epoch of the track; it is updated upon every assignment of the bounding box to the track;
predicted_bbox — bounding box, that was predicted by the Kalman filter;
observed_bbox — bounding box that was received from the detector;
scene_id — object descriptor that a user defines (see later);
length — track length, measured in the number of merges; one can use it to avoid tracking objects while tracks are short and unstable;
voting_type — show how the track was extended on the current step; for the SORT tracker, the only type is Positional;
custom_object_id — an optional user-defined attribute passed to the predict alongside the bounding box; the track that was merged with the new observation receives custom_object_id as well to help the caller easily to understand what track was extended.

Instead predict you can also use predict_with_scene that allows passing scene_id . Scene-ID is a user-defined descriptor that one can use to separate objects by classes or camera identifiers within a single tracker.

At the end of the code, you may see how the tracker is used as an offline tracking engine:

In the example, two epochs were skipped intentionally to make tracks switch to a terminal state and receive them from the tracker store.

Running the program results in output for two tracks shown in the next listing:

If you want to see online tracker results, uncomment the line //eprintln!(“Tracks: {:#?}”, _tracks);

The full source code can be found on GitHub by the link.

Additional Notes

Confidence. For the bounding box, you can set theconfidence parameter like:

BoundingBox::new_with_confidence(
    left: f32,
    top: f32,
    width: f32,
    height: f32,
    confidence: f32,
)

The confidence must lay between [0; 1]. tracker respects the parameter and corrects the weight of the metric accordingly. This means that the boxes with lower confidence decrease the metric value.

Meric. Instead of the IoU metric, you can use the Mahalanobis metric. This is another way to calculate the distance between the objects. Mahalanobis distance can be beneficial in certain cases.

Oriented (rotated) bounding boxes. The tracker works well with bounding boxes that are rotated. Universal2DBox contains the method rotate that can be used to set the angle of the box if the model provides that:

Universal2DBox::new(xc: f32, yc: f32, angle: Option<f32>, aspect: f32, height: f32)

One may also call rotate method. The tracker provides great performance for the rotated boxes. Similar performance numbers are expected.

Conclusions

We dived into the SORT tracker algorithm and learned how to implement the high-performance SORT tracker in the rust project. The Similari Simple SORT implementation is a ready-to-use implementation. You can take a look at the source code of SimpleSORT and customize it as needed. When you need more advanced data processing scenarios, please look at SORT middleware implementation and an example of the usage.