High-Performance Object Tracking Engine with Rust

8 min readJul 15, 2022

Object tracking is a very frequent task in the machine learning industry. There are many efficient object tracker algorithms around like SORT, DeepSORT, FairMOT, and others. Most of them involve highly intensive algorithmic operations with brute force behavior. Due to the flavor of data science, most popular implementations are made in Python and NumPy.

Such an approach fits well when you build PoC but leads to performance bottlenecks when you build the solution for the production environment — for example, the DeepSORT algorithm becomes a bottleneck when more than 15 persons are tracked simultaneously at 20 FPS. So Python, even when used with highly efficient NumPy, still has interpretation language drawbacks and GIL that limits the parallelization capabilities aside from IO tasks.

In the article, I will present a new open-source Rust framework Similari that was developed to build various object tracking systems. I also demonstrate how to build a simple IoU tracker engine with Similari. First things first, let’s go with the IoU tracking engine definition.

You also may find useful other tracking articles:

High-Performance SORT Tracker in Rust — how to add ready-to-use SORT tracker to your Rust application;
Feature-based Object Tracker With Similari and Rust — how to implement pure visual tracker in your Rust application;

IoU Tracking

When you track objects (for the rest of the article, I count the object = bounding box around it), you have to solve the task of matching between the currently observed objects and previously built tracks.

So, in the beginning, you have no tracks and detect objects in the field of view. At that step, you just add objects to the tracking database and assign a track identifier to each track. Each object forms its own track.

On every next step, you detect objects in the field of view and somehow assign those objects to current or new tracks. To do that, there are some techniques. Most frequently, object ReID features and IoU metrics are used. While the ReID method is a more advanced and preferable technique, IoU is simple and used almost everywhere. Let’s briefly dive into the IoU before we move to the code implementation.

IoU is an abbreviation of Intersection over Union. It is a metric that calculates how much the two objects overlap. Luckily the result of IoU computation is always between [0.0 .. 1.0], where 0.0 means that objects have no overlap, and 1.0 is when two objects completely match. The IoU is calculated as:

IoU = I / (SO1 + SO2 - I)

where,

I — an intersection of two objects,
SO1 — the area of O1,
SO2 — the area of O2.

If you want to know more about IoU formulas, please read the article. What is important is that the IoU method gives us the metric for matching observed objects to tracks.

Tracking approach

For the purpose of simplicity, the proposed tracking algorithm doesn’t use predictions that are very beneficial from the perspective of tracking quality in real-life trackers.

For the tracking algorithm, we will use the following method: every track holds at most three lastly added object boxes. When a new object appears, the algorithm matches it versus every track in the database. The algorithm is explained with the following snippet.

It works pretty simply. You don't know how it the vote_topn implemented yet, but the intuition about the method must be clear.

The function vote_topn also pretty straightforward - it finds the single winner across the collected distances, and the object is added to the winning track. The object forms a new track if there is no winning track.

Cool! Now, you understand the IoU approach, so we will use Similari to build an IoU tracker with Rust. If you feel brave, you can go directly to the IoU example in the Similari repo and play with it. Meanwhile, we dive into Similari concepts and the details of why it is a good tool for trackers.

Similari Introduction

I created Similari to build high-performance trackers for computer vision products. Similari is built in Rust with the power of generic programming. The compiler generates highly efficient monomorphized code without virtual tables and excessive abstractions. Similari uses the Ultraviolet crate for blazing fast SIMD calculations.

To utilize the power multi-core architecture, Similari shards the track space and parallelizes computing with the pool of dedicated executors (one executor per shard). Initially, I used Rayon to make parallelized operations, but it turns out that a dedicated shared pool performs better in the case of Similari.

Similari is licensed under Apache 2.0 license.

Let’s quickly explain the moving parts behind Similari:

The Track is the accumulator object that collects the track of the object.
Tracks have Track Attributes — custom attributes that have meaning in the domain field, e.g., Camera ID, bounding boxes collected, etc.
Tracks have Observations — per-object observations that are included in an object track; The Observation includes two properties — Observation Attributes (custom attributes defined in the domain field, e.g., Bounding Box) and Observation Feature — a vector of Float numbers that represents feature object.
Track Store — a database where tracks are kept;
Track Merge Operation — the operation that merges the source track into the resulting track; during the merging, it modifies the Track Attributes and Observations of the resulting track.
Track Distance Operation — the operation that calculates the distances between track and other compatible tracks kept in the Track Store.

Now, let’s create a new cargo project and implement the IoU tracker.

cargo new iou --bin

Add Similari to Cargo.toml dependencies:

Now, let’s begin with developing the tracker itself. TheBBox object is defined within Similari, and the IoU metric is implemented for it. Begin with creating track attributes:

We defined BBoxAttributes — a struct that holds bboxes. It represents track attributes. We are going to accumulate observed object bounding boxes in the bboxes. To make it compatible with Similari, we must implement the trait TrackAttributes. The trait introduces three methods:

Compatible — the purpose of the method is to answer if the track is compatible with another track during the distance calculation. The method allows decreasing the search space by defining spatial, temporal, or other constraints. In our case, the single space across all the tracks, so they are all compatible.
Merge — the purpose of the method is to define how the attributes are merged when two tracks are merged. In our case, bboxes from a merged track are appended to the bboxes of the destination track.
Baked — the purpose of the method is to define when the track is ready and can be used for distance calculation. E.g., when there are more than X features collected. In our case, tracks are always ready.

Next, we express BBoxAttributesUpdate — a structure that is used to update track attributes upon an observation added into a track. In our case, there is nothing useful there:

The apply method is used to update current BBoxAttributes with the update, but in our case, it does nothing. We could implement it like bboxes.push(self.bbox), but we will use another mechanism because bounding boxes are also present in observation attributes.

Let’s next define the Metric object for our tracker. It calculates pairwise track distances and optimizes observations kept in a track.

There are two ObservationMetric trait methods have to be implemented:

Metric — the method calculates metrics for observations. In our case, we don’t calculate the distance for features (always None) because we don’t use the features. But we calculate IoU distance for bboxes : BBox::calculate_metric_object(&e1.0, &e2.0) that returns Some(f32) for IoU.
Optimize — the method is called after each merge or when an observation is injected into a track. In our case, when an observation is added (not merge operation), we extend track attributes with newly added bbox . Next, we cut observations to the last three at most to drop excessive old observations that are not participating in the IoU algorithm.

Now we have to implement a voting engine that generates winner tracks:

Our voting engine first drops candidates with low IoU values. The next step collects the remaining candidates into groups by the number of votes for the track. Again, drops the candidates with a low number of votes and returns Top N of them.

Now, we have created all important tracker parts and are ready to build the tracker itself.

First, let’s demonstrate all the code and next explore it step by step:

To demonstrate the tracker's work, we will simulate the movement of two boxes on the field.

let pos_drift = 1.0;    
let box_drift = 1.0;    let mut b1 = BoxGen2::new(100.0, 100.0, 10.0, 15.0, pos_drift, box_drift);     let mut b2 = BoxGen2::new(10.0, 10.0, 12.0, 18.0, pos_drift, box_drift);

Iterators b1 and b2 generate boxes [x, y, width, height] that randomly drift according to pos_drift (x , y), and box_drift (width , height).

For generated boxes, we initialize external tracks that are outside the track store. Each track includes a bounding box generated and no feature attribute (because we don’t use features at all):

Finally, we try to match the new tracks to tracks kept in store or initialize new tracks if there is no match:

Ok, that’s it. If you run the algorithm, you will get the result representing the tracks for two objects:

Track id: 1657891130776
Boxes: [
    BBox {
        x: 100.18772,
        y: 99.46758,
        width: 10.583958,
        height: 14.880779,
    },
    BBox {
        x: 100.23967,
        y: 98.68012,
        width: 11.117259,
        height: 15.76297,
    },
    ...
]Track id: 1657891130777
Boxes: [
    BBox {
        x: 10.51288,
        y: 9.5912895,
        width: 12.237838,
        height: 17.537289,
    },
    BBox {
        x: 11.132711,
        y: 9.841759,
        width: 12.130169,
        height: 17.78382,
    },
    ...
]Process finished with exit code 0

Performance

The Similari repository contains the benchmark for the IoU tracker implemented. The benchmark shows the following numbers when run on 4-core Intel(R) Core(TM) i5–7440HQ CPU @ 2.80GHz:

The benchmark generates X objects and tries to match them to previously stored X objects over the IoU metric. The matched objects are merged into tracks, and the operation continues while the benchmark runs. As you can see, the tracker can track over 500 simultaneously observed objects at 50+ FPS. The benchmark doesn’t use object distance heuristics to decrease the object space.

Conclusion

We have built a very simple tracker that uses IoU similarity to match objects. The tracker can be significantly improved by adding the Kalman filter to optimize method.

The full project code is accessible on GitHub. You can find other tracker samples in the Similari repository examples:

Simple feature-based tracker (Euclidean distance);
Partial track merging engine based on feature similarity (Euclidean distance);
Simple feature search engine example (but better use HNSW for that).

If you are interested in further knowledge, you can read my article about building the feature-based Re-ID tracker at the link.

Also, there are performance benchmarks in the Similari repo that you can run with a Rust nightly compiler.

If you like Similari, please rate the repo :). As an author, I would highly appreciate the feedback and questions about the Similari.