Feature-based Object Tracker With Similari and Rust

Ivan Kud
8 min readJul 19, 2022

--

Feature-based trackers are very often met in the deep-learning industry. Well-known feature-learning DNN models like OSNet generate homogenous re-identification (Re-ID) vectors. Those vectors can be estimated with distance metrics like Euclidean, Cosine, or another. The feature vectors inferred for a single object generate distance results that lay close. Unlike the IoU tracker demonstrated in my previous article, the Re-ID feature-based trackers operate over homogenous vectors that represent the object. The vector element has no meaning from the decision-making perspective, so there is no way to elaborate an algorithm that analyzes parts of vectors as self-meaning elements.

To demonstrate how it works in simplified 2D space, take a look at the following picture.

ReID feature vectors in 2D-space

In the picture, you can see 3 objects and their Re-ID feature vectors. The Re-ID model is efficient when the resulting vectors for the same object lay close to each other.

There are often outliers — you can see them outside the elliptic zones. Outliers are inevitable and decrease the Re-ID quality. Various techniques are applied to decrease the influence of outliers. One of them is Re-ID quality estimation — the additional model that estimates the quality of Re-ID. E.g., it can give a higher score when the Re-ID vector is inferred for the large, well-observable object and a lower score when the object is observed partially and in uncertain conditions. Another approach to decrease the influence of outliers is to compare distances to the last N of historical observations (not the last one) — it helps to implement weighted voting that decreases the influence of an outlier.

There are a lot of tricks about the Re-ID stuff, but the topic is too broad to discuss here. I would like the readers to understand the intuition about the Re-ID feature vectors. The example feature above demonstrates the Re-ID in 2D space. The position of the point shows how similar a certain point is to the instance X (1,0) , and the instance Y (0,1). The same rule applies to real-world Re-ID feature vectors with large lengths — 256, 512, or even greater dimensions. Every value on every position of a vector shows how an observable object looks similar to the object with the Re-ID feature vector of all-zeroes except 1.0 on that position (probably the object used during the DNN initial training).

The pairwise distances of vectors can be calculated with well-known methods like Euclidean, Cosine, Mahalanobis, and others. Which distance to use depends greatly on the Re-ID behavior and requires experimenting. Euclidean distance is used frequently, so we will use it to demonstrate the tracker.

Tracker Design

I will demonstrate the tracker example that builds tracks based on Re-ID similarity. On every step, the tracker receives Re-ID feature vectors for observed objects. Every new feature vector is compared against the feature vectors of tracks kept in the track store using Euclide distance. The track store keeps each track’s 5 last feature vectors to decrease outlier influence. So, for each new feature vector and a track in store, we receive 5 distances that show how close the new one is to those for the track. We receive up to 5 x N distances for all tracks kept in store.

Distance calculation

We somehow have to find a merge candidate when the distances are collected. To do that, we first filter out results with far distances, count them by track, and sort them by decreasing the number of votes. The first result is a track candidate where we merge the new feature vector. If the result is empty, the feature vector forms a new track.

Merge candidate selection

The above algorithm is a crucial part of real-world tracking algorithms that combine feature-based approach, IoU approach, and other heuristics to reach reliable tracking. In the example, we use a track with a single new feature vector to estimate the merge candidate. Instead, it’s possible to use a collected track to merge short tracks into a longer one when the object track is split into parts due to switches or when multi-cam tracking is implemented.

You may wonder where such a tracker can be used? If you can use additional heuristics like IoU and other object attributes, you should use them as it leads to better tracking. However, there are cases when the tracking task must be solved solely with feature-based tracking.

One of the obvious cases is when you cannot observe the object often, so IoU cannot be used. For example, you are using Intel Movidius Myriad 2 to collect biometrical information — it can handle 4–5 FPS in real-life scenarios. Such FPS leads to poor IoU quality, while you still can get perfect Re-ID results.

Another case is when you don’t need to track the object every moment but would like to find it from time to time. Such a tracker suits well to solve the task.

The last case is a multi-camera track merging. You track the object in the observable areas, collecting Re-ID features, and next merge tracks into a multi-cam session.

Implementation Considerations

You probably have noticed that the tracker uses a lot of computations. To implement it efficiently, the tracker must utilize multi-core architecture and handle distance calculation efficiently. Those features are built-in in the Similari — a framework designed primarily to build high-performance, low latency trackers. Similari is developed in Rust. Similari uses SIMD CPU instructions when operating with feature vectors to handle calculations fast.

The Similari repository contains the benchmark for the Re-ID tracker we will implement. Let’s take a look at the numbers.

Feature (256 @ f32) tracking benchmark for N simultaneously observed objects run on 4 cores of Intel(R) Core(TM) i5–7440HQ CPU @ 2.80GHz. The benchmark doesn’t use heuristics that separate the observed objects based on object distances.

The benchmark is located at benches/feature_tracker.rs.

10 objects:       101,465 ns/iter (+/- 10,056)         [9900 FPS]
100 objects: 4,020,673 ns/iter (+/- 877,444) [ 250 FPS]
500 objects: 61,716,729 ns/iter (+/- 11,215,929) [ 16 FPS]
1000 objects: 235,187,877 ns/iter (+/- 89,734,978) [ 4 FPS]

You can see that up to 500 objects can be tracked in real-time on a state-of-art CPU.

Tracker Implementation

Let’s implement the tracker in a step-by-step fashion. Let’s begin with the Similari core concepts we have to know before implementing the tracker.

Similari defines the following core objects:

  • Track — an entity that represents object track. The Track contains Track Attributes and Observations.
  • Track Attributes — a custom track part that holds domain-specific track information. In our case, we will collect bounding boxes in Track Attributes.
  • Observation — an object that represents object features observed that are placed in a track. An observation contains an optional custom Observation Attributes object and an optional homogenous Feature Vector. In our case, the Observation Attributes object will be represented by a Re-ID quality (F32), and the Feature Vector will be 2 @ F32.
  • Feature Class — an observation is associated with the Feature Class. A Track can collect various observations under various Feature Classes. In our case, we will use a single feature class — FEAT0.
  • Track Metric — an object that serves 3 roles: calculates metrics for Observations, optimizes Observations upon track construction, and filters out results after metrics calculations.
  • Track Store — an efficient parallelized database that handles lookups, distance calculations, and merging for tracks.
  • Voting Engine — an algorithm that selects merging candidates for calculated metrics.

Let’s begin with the coding. First, let’s define Track Attributes.

To update track attributes, we have to define an update object:

While the objects are created, they have no defined behavior and cannot be used in Similari. To fix that, let’s implement the required traits for them. First, the TrackAttributesUpdate trait for BBoxAttributesUpdate . The trait defines an apply method that is used to update track attributes. It is implemented as follows:

Next, let’s define the required trait for BBoxAttributes to make it work with Similari:

There are 3 methods here:

  • compatible is used when the tracks are selected for the distance calculation. Incompatible tracks are skipped;
  • merge is used to merge attributes when tracks are merged — in our case, we merge their bounding boxes;
  • baked is used to say storage that a track kept in storage can be selected for distance calculation if a caller wants only baked tracks; to add a track can be wasted if somehow it became unusable, e.g., it wasn't able to collect enough observations during the last iterations.

The next step is to define the Track Metric object:

It is a little bit sophisticated, but let’s go method by method to understand what happens.

The first method is metric . It calculates metrics (distances) between observations of two compared tracks. It can return

  • None if the observations are not comparable or
  • Some((Option<ObservationAttributes::MetricObject>, Option<f32>)) if the comparison happened.

In our case ObservationAttributes are F32, and they are predefined — imagine it’s a Re-ID quality parameter. However, we don’t actually use it in Voting, ObservationAttributes::MetricObject is F32 as well. And f32::calculate_metric_object is just abs(x-y).

We are looking for euclidean distance calculation for Re-ID vectors, and it’s also there.

The next method is optimize . It is used after two tracks are merged or observation is added to the track. The purpose of optimizing is to keep tracks sane — in our case, we just leave up to 5 last observations.

The last one is postprocess_distances . The method is used to filter the outlier distances before passing data to the Voting Engine. It is an optimization technique that is more efficient than doing it later in the Voting Engine. In our case, we just remove the distances that don’t fit for sure.

All the required objects are defined. Let’s go to a tracker sample. First, let’s observe the whole demo tracker:

It’s pretty straightforward what happens there, but let’s observe important parts separately:

Here you can see the generation of Re-ID and bounding box. Next, a new track is created, and the observation is added to it. The steps are repeated for obj1t and obj2t .

The next important part is voting:

Please look at line 6 — if no merge candidates, then the track is added to the store as new, or else the track is merged with the candidate.

The full code is available in the Similari repository.

Conclusion

We have built a very simple tracker that uses feature similarity to match objects. The tracker can be significantly improved by adding the IoU estimation from the previous article.

You can find other tracker samples in the Similari repository examples:

  • Partial track merging engine based on feature similarity (Euclidean distance);
  • Simple feature search engine example (but better use HNSW for that).

Also, there are performance benchmarks in the Similari repo that you can run with a Rust nightly compiler.

If you like Similari, please rate the repo :). As an author, I would highly appreciate the feedback and questions about the Similari.

--

--