Taking the long road (Part 2): Rust bindings for a vector similarity search library

This is part 2 of a story on taking the long road towards Rust bindings to Faiss. You may wish to read part 1 first for a motivation section and a deeper understanding of how I built a plain C API on top of the C++ library.

Now that we have a C API, it may seem that the trickiest challenge has already been overcome (and hopefully so, otherwise the nearly 3 thousand lines of code would have been written for nothing). Still, this story deserves the complete and satisfying feeling of wrapping a safe layer around a less safe native interface. As usual, we start with the low-level bindings, and then work our way up to an idiomatic Rust API.

faiss-sys : low-level bindings

It is a strong convention in Cargo to dedicate the linking and direct mapping of a system library to a crate with a -sys suffix in its name. So, I first created faiss-sys, which provides unsafe access to all functions and types from the C API. However, when building this crate as is, Cargo will attempt to link with the faiss library, which is not what we want: as indicated in Part 1, faiss is the bare C++ library, whereas faiss_c is the full shared object with the C API included, so the latter is the more accurate link. Moreover, we want users to have access to the GPU implementations if desired, and those were built into a separate library, gpufaiss_c. The solution here: a new “gpu” Cargo feature is declared (disabled by default), and through the build script’s output, the crate decides which of the two libraries to link with based on this feature.
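A sketch of the relevant build-script logic; faiss_lib_name and link_directive are illustrative helpers, not the crate’s actual code. Cargo exposes enabled features to build scripts through CARGO_FEATURE_* environment variables:

```rust
/// Name of the shared object to link against: `faiss_c` by default,
/// `gpufaiss_c` when the "gpu" feature is enabled.
fn faiss_lib_name(gpu: bool) -> &'static str {
    if gpu { "gpufaiss_c" } else { "faiss_c" }
}

/// The line a build script prints to instruct Cargo which library
/// to link. The `main` of build.rs would do something like:
/// `println!("{}", link_directive(std::env::var("CARGO_FEATURE_GPU").is_ok()));`
fn link_directive(gpu: bool) -> String {
    format!("cargo:rustc-link-lib={}", faiss_lib_name(gpu))
}
```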

With that done, we still need the Rust code exposing the API. Since we now have a C89-compatible interface, Rust bindgen is an excellent tool for the job. We can fetch all the headers of the API, glue them together into a single file, and run bindgen every time the API changes. Although bindgen suggests generating these bindings at the crate’s build phase, this particular interface contains nothing so platform-specific that it could not be described with a single translation, so generating them ahead of time is seemingly safe. The Rust bindings for TensorFlow do roughly the same thing in the crate tensorflow-sys. The actual command is something along these lines:

bindgen --link faiss_c --whitelist-function faiss_.* --whitelist-type idx_t|Faiss.* c_api.h -o src/bindings.rs

The white-listing serves to exclude other unneeded public-facing pieces of third-party APIs (such as the CUDA runtime headers). The command is also replicated for the GPU version.

I actually wrote a shell script which retrieves the headers from the repository and creates both faiss_c and gpufaiss_c interface files in “src/bindings.rs” and “src/bindings_gpu.rs”, respectively. Then, I used conditional compilation attributes to export either one of them.

The low level bindings are ready. The following sections regard faiss, the crate containing the high level bindings.

Bringing errors to a higher level

Operations with recoverable errors in Rust should return a Result<T, E>, where E is a well-behaved (never forget C-GOOD-ERR) error type. When working on a library, I usually keep the new error types in an error module, often with one of them being a sum of all the others. In addition, I add a Result<T> type alias which fixes the type parameter E to the top-level error type, making the rest of the code in the crate cleaner and more intuitive. In this crate, I have followed the same pattern with two error types in “error.rs”.

NativeError encapsulates errors emerging from the low-level interface, whereas the top-level Error is an enum covering either a native error or some other issue detected at a higher level of the API. The former is built by calling faiss_get_last_error() and copying the error message into an owned string. A copy is necessary because the contract I established only guarantees that the error message lives until the next call to any function in the interface. I have encapsulated this logic in a static method:
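A sketch of that method, with a stub standing in for the faiss-sys binding of faiss_get_last_error so the example is self-contained (the real function is an extern "C" item in the native library):

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Stub standing in for the faiss-sys binding: the real
// `faiss_get_last_error` returns a pointer that is only valid
// until the next call into the C API.
fn faiss_get_last_error() -> *const c_char {
    b"assertion failed in native code\0".as_ptr() as *const c_char
}

#[derive(Debug, Clone, PartialEq)]
pub struct NativeError {
    code: i32,
    msg: String,
}

impl NativeError {
    /// Fetch the last error from the native layer, copying the
    /// message into an owned String so it outlives subsequent calls.
    pub fn from_last_error(code: i32) -> Self {
        let msg = unsafe {
            CStr::from_ptr(faiss_get_last_error())
                .to_string_lossy()
                .into_owned()
        };
        NativeError { code, msg }
    }

    pub fn message(&self) -> &str {
        &self.msg
    }
}
```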

The other trait implementations for a well behaved error (namely Display, std::error::Error, and From<Error>) were written manually. For such a simple case, it was deemed unnecessary to use a helper crate.

And on top of this, we can make native function calls easier to handle. These functions return an error code which needs to be checked after every single call, returning early if the function fails. We can make this less of a burden with a macro:
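A self-contained sketch of the macro, with a toy NativeError in place of the crate’s real error construction:

```rust
#[derive(Debug, PartialEq)]
pub struct NativeError(pub i32);

/// Check a native status code and return early with an error
/// when it is non-zero (zero means success).
macro_rules! faiss_try {
    ($e:expr) => {{
        let code = $e;
        if code != 0 {
            return Err(NativeError(code));
        }
    }};
}

/// Example of the pattern: each native call is wrapped, and a
/// failure short-circuits the whole function.
fn run(status1: i32, status2: i32) -> Result<(), NativeError> {
    faiss_try!(status1); // first native call
    faiss_try!(status2); // second native call
    Ok(())
}
```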

Yes, this is similar to the old-school try! macro, except that it works specifically for errors from the native library. To serve as an example of usage, here’s how one of the key functions of the library, index_factory, is fully implemented:
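A sketch of that implementation, with hypothetical stubs in place of the faiss-sys items (and a local copy of the faiss_try! sketch) so the example stands on its own:

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int, c_uint};
use std::ptr;

/// Opaque native index type, as bindgen would generate it.
pub enum FaissIndex {}

#[derive(Debug, Copy, Clone)]
pub enum MetricType { InnerProduct = 0, L2 = 1 }

#[derive(Debug)]
pub struct Error(i32);
pub type Result<T> = std::result::Result<T, Error>;

// Stub for the native factory function; the real one is an
// extern "C" item in faiss-sys.
unsafe fn faiss_index_factory(
    p_index: *mut *mut FaissIndex,
    _d: c_int,
    _description: *const c_char,
    _metric: c_uint,
) -> c_int {
    *p_index = 0x1 as *mut FaissIndex; // pretend the native call succeeded
    0
}

macro_rules! faiss_try {
    ($e:expr) => {{
        let code = $e;
        if code != 0 {
            return Err(Error(code));
        }
    }};
}

/// High-level index over a raw native pointer.
pub struct IndexImpl {
    inner: *mut FaissIndex,
}

/// Sketch of `index_factory`: simple types are cast, the description
/// becomes a null-terminated CString, and the native status code is
/// checked with `faiss_try!`.
pub fn index_factory(d: u32, description: &str, metric: MetricType) -> Result<IndexImpl> {
    let description = CString::new(description).map_err(|_| Error(-1))?;
    let mut inner = ptr::null_mut();
    unsafe {
        faiss_try!(faiss_index_factory(
            &mut inner,
            d as c_int,
            description.as_ptr(),
            metric as c_uint,
        ));
    }
    Ok(IndexImpl { inner })
}
```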

The points to take from here are:

  • Simple types (such as i32 and MetricType) are statically cast;
  • Strings are converted to null-terminated strings with CString;
  • Native calls are wrapped with faiss_try!(all_the_function_calls(...));

Covering the Index

Since it’s easy to treat an index as an object, the high-level API for indices attempts to mimic the original object-oriented design. An Index trait was designed which basically translates the virtual member functions in the C++ library into idiomatic Rust methods. The name collision with std::ops::Index is a bit unfortunate, but I found it not troubling at all in the development and subsequent use of this crate. You may find the full definition below, but please feel free to only skim through it.
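A sketch of the trait, with Error, Result, and SearchResult as simplified stand-ins for the crate’s actual types (assign and ranged_search, which follow the same pattern as search, are omitted for brevity):

```rust
/// Label type, standing in for the native `idx_t`.
pub type Idx = i64;

#[derive(Debug)]
pub struct Error;
pub type Result<T> = std::result::Result<T, Error>;

/// Search result: one label and one distance per (query, neighbor) pair.
pub struct SearchResult {
    pub labels: Vec<Idx>,
    pub distances: Vec<f32>,
}

pub trait Index {
    /// Dimensionality of the indexed vectors.
    fn d(&self) -> u32;
    /// Whether the index has been trained.
    fn is_trained(&self) -> bool;
    /// Number of vectors currently indexed.
    fn ntotal(&self) -> u64;
    /// Train the index with the given data.
    fn train(&mut self, x: &[f32]) -> Result<()>;
    /// Add new data (the slice length must be a multiple of `d`).
    fn add(&mut self, x: &[f32]) -> Result<()>;
    /// Find the `k` nearest neighbors of each query. Note the
    /// `&mut self`: not all implementations support concurrent searches.
    fn search(&mut self, query: &[f32], k: usize) -> Result<SearchResult>;
    /// Remove all indexed vectors.
    fn reset(&mut self) -> Result<()>;
}
```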

Generally, the trait’s methods won’t surprise anyone already familiar with Faiss. However, there is one exceptional thing here: the methods assign, search and ranged_search require a mutable self, even though they do not modify the state of the index. These operations require exclusive access because some index implementations do not allow concurrent searching. Since this trait is meant to serve as the base interface to all indices in the crate, the constraint must be applied at this level, and I’ll show a way to lift it in the next sections.

With the trait declared, a new struct IndexImpl is made for a generic implementation. It is composed of a single raw pointer to the type FaissIndex, thus linking the high-level API with any native implementation that could be behind that pointer.
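A minimal sketch of this pairing, with FaissIndex standing in for the opaque type bindgen would generate:

```rust
/// Opaque native type, as generated by bindgen in faiss-sys.
pub enum FaissIndex {}

/// Generic high-level index: nothing but a raw pointer to
/// whichever native implementation lives behind it.
pub struct IndexImpl {
    inner: *mut FaissIndex,
}

impl IndexImpl {
    /// Give crate internals access to the native pointer.
    pub fn inner_ptr(&self) -> *mut FaissIndex {
        self.inner
    }
}
// In the real crate, a Drop implementation frees the
// native index through the C API.
```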

Still wondering how this is used in the end? There is a sample program based on the documentation:
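A minimal usage sketch along the lines of the crate’s front-page example (my_data and my_query stand for flattened f32 slices, and an actual Faiss installation is required to build this):

```rust
use faiss::{index_factory, Index, MetricType};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 8-dimensional vectors, flat (exact) index, Euclidean distance
    let mut index = index_factory(8, "Flat", MetricType::L2)?;
    index.add(&my_data)?;

    // 5 nearest neighbors for each query vector
    let result = index.search(&my_query, 5)?;
    for (i, (l, d)) in result
        .labels
        .iter()
        .zip(result.distances.iter())
        .enumerate()
    {
        println!("#{}: {} (D = {})", i + 1, l, d);
    }
    Ok(())
}
```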

We have finally reached basic functionality of this library from Rust.

However, before these bindings fall Flat…

While the main index trait and the factory function provide enough functionality, sometimes we might wish to dive into a particular index in order to obtain some guarantees. Some of them are even related to safety! In particular, some index implementations allow multiple threads to search concurrently (although searching in batches can actually be faster, and is still recommended). The flat index implementation, albeit naïve in nature, is one of the most accurate indices and still sufficiently fast for many thousands of vectors. And so, it makes a good candidate for a potential extension to the API’s design.

A new trait ConcurrentIndex was made, which declares the same searching methods as Index but taking &self instead of &mut self:
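A self-contained toy sketch of the two traits and a flat-index type implementing both (error handling and real search logic stripped; the real crate keeps the Result-returning signatures):

```rust
pub struct SearchResult {
    pub labels: Vec<i64>,
    pub distances: Vec<f32>,
}

pub trait Index {
    /// Exclusive access: safe for every implementation.
    fn search(&mut self, query: &[f32], k: usize) -> SearchResult;
}

/// The same searching methods, but through a shared reference;
/// only implemented for types that can search concurrently.
pub trait ConcurrentIndex {
    fn search(&self, query: &[f32], k: usize) -> SearchResult;
}

pub struct FlatIndex;

impl Index for FlatIndex {
    fn search(&mut self, query: &[f32], k: usize) -> SearchResult {
        // delegate to the shared-reference implementation
        ConcurrentIndex::search(self, query, k)
    }
}

impl ConcurrentIndex for FlatIndex {
    fn search(&self, _query: &[f32], k: usize) -> SearchResult {
        // toy body: a real index would query the native library here
        SearchResult { labels: vec![0; k], distances: vec![0.0; k] }
    }
}
```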

We can safely implement this trait for FlatIndex, and use this value the same way as the generic IndexImpl. Plus, we can now search the index behind an immutable reference.

This is an exciting and intriguing exhibition of the compiler’s method resolution logic. At first glance, a call index.search(x) may appear ambiguous due to the colliding method names, but since method resolution tries an immutable borrow of the receiver before a mutable one, the compiler simply prefers the ConcurrentIndex version! There is also no drawback in shadowing the other (mutably-ref’d) methods, because they are implemented in exactly the same way.

“Give me your extra resources”

The standard GPU resources implementation was trivially wrapped into a safe layer with a similar pattern: an inner pointer in a struct, with the unsafe functions turned into safe methods. In nearly all cases, arguments and outputs are converted to fit the new, high-level API. As in the low level crate, this type and all dependents only exist if the “gpu” Cargo feature is enabled.

A quick remark: StandardGpuResources implements Send, but must not implement Sync. This deliberate decision comes directly from the notes on threads and safety in the official Faiss wiki: “A single StandardGpuResources object must be created for each thread that is actively running a GPU Faiss index”. Not being Sync is simply ideal here, because it prevents access from more than one thread, while still allowing the resources to be shared by multiple indices, so long as these stay within the same thread.
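A sketch of how this comes out in code: wrapping a raw pointer already removes both auto traits, and only Send is reinstated (FaissGpuResources is a hypothetical stand-in for the bindgen-generated opaque type):

```rust
/// Hypothetical opaque native type from the low-level bindings.
pub enum FaissGpuResources {}

pub struct StandardGpuResources {
    // A raw pointer makes the struct neither Send nor Sync by default.
    inner: *mut FaissGpuResources,
}

// Safe to hand over to another thread...
unsafe impl Send for StandardGpuResources {}
// ...but no `unsafe impl Sync`: shared access from multiple
// threads stays forbidden, as the Faiss wiki requires.
```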

With this gfx card, I rule.

The GPU index implementation wrapper is slightly more substantial than the previous one. On the one hand, all operations over the index are called exactly the same way at the faiss-sys layer, regardless of whether it requires GPU resources or not, thanks to the library’s use of dynamic polymorphism. On the other hand, there are two other concerns:

  • Indices with a GPU implementation must not outlive the respective GPU resource descriptors.
  • Since the indexing algorithm is the same, bringing the index back to CPU memory should allow you to retain the original type of the index. This is particularly useful because I added a getter for the vectors in a flat index.

Therefore, the GPU index implementation type is generic over two parameters: a lifetime parameter bound to the scope of the GPU resources, and a type parameter I: CpuIndex for keeping track of the original type of the index before the GPU version was built. As generic parameters, they are zero-cost abstractions and do not incur any runtime overhead.

Note that doing PhantomData<&'g I> would be incorrect, because we’d be erroneously associating the lifetime 'g with the type I (Playground example).
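A sketch of the correct arrangement, pairing the lifetime with () and keeping I as an independent parameter (CpuIndex is simplified to an empty marker trait here, and FaissGpuIndex is a hypothetical opaque type):

```rust
use std::marker::PhantomData;

/// Hypothetical opaque native type for a GPU-backed index.
pub enum FaissGpuIndex {}

/// Marker trait for CPU index types (simplified).
pub trait CpuIndex {}

/// GPU index: `'g` ties it to the scope of the GPU resources, while
/// `I` remembers the original CPU index type. Pairing the lifetime
/// with `()` keeps the two parameters independent of each other.
pub struct GpuIndexImpl<'g, I: CpuIndex> {
    inner: *mut FaissGpuIndex,
    phantom: PhantomData<(&'g (), I)>,
}
```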

I went to GPU and back

Finally, making a GPU-backed index in the high-level API is as simple as calling the to_gpu method. This results in an independent index with the same algorithm and indexed data. I have also provided into_gpu, which does the same thing while discarding the original index. The naming is compliant with the Rust API guidelines (C-CONV). Likewise, to_cpu and into_cpu allow the user to bring the same index back to main memory, which is particularly useful for serialization purposes.
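A rough sketch of what this looks like from user code, under the “gpu” feature (names follow the crate’s documentation, but the exact signatures may differ slightly; my_data stands for a flattened f32 slice):

```rust
use faiss::{index_factory, GpuResources, StandardGpuResources, MetricType};
use faiss::index::Index;

let res = StandardGpuResources::new()?;
let index = index_factory(8, "Flat", MetricType::L2)?;
// copy the index to GPU device 0; the CPU index remains usable
let mut gpu_index = index.to_gpu(&res, 0)?;
gpu_index.add(&my_data)?;
// bring it back to main memory, e.g. for serialization
let index = gpu_index.to_cpu()?;
```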

What’s more?

I guess I have covered more than enough ground on this library, and I hope that you have learned something new from here. Other than that, I can only say that some issues are still left in the open:

  • Using this crate requires the Faiss library to be installed separately. Instructions were properly made available for this, but it would be interesting if the crate could optionally bundle a compatible build of the library. This must be an optional step, since there are many compilation choices that the developer may wish to consider (most notably, which BLAS implementation, which CUDA version, and which CUDA compute capability versions).
  • Creating a dedicated type for each index implementation does not scale that well, although some parts can be replicated without copies through the use of macros. It might be best to write dedicated types for those which cannot be instantiated through the index factory.
  • There is currently no sophisticated abstraction for vectors. Slices of f32s were used for efficiency and simplicity, but it is understandable that one might have wished for an actual matrix abstraction, or something resembling a list of vectors. For as long as there isn’t a wide agreement on vector-like data interoperability, which means for as long as const generics are not available in the language, I found it best not to make heavy decisions here.
  • The data type for the search results is currently a composition of two vectors (labels + distances). It would be nice to implement IntoIterator for this type, yielding an iterator of pairs. The already stable impl trait feature might not be usable here, unless one can eventually use it in associated types.
  • Index persistence to disk is something not yet available. As far as my work went so far, I haven’t used index loading and saving capabilities from the library. Having this feature in faiss would be great, though. Update: This landed in version 0.5.0!
  • Some methods in Index, such as add_with_ids, are actually only supported by some specific implementations. It would have been interesting if these methods were not exposed in cases where the compiler can be sure they are unsupported (such as in the FlatIndex), but this would make the design around the trait even more complicated.

Feel free to let me know if something is unclear or in need of improvements. As for the actual crate, faiss is on crates.io and ready to be used by everyone.