Taking the long road (Part 1): bindings for a vector similarity search library
Perhaps you have once wondered how search engines such as Google and TinEye enable their users to search for images which are similar to one that you provide, or how they can identify a building from nothing but a picture. Content-based image retrieval (CBIR) is the backbone concept, and provides exciting new ways to search for useful information. While the concept is no longer novel, the requirements imposed on systems for CBIR are ever increasing due to the increasingly larger amounts of data and demand for higher quality of retrieval.
A traditional CBIR system, aside from file storage and user interfacing, is internally implemented with these major components:
- The feature extractors process the image into feature vectors, usually numerical, which are used to describe that image. With good feature extractors, feature vectors from similar images will also be similar in their feature space. Low-level features may involve color, shape, or texture recognition (thresholds, intensity histograms, contour detection, etc.), while others may extract keypoints from an image (e.g. SIFT) and produce a global descriptor based on a visual code book with a quantization procedure (e.g. bags of visual words). However, many systems nowadays use convolutional neural networks instead of handcrafted features.
- The feature database, which brings reason to this blog post, indexes these features for fast retrieval by similarity. It is not unusual to employ L2-norm (Euclidean) distance to measure the dissimilarity (as in, the inverse of similarity) between these vectors, but other similarity metrics exist (cosine distance, earth mover’s distance, etc.). A naive CBIR algorithm would require calculating the distance between the query vector and every single other vector in the database. This can be mitigated with the use of other techniques, such as inverted vector files and locality hashing.
Fulfilling the requirements of the second component is not as easy as it may seem, especially once we deal with millions of vectors, or a thousand times more. Thus comes Faiss, an open-source library for fast and efficient vector similarity searching. It encompasses a set of various algorithms in order to index, cluster, and search for vectors, each with their benefits and drawbacks. Moreover, some of these implementations can be GPU-accelerated!
I picture Rust as a good technology for conceiving a CBIR system, and there aren’t many Rust libraries for these tasks as we speak, let alone one representing the state of the art in vector searching. However, the road towards integrating Faiss with Rust may appear a bit too long: the library was implemented in C++, and only provides bindings for Python officially, as well as a SWIG definition, which doesn’t really help much in this case.
But well, is a road just a means to an end? While I appreciate highways in my weekly driving, this is a situation where the analogy falls a bit short: there is a bit more to learn when taking detours such as this one. With this post, I would like to share some remarks of my experience with taking the long road.
… Wait, what?
I wrote and implemented a plain C API for Faiss. I proposed it and sent a pull request at the official repository. Although it is not complete, it was substantial enough to expose most of the features of the library. Writing the API by hand allowed me to take full control of how the API would be used in C, as well as how it would be safely encapsulated in Rust.
I don’t reject the idea that applying
bindgen directly over the C++ API would have taken me far. The API does not make extensive use of templates and most functions have prototypes taking parameters compatible with C. On the other hand, I felt reluctant to use it directly, because if something didn’t work right, I would have to cover the gaps with some C patches among the rest of the interface. It would still require building and linking additional portions of C and C++ code to bridge everything together, and in the end we wouldn’t have something that other languages could use (or at least those without SWIG; now Julia can pick it up too!). A deeper look into the current C++ support in
bindgen also presents at least one critical reason to make a full C wrapper: there is still no support for exceptions, which Faiss does use (more on that in one of the sections below).
Let’s C what we have here…
The native public API of Faiss comprises a significant number of header files. Users of the library are expected to include the appropriate interfaces depending on what they’ll use.
“Index.h” contains the base class for all index types,
Index, which declares multiple virtual functions (
range_search, and so on). Other index types, such as
IndexIVF, would inherit the main interface from this base type.
“AutoTune.h” is also particularly important, because it contains an index factory function and a generic property setter. It allows the user of the API to delegate the decision of obtaining an algorithm implementation to run time, without having to look for the respective index class types, and without the need for specific constructors and decorators. In our context, this factory makes a façade that we can rely on, instead of building wrappers around the public APIs of every single kind of index. Still, a few custom wrappers were made, so as to obtain some transparency and minimal construction overhead. Namely, creating a flat index (where vector are contained in a contiguous matrix and are searched over with a brute-force search algorithm) can be done directly with
faiss_IndexFlat_new. This also provides access to the internal data vector via getter functions.
Hence, moving straight to the point:
- The full C interface and respective implementation reside in a new “c_api” folder. Public header files were created in the format “…_c.h”. There is mostly a 1:1 mapping from a C++ header file and its respective plain C interface.
- Indices in the C API were declared as opaque pointers to a hidden handle type, and were given the
Faissprefix, to overcome the lack of namespaces. Constructors would have a
_new_with_…for constructors with more parameters) and take a pointer to a pointer of that type as the first argument, and the first argument would henceforth be used as the member function’s
- Some plain data types such as enums and configuration property sets were either redefined into an equivalent type or made into opaque types, case dependent.
- Functions were given the
faiss_prefix. Certain parameter types were also replaced to a C compatible alternative (mostly references to pointers).
Cutting on repetition
Macros in C are not hygienic, but they enabled a lot of code reusability nonetheless. I tried not to overdo it, but here are a few examples of macros that I wrote:
The use of casts is extensive, but made with care. Casts from opaque types to the respective C++ type, and vice versa, are made with
reinterpret_cast, which cannot discard existing cv-qualifiers (for example, you cannot remove
const). One can also notice that every index type is actually the same underneath. This is done so because C does not have type inheritance, and giving each index their own handler would require many messy casts or function replication at the top level.
Error Handling? Exceptional!
Faiss mainly uses exceptions for error handling (there are also some hard assertions which will unavoidably terminate the program, but hopefully that will change eventually). Any call to a function which might throw needs to be wrapped around some kind of safety net that catches anything attempting to go up the stack.
Returning only an error code is very limited here, since the exception contains a custom error message in a heap allocated
std::string that we are not exposing. Ideally, we would like this message to be accessible from our bindings. To tackle this, I went for a function which retrieves the error message of the last occurring exception in the public API. Problematic in a multi-threaded environment, you say? Not really: for the implementation, all exceptions caught are saved in a global thread local variable. Errors are subsequently handled in the same thread, and the returned pointer can be used before another call to Faiss is performed.
With the macro above, error handling becomes sufficiently slick:
This approach may look unconventional, but to this day I have found nothing wrong with it. Feel free to change my mind.
Bridging everything together…
Finally, a new Makefile was written to join the pieces together: (1) the static C++ library “libfaiss.a”, (2) and the several object files of the C wrapper. The outcome was a dynamic object “libfaiss_c.so” that is self-contained. That is, with the right combination of GCC flags, this library does not require dynamic linking of the C++ standard library or other parts Faiss. In case you’re wondering (source):
What about GPU support?
I’m glad you asked! The API for GPU indices in Faiss works like this:
- A GPU resources descriptor (“GpuResources.h” and “StandardGpuResources.h”) is created beforehand to expose a GPU device and configure how its resources are made available.
- Faiss indices become GPU-accelerated by passing a “CPU-backed” index to a special function
index_cpu_to_gpuin “GpuAutoTune.h”. The same index can be brought back to the CPU with
index_gpu_to_cpu. The former function requires a reference to a GPU resource object (or more, if you want to use multiple GPUs).
The reasoning applied here is similar:
FaissStandardGpuResources is an opaque type, and the aforementioned functions were also properly wrapped. A
FaissGpuIndex was also declared in a similar fashion to the ones mentioned above, thus inheriting all member functions seamlessly.
Faiss with GPU support is then linked together into an extended dynamic object, “libgpufaiss_c.so”, which fully replaces “libfaiss_c.so” and already links with the CUDA runtime and cuBLAS. In this case, linking with the C++ standard library was unavoidable since cuBLAS depends on it, but this appeared not to be an issue anyway.
Not so Rusty yet?
This is quite a journey, eh? Given the length of this blog post, I have decided to split it into 2 parts. I’ll be finishing the story with how I built the low-level, and then the high-level Rust interfaces, in a future blog post. Stay tuned!