BEBLID: Why are local feature descriptors still important in 2020?

Iago Suárez
Published in Analytics Vidhya
3 min read · May 6, 2020

We have recently published a paper introducing a new local image feature descriptor called BEBLID: Boosted Efficient Binary Local Image Descriptor. The source code is available at https://github.com/iago-suarez/BEBLID. This post is an informal introduction to it, and I hope to convince you that it is worth your attention.

If there has been a recent revolution in computer vision, it is, without any doubt, Deep Learning. Convolutional Neural Networks (CNNs) are amazing tools that automatically learn very rich image features. They are powerful, trainable end-to-end and able to represent very abstract knowledge, but they have a major drawback: they are slow! If you push one of these networks onto your mobile phone and use it heavily, you will probably watch your phone heat up and your battery die. That is why efficient vision still makes sense nowadays.

There is a special case where this becomes even more important: 3D vision. Most 3D vision techniques, such as Simultaneous Localization and Mapping (SLAM), Structure from Motion (SfM) or real-time self-localization, depend on fast feature matching to relate the features of a scene model to the ones the camera is seeing right now.

Image taken from the ORB-SLAM2 demo: https://www.youtube.com/watch?v=j2ZNuBCr4SA

It is in this context that low-level features still play an important role. But what is a local feature? Well, just a region that is easy to recognize: a blob, a corner, a segment… If you want a great introduction, I recommend this tutorial from OpenCV. The point is that we usually want our method to memorize these corners in such a way that the next time it sees them it says: hey, I think we've met before! And that is what BEBLID does.
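For instance, detecting this kind of corner with OpenCV's ORB detector takes just a few lines. A minimal sketch (the image path is a placeholder):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
  // Load the image in grayscale; "scene.jpg" is a placeholder path
  cv::Mat img = cv::imread("scene.jpg", cv::IMREAD_GRAYSCALE);

  // ORB detects corner-like keypoints (FAST corners over an image pyramid)
  cv::Ptr<cv::ORB> detector = cv::ORB::create(2000);  // keep up to 2000 keypoints
  std::vector<cv::KeyPoint> keypoints;
  detector->detect(img, keypoints);

  std::cout << "Detected " << keypoints.size() << " keypoints" << std::endl;
  return 0;
}
```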

BEBLID describes a small part of an image using a bunch of zeros and ones (a binary string), in such a way that the next time we detect the same image patch, under different illumination or from a different perspective, both binary strings will be very similar. The binary string is like the patch's fingerprint.
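Comparing two of these fingerprints boils down to counting the bits in which they differ (the Hamming distance), something CPUs do very fast. A toy sketch with made-up 4-byte descriptors (real BEBLID descriptors are 256 or 512 bits long):

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
  // Two toy binary descriptors packed as bytes
  cv::Mat d1 = (cv::Mat_<uchar>(1, 4) << 0xB2, 0xFF, 0x0F, 0x00);
  cv::Mat d2 = (cv::Mat_<uchar>(1, 4) << 0xB3, 0xFF, 0x0F, 0x01);

  // NORM_HAMMING counts differing bits: the smaller, the more similar
  double dist = cv::norm(d1, d2, cv::NORM_HAMMING);
  std::cout << "Hamming distance: " << dist << std::endl;  // prints 2
  return 0;
}
```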

Compared to other similar descriptors like ORB, LATCH or BinBoost, BEBLID is faster and more accurate at describing blobs and corners, taking less than 2 milliseconds to describe a whole image. See all the details in the paper. Here I'm just going to show you an example of how to use it:
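Something along these lines (a minimal sketch: the image paths are placeholders, and the BEBLID constructor arguments, a sampling-window scale for ORB keypoints and the descriptor size in bits, are my reading of the repo's BEBLID.h, so double-check them there):

```cpp
#include <opencv2/opencv.hpp>
#include "BEBLID.h"  // descriptor from https://github.com/iago-suarez/BEBLID

int main() {
  // Load the two images to match; paths are placeholders
  cv::Mat img1 = cv::imread("img1.jpg", cv::IMREAD_GRAYSCALE);
  cv::Mat img2 = cv::imread("img2.jpg", cv::IMREAD_GRAYSCALE);

  // Detect ORB keypoints in both images
  cv::Ptr<cv::ORB> detector = cv::ORB::create(2000);
  std::vector<cv::KeyPoint> kps1, kps2;
  detector->detect(img1, kps1);
  detector->detect(img2, kps2);

  // Describe the keypoints with BEBLID. The arguments (assumed here:
  // sampling-window scale tuned for ORB keypoints, 512-bit descriptors)
  // may differ; check BEBLID.h in the repo for the exact signature.
  BEBLID beblid(1.0f, 512);
  cv::Mat desc1, desc2;
  beblid.compute(img1, kps1, desc1);
  beblid.compute(img2, kps2, desc2);

  // Brute-force matching with Hamming distance and cross-check
  cv::BFMatcher matcher(cv::NORM_HAMMING, true);
  std::vector<cv::DMatch> matches;
  matcher.match(desc1, desc2, matches);

  // Draw and show the matched features between both images
  cv::Mat out;
  cv::drawMatches(img1, kps1, img2, kps2, matches, out);
  cv::imshow("BEBLID matches", out);
  cv::waitKey();
  return 0;
}
```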

The example works with OpenCV 4.1.0. Basically, it imports the BEBLID descriptor you can find here, detects ORB keypoints, describes them using BEBLID, matches them with OpenCV's BFMatcher and shows the matched features between both images. You can find the full code in the BEBLID demo.

We are planning to submit a pull request to include the descriptor in OpenCV, so in the near future we will provide Python and Java demos. Right now, I'm sorry, but it is C++ only.

To sum up:

  • BEBLID is a very efficient binary-valued interest point descriptor.
  • It extracts 2000 descriptors per image in less than 2 ms on a smartphone CPU. This is great if, like me, you are doing Augmented Reality.
  • It is as fast as ORB, with better mAP than BinBoost and close to SIFT on the HPatches benchmark.


Iago Suárez

Computer vision engineer specialized in Augmented Reality. My PhD pushes this technology forward using machine learning and image processing.