Overlap is key to good point cloud alignment.

Aligning 3D point clouds with deep overlap attention.

Zan Gojcic
EcoVisionETH
Mar 31, 2021


Point clouds can nowadays be readily acquired with consumer devices such as the iPhone 12 Pro. They are of relatively high quality, but because the device's field of view is rather small, a single acquisition often covers only a part of the surroundings. Hence, several acquisitions are required. Because the device moves between these acquisitions, the resulting point clouds first have to be aligned before they can be merged into one consistent representation of the scene.

Overlap regions predicted by our approach: colored points are predicted to lie in the overlap of the two input point clouds.

PREDATOR is a method for pairwise point cloud registration with deep attention to the overlap region. Its transformer-inspired overlap attention module allows PREDATOR to focus on the overlapping region and hence to better align point clouds with low overlap.

How does one even align two point clouds?

In order to align two point clouds, the rotation and translation of the device between the acquisitions need to be estimated. The pose of the device can be recovered directly if at least three corresponding points in both point clouds are known. To automate this procedure, the corresponding points are typically established using so-called 3D local feature descriptors. Local feature descriptors are high-dimensional vectors that describe the local geometry around "interest points" such as corners or edges. Like their image-based 2D counterparts, they are nowadays commonly computed using deep neural networks.
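To make the first step concrete, here is a minimal NumPy sketch of the classic closed-form (Kabsch/Procrustes) solution that recovers the rotation and translation from known correspondences. This is the textbook estimator, not PREDATOR's specific pipeline; in practice it is wrapped in a robust scheme such as RANSAC, because real correspondences contain outliers.

```python
import numpy as np

def estimate_rigid_transform(src, tgt):
    """Closed-form rigid alignment of corresponding points.

    src, tgt: (N, 3) arrays of corresponding points, N >= 3.
    Returns R (3x3) and t (3,) such that R @ src[i] + t ~= tgt[i].
    """
    # Center both point sets on their centroids.
    src_c = src - src.mean(axis=0)
    tgt_c = tgt - tgt.mean(axis=0)

    # The SVD of the cross-covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(src_c.T @ tgt_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T

    t = tgt.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```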

Local feature descriptors mapped to RGB space using t-SNE. Similar colors denote similar feature descriptors (adapted from Choy et al., 2019).

So what makes the point cloud alignment hard?

That is actually a good question, and looking at the results of recent methods such as FCGF (Choy et al., 2019) or D3Feat (Bai et al., 2020), one could conclude that nothing does and that the problem is already solved. But as with most things, the devil is in the details. Current methods work very well on the benchmark datasets, which consider only pairs with more than 30% relative overlap. This means that at least 30% of each point cloud covers a region of the scene that is also covered by the other point cloud. However, existing methods break down quickly when the relative overlap of the point clouds is reduced. The reason is that in this low-overlap scenario, most interest points do not lie in the overlap region and hence cannot even have a correspondence in the other point cloud!
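For intuition, relative overlap can be measured as follows when a ground-truth alignment is available. This is a simplified sketch; the function name and the 5 cm threshold are illustrative choices, not the exact benchmark protocol.

```python
import numpy as np
from scipy.spatial import cKDTree

def relative_overlap(pc_a, pc_b, threshold=0.05):
    """Fraction of points in pc_a with a neighbor in pc_b closer
    than `threshold` (here 5 cm, a typical value for indoor scans).

    pc_a, pc_b: (N, 3) and (M, 3) arrays, already expressed in a
    common (ground-truth) coordinate frame.
    """
    dists, _ = cKDTree(pc_b).query(pc_a, k=1)
    return float((dists < threshold).mean())
```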

The performance of current methods deteriorates rapidly when point cloud pairs with low overlap are considered (left). But there is hope! If one can somehow bias the sampling of the interest points towards the overlap region, the performance recovers equally fast (right).

How can we improve the alignment in the low overlap regime?

Based on the figure above, the solution seems easy enough: the interest points just have to be sampled in the overlap region. But wait, how does one know which points lie in the overlap region? Let us briefly think about how a human operator would align two point clouds. In a first step, one would probably gain an overview of both point clouds and roughly determine their common parts. Only then would one try to identify precise corresponding points in those common regions. This thought process helped us identify a common conceptual drawback of current methods:

Current point cloud registration methods treat individual point clouds in isolation when computing their local feature descriptors.

In our recent work (Huang et al., 2021), we instead argue that the information of both point clouds should be mixed early on, just as a human operator first looks at both point clouds together. To this end, we introduce PREDATOR, a learned local feature descriptor with a novel overlap attention block. The overlap attention block takes inspiration from transformer networks and combines this idea with graph neural networks. It mixes the information of both point clouds at an early stage, with the goal of predicting the relative overlap and guiding the subsequent sampling of interest points to lie in that region.

PREDATOR is equipped with an overlap attention block (middle part of the figure), which helps to predict the overlap region of both point clouds and guides the subsequent feature extraction and interest point sampling.
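The core idea of this information mixing can be illustrated with cross-attention, where the features of one point cloud attend to the features of the other. The PyTorch sketch below is a deliberately minimal illustration of that idea, not PREDATOR's exact overlap attention block (which also involves graph-neural-network self-attention and overlap/matchability score heads).

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Minimal cross-attention block: points of cloud A attend to cloud B."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats_a, feats_b):
        # feats_a: (B, N, dim), feats_b: (B, M, dim).
        # Queries come from cloud A, keys/values from cloud B, so every
        # point feature in A is updated with information from all of B.
        mixed, _ = self.attn(query=feats_a, key=feats_b, value=feats_b)
        return self.norm(feats_a + mixed)  # residual connection

# Conditioning each cloud on the other is then simply:
#   feats_a = block(feats_a, feats_b)
#   feats_b = block(feats_b, feats_a)
```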

Does early information mixing really help?

We evaluate PREDATOR on a standard benchmark dataset, 3DMatch (Zeng et al., 2017), as well as on a new dataset, 3DLoMatch, which comprises point cloud pairs with only 10–30% relative overlap. The first question is: can PREDATOR actually increase the relative overlap of the point cloud pairs?

PREDATOR roughly doubles the relative overlap on the low-overlap dataset (left) and increases it by 35% on the benchmark dataset (right).

It can indeed! By filtering out the points with low overlap scores, PREDATOR almost doubles the average overlap on the low-overlap dataset, while also increasing it by more than 35% on the standard benchmark dataset.
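The filtering step itself is straightforward once per-point overlap scores are available. The hypothetical helper below keeps the k points most likely to lie in the overlap; PREDATOR additionally weights the sampling by a matchability score, which is omitted here for brevity.

```python
import numpy as np

def sample_overlap_points(points, overlap_scores, k=1000):
    """Keep the k points most likely to lie in the overlap region.

    points: (N, 3) coordinates; overlap_scores: (N,) per-point scores
    in [0, 1] predicted by the network.
    """
    idx = np.argsort(-overlap_scores)[:k]  # indices of the k highest scores
    return points[idx], idx
```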

This sounds very promising, but how does it affect the actual alignment of the point clouds? To evaluate this, we measure the registration recall (the ratio of successful alignments) on the same datasets as above. On average, PREDATOR outperforms existing methods by ~15 percentage points on the 3DLoMatch dataset and by ~5 percentage points on 3DMatch. Some qualitative results are shown below.

Qualitative results on the 3DLoMatch dataset.
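For reference, registration recall counts a pair as successfully aligned when the estimated transform brings ground-truth correspondences within a small root-mean-square error (0.2 m on 3DMatch). Here is a minimal sketch of this metric, with illustrative function and argument names:

```python
import numpy as np

def registration_recall(estimates, gt_correspondences, tau=0.2):
    """Fraction of pairs registered with RMSE below tau (0.2 m on 3DMatch).

    estimates: list of (R, t) per pair; gt_correspondences: list of
    (src, tgt) arrays of ground-truth corresponding points.
    """
    successes = 0
    for (R, t), (src, tgt) in zip(estimates, gt_correspondences):
        residuals = src @ R.T + t - tgt  # apply estimated transform to src
        rmse = np.sqrt((residuals ** 2).sum(axis=1).mean())
        successes += rmse < tau
    return successes / len(estimates)
```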

Relevance for environmental applications

Laser scanning with terrestrial and UAV sensors is also widely used for environmental applications. In particular, in forest ecology, point clouds are processed to extract a variety of forest structure measurements such as tree height, stem diameter, and canopy density, which are ultimately predictive of aboveground biomass. With the increased availability of point clouds from consumer devices, these will inevitably be adopted in citizen-science initiatives, leading to vast amounts of distributed point clouds with potentially low overlap. Thus, creating better 3D models from low-overlap point clouds is highly relevant for forest monitoring.

Example of a UAV-LS point cloud (cyan) and a TLS point cloud (grayscale) from a recent survey in Switzerland (Morsdorf et al., 2017).

In conclusion:

We have provided a short summary of PREDATOR, a recent method for point cloud alignment that relies on deep overlap attention. PREDATOR can also be used to register point clouds of synthetic data or LiDAR point clouds acquired for autonomous driving. But not to spoil everything that PREDATOR can do: for detailed information, ablation studies, and results, please check out our paper. If you are more adventurous and would like to play around with PREDATOR, we have also made our source code publicly available.

Paper: https://arxiv.org/abs/2011.13005

Source code: https://github.com/ShengyuH/OverlapPredator

Huang, S., Gojcic, Z., Usvyatsov, M., Wieser, A., & Schindler, K. (2021). PREDATOR: Registration of 3D Point Clouds with Low Overlap. IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

References:

  1. Choy, C., Park, J., & Koltun, V. (2019). Fully Convolutional Geometric Features. In: CVPR (pp. 8958–8966).
  2. Bai, X., Luo, Z., Zhou, L., Fu, H., Quan, L., & Tai, C. L. (2020). D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features. In: CVPR (pp. 6359–6367).
  3. Zeng, A., Song, S., Nießner, M., Fisher, M., Xiao, J., & Funkhouser, T. (2017). 3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions. In: CVPR (pp. 1802–1811).
  4. Morsdorf, F., Eck, C., Zgraggen, C., Imbach, B., Schneider, F. D., & Kükenbrink, D. (2017). UAV-based LiDAR acquisition for the derivation of high-resolution forest and ground information. The Leading Edge, 36(7), 566–570.
