Multi-Cue Direct SLAM

Cyrill Stachniss
Published in StachnissLab
4 min read · Sep 26, 2022

Simultaneous localization and mapping, or SLAM, is a common and well-studied problem in robotics. It is a key building block of robot navigation systems. After roughly three decades of research, effective solutions are available. As many systems rely on SLAM, it still receives much attention in academia and industry alike. The advent of robust machine learning systems over the last decade has allowed the community to enhance purely geometric maps with semantic information or to replace hard-coded heuristics with data-driven ones. Within the computer vision community, we have seen photometric, also called direct, approaches used to tackle the SLAM/SfM problem. Direct techniques address pair-wise registration by minimizing the pixel-wise error between image pairs. By not relying on specific features and by potentially operating at subpixel resolution on the entire image, direct approaches do not require explicit data association. These methods have been successfully used on monocular, stereo, and RGB-D images. Their use on 3D LiDAR data is less prominent, probably due to the limited vertical resolution of these sensors compared to cameras.
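To illustrate the idea of direct registration without explicit data association, here is a minimal sketch: two images are aligned by searching for the translation that minimizes the pixel-wise photometric error. This is a toy example, not the MD-SLAM implementation; real direct methods estimate a full 6-DoF transform via iterative minimization, and all function names here are made up for illustration.

```python
import numpy as np

def photometric_error(ref, cur, shift):
    """Sum of squared pixel-wise intensity differences, assuming `cur`
    is `ref` translated `shift` pixels to the right (overlap region only)."""
    w = ref.shape[1]
    if shift >= 0:
        a, b = ref[:, :w - shift], cur[:, shift:]
    else:
        a, b = ref[:, -shift:], cur[:, :w + shift]
    return float(np.sum((a - b) ** 2))

def align_by_direct_search(ref, cur, max_shift=5):
    """Brute-force direct alignment: pick the shift that minimizes the
    photometric error -- no feature extraction or matching involved."""
    shifts = range(-max_shift, max_shift + 1)
    return min(shifts, key=lambda s: photometric_error(ref, cur, s))

# Toy example: `cur` is `ref` shifted right by 2 pixels.
rng = np.random.default_rng(0)
ref = rng.random((8, 32))
cur = np.roll(ref, 2, axis=1)
best = align_by_direct_search(ref, cur)
```

In a real direct SLAM system, the brute-force search is replaced by gradient-based minimization over the warp parameters, which is what enables subpixel accuracy.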

A 3D map in the form of a point cloud generated by MD-SLAM

Bartolomeo Della Corte and Igor Bogoslavskyi presented a multi-cue photometric registration methodology called Multi-Cue Photometric Registration (MPR) for RGB-D cameras. It extends photometric approaches to different projective models and improves robustness by considering additional cues, such as normals and depth or range, in the cost function. Recently released 3D LiDAR sensors offer up to 128 beams, making direct approaches relevant for 3D LiDAR scanners as well. In addition, most LiDARs provide intensity or reflectivity information besides range data. This intensity can serve as a reflectivity cue about the objects in the environment.
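The multi-cue idea can be sketched as a per-pixel error that combines intensity, depth/range, and normal residuals with weights. This is only an illustrative sketch under simplifying assumptions (the real cost is evaluated on warped, reprojected pixels, and the weight names here are invented):

```python
import numpy as np

def multi_cue_residual(ref, cur, w_intensity=1.0, w_depth=1.0, w_normal=1.0):
    """Per-pixel multi-cue error: weighted sum of intensity, depth/range,
    and surface-normal residuals. `ref` and `cur` are dicts with keys
    'intensity' (H, W), 'depth' (H, W), and 'normals' (H, W, 3)."""
    e_i = (ref['intensity'] - cur['intensity']) ** 2
    e_d = (ref['depth'] - cur['depth']) ** 2
    e_n = np.sum((ref['normals'] - cur['normals']) ** 2, axis=-1)
    return w_intensity * e_i + w_depth * e_d + w_normal * e_n

# Toy example: identical cues yield zero residual everywhere.
h, w = 4, 4
ref = {'intensity': np.zeros((h, w)), 'depth': np.ones((h, w)),
       'normals': np.zeros((h, w, 3))}
cur = {'intensity': np.zeros((h, w)), 'depth': np.ones((h, w)),
       'normals': np.zeros((h, w, 3))}
res = multi_cue_residual(ref, cur)
```

Adding geometric cues such as normals makes the registration less sensitive to texture-poor regions, where pure intensity errors carry little information.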

A recent work by Luca Di Giammarino and colleagues, to be presented at IROS 2022 in Kyoto, tackles the challenge of developing a flexible multi-cue direct SLAM system for modern 3D LiDARs. The resulting system, MD-SLAM, is an open-source SLAM system that handles RGB-D and LiDAR data in a unified manner. It is essentially a new version of the MPR system that computes the incremental motion for RGB-D cameras as well as 3D LiDAR sensors. MD-SLAM also finds loop closures with an appearance-based algorithm that uses a binary search tree populated with binary feature descriptors. The available code serves as a compact reference implementation. It is designed for flexibility and hence not optimized for a specific sensor; nevertheless, it shows solid performance in different setups. The paper reports results for RGB-D and LiDAR data on common benchmark datasets. The accuracy is competitive with other sensor-specific SLAM systems, while MD-SLAM outperforms them when some of their assumptions about the structure of the environment are violated.
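Appearance-based loop-closure candidate retrieval with binary descriptors boils down to finding past keyframes whose descriptors lie within a small Hamming distance of the current one. The sketch below uses a linear scan for clarity; the actual system uses a binary search tree over descriptor bits for speed, and the function names and threshold here are made up:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between two equally sized binary descriptors."""
    return int(np.count_nonzero(a != b))

def query_loop_candidates(db, query, max_dist=2):
    """Return indices of past keyframe descriptors within `max_dist`
    bit flips of `query`. A linear scan stands in for the tree search."""
    return [i for i, d in enumerate(db) if hamming(d, query) <= max_dist]

# Toy example: two stored 8-bit descriptors, one close to the query.
db = [np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=bool),
      np.array([1, 1, 0, 0, 1, 1, 0, 0], dtype=bool)]
query = np.array([0, 1, 1, 0, 1, 0, 0, 0], dtype=bool)
candidates = query_loop_candidates(db, query)
```

Candidates retrieved this way are cheap guesses based on appearance alone, which is why the system follows up with geometric validation and direct refinement before accepting a loop closure.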

MD-SLAM running on 3D LiDAR data

Technically, MD-SLAM uses the common elements of a modern SLAM system. The approach relies on a pose graph to represent the map. The nodes of the pose graph store keyframes in the form of multi-cue image pyramids. The pipeline takes intensity and depth data as input, either from an RGB-D camera or a 3D LiDAR. The pyramids are generated from the input images each time a new frame becomes available. By processing the range information, the system computes the surface normals and organizes them into a three-channel image, which is stacked onto the original input to form a five-channel image. Next, the five-channel pyramids are fed to a tracker, which estimates the relative transform between the last keyframe and the current pyramid through direct error minimization. The tracker is also in charge of spawning new keyframes and adding them to the graph when necessary. Whenever a new keyframe is generated, the loop closure module searches for potential relocalization candidates among the past keyframes by performing a search in appearance space. Candidate matches are further pruned by geometric validation and direct refinement. Successful loop closures add new constraints to the pose graph and trigger a complete graph optimization. All of this is realized as a compact and easy-to-use direct SLAM system available as open-source software. Even with a single thread, the system operates online for small image sizes.
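The five-channel pyramid construction described above can be sketched as follows. This is a minimal illustration, not the MD-SLAM code: the downsampling scheme and all names are assumptions made for the example.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 pixel blocks, per channel."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                   + img[0::2, 1::2] + img[1::2, 1::2])

def build_pyramid(intensity, depth, normals, levels=3):
    """Stack intensity (1 ch), depth (1 ch), and normals (3 ch) into a
    five-channel image and build a coarse-to-fine pyramid from it."""
    img = np.dstack([intensity, depth, normals])  # shape (H, W, 5)
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

# Toy example: a 16x16 frame yields a three-level five-channel pyramid.
intensity = np.zeros((16, 16))
depth = np.ones((16, 16))
normals = np.zeros((16, 16, 3))
pyr = build_pyramid(intensity, depth, normals, levels=3)
```

Coarse pyramid levels let the tracker recover large motions cheaply before refining the relative transform at full resolution, which is the usual coarse-to-fine strategy of direct methods.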

MD-SLAM running on RGB-D data indoors

For more information, see:

L. Di Giammarino, L. Brizi, T. Guadagnino, C. Stachniss, and G. Grisetti, “MD-SLAM: Multi-Cue Direct SLAM,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2022.

Paper: https://www.ipb.uni-bonn.de/wp-content/papercite-data/pdf/digiammarino2022iros.pdf
Code: https://github.com/digiamm/md_slam
