OpenCV Spatial AI Competition Progress Journal — Part II

Mounting OAK-D on TB3 + Streaming depth maps over ROS + Sensor message synchronization + Map creation using only odometry poses

Seung Won Lee
The Inverse Project

--

Introduction

This is Part II of our progress journal for the OpenCV Spatial AI Competition sponsored by Intel. If you missed our previous post, please check out Part I. In this post, we first show how we mounted the OAK-D sensor to our robot platform, then we explain how we publish OAK-D’s depth maps as ROS messages, and finally, we create a small indoor map using odometry information as true poses and use it as our benchmark.

Additionally, we are happy to announce that our project’s repository is now publicly accessible on GitHub. Please feel free to explore our implementations and leave us feedback or open issues if you are using it on your own robotics platform!

Mounting OAK-D on TurtleBot 3 (TB3)

We placed the OAK-D module on top of the TB3 with the aid of a GoPro camera mount. The mount also prevents the module from shifting around, which is important for obtaining consistent camera extrinsic parameters. Additionally, to simplify the extrinsics calibration process, we mounted the OAK-D such that its location aligns with the X-Y coordinates of the base_footprint frame of the TB3.

Figure 1: OAK-D mounted on our robotics platform.

Streaming depth maps over ROS

Figure 2: Flowchart of depth map streaming.

DepthAI wrapper class for streaming depth maps from OAK-D

Given that our ROS version (ROS Melodic) does not natively support Python 3, we decided to use the DepthAI API (depthai-core) written in C++. Data queried from the OAK-D using this API arrive as binary streams that need to be re-interpreted into the types you expect. In our case, we interpret the binary streams as a single-channel cv::Mat. To achieve this, we wrote a wrapper class called DepthAI which inherits from the Device class in depthai-core.

The DepthAI wrapper class takes 3 arguments.

  • Device name
  • Configuration file path
  • USB mode

With the appropriate device name and USB mode, it detects and creates an OAK-D device instance. Once the OAK-D device instance is ready, the configuration file is used to set up a pipeline and determine which types of streams must be enabled. In our wrapper class, we use an std::unordered_map to store each enabled video stream, which we fill by grabbing the binary streams sent by the OAK-D device. We noticed frame loss when multiple streams were enabled, because the streams arrive at asynchronous rates; we plan to address this issue in the future.
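As a rough illustration of the core idea, the snippet below re-interprets a raw packet as a single-channel cv::Mat and stores it in a stream map keyed by stream name. This is a minimal sketch under our own naming assumptions; the real wrapper additionally inherits the Device class from depthai-core and drives the configured pipeline.

    // Illustrative sketch: re-interpreting a raw OAK-D packet as a single-channel
    // cv::Mat and storing it in a stream map. Function and variable names are
    // assumptions; the real wrapper inherits depthai-core's Device class.
    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>
    #include <opencv2/core.hpp>

    // Interpret a raw depth packet (row-major uint16 pixels) as CV_16UC1.
    cv::Mat packetToDepthMat(const std::vector<std::uint8_t>& packet,
                             int width, int height) {
      cv::Mat view(height, width, CV_16UC1,
                   const_cast<std::uint8_t*>(packet.data()));
      return view.clone();  // copy so the Mat owns its memory after the packet is freed
    }

    int main() {
      // One cv::Mat per enabled stream, keyed by stream name (e.g. "depth").
      std::unordered_map<std::string, cv::Mat> streams;

      std::vector<std::uint8_t> fake_packet(640 * 400 * 2, 0);  // placeholder bytes
      streams["depth"] = packetToDepthMat(fake_packet, 640, 400);
      return 0;
    }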

ROS depth map publisher

Now that our device can stream depth maps as cv::Mat objects, the next step is to publish the depth maps over a ROS topic for downstream nodes to consume. We call this publisher node DepthMapPublisherNode. As you can see in Figure 2, this node creates a DepthAI instance and receives streams as cv::Mat using DepthAI::get_streams(). Once we obtain a depth map, we encapsulate it as a sensor_msgs::Image using cv_bridge, with header information and an appropriate encoding. According to the official OAK-D documentation, a depth map array consists of uint16 values. Therefore, we encode the image with sensor_msgs::image_encodings::TYPE_16UC1.
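Below is a minimal sketch of this publishing step using the standard cv_bridge conversion. The topic name, frame id, and placeholder image dimensions are assumptions for illustration; in the actual node, each cv::Mat comes from DepthAI::get_streams().

    // Publishing a CV_16UC1 depth map as a sensor_msgs::Image via cv_bridge.
    // The topic name, frame id, and image size below are illustrative assumptions.
    #include <ros/ros.h>
    #include <cv_bridge/cv_bridge.h>
    #include <sensor_msgs/Image.h>
    #include <sensor_msgs/image_encodings.h>
    #include <std_msgs/Header.h>
    #include <opencv2/core.hpp>

    void publishDepthMap(ros::Publisher& depth_pub, const cv::Mat& depth_16u) {
      std_msgs::Header header;
      header.stamp = ros::Time::now();    // stamp used later for synchronization
      header.frame_id = "oak_d_frame";    // camera frame (assumed name)

      // cv_bridge wraps the cv::Mat without changing the underlying uint16 data.
      depth_pub.publish(cv_bridge::CvImage(
          header, sensor_msgs::image_encodings::TYPE_16UC1, depth_16u).toImageMsg());
    }

    int main(int argc, char** argv) {
      ros::init(argc, argv, "depth_map_publisher_node");
      ros::NodeHandle nh;
      ros::Publisher depth_pub =
          nh.advertise<sensor_msgs::Image>("oak_d/depth/image_raw", 1);

      // In the real node, each cv::Mat comes from DepthAI::get_streams(); here we
      // publish an empty placeholder image at 30 Hz just to show the message flow.
      cv::Mat depth(400, 640, CV_16UC1, cv::Scalar(0));
      ros::Rate rate(30);
      while (ros::ok()) {
        publishDepthMap(depth_pub, depth);
        rate.sleep();
      }
      return 0;
    }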

Note: In our project, we use the depthai-core library as a sub-module by adding it to our CMake package with add_subdirectory(). However, this approach does not work straight out of the box because both depthai-core and catkin (ROS’s build system) are Hunter based. We had to explicitly copy the HunterGate part from depthai-core into our repository. You can check out how this was done in our project, here.

Throttled odometry and time-synchronized point clouds

Finding optimal poses for all timestamps under consideration is the central task in any SLAM problem. As explained in our previous post, we use odometry and depth maps to obtain pose constraints. However, odometry messages are published at a much higher frequency than depth maps, and to find an optimal pose at a given timestamp, we need pose estimates from multiple sources for that timestamp. This requires two things: first, the clocks of the OAK-D and the Jetson Nano must be synchronized, and second, we need a way to temporally match odometry messages with the point clouds created from depth maps. We solve this by first throttling the incoming odometry messages spatially, that is, discarding odometry messages whose linear and angular movement falls below small thresholds. We then synchronize the incoming depth maps to the throttled odometry messages, giving us matched scenes at each timestamp.

Throttled odometry

The built-in wheel encoders on the TB3 constantly publish odometry information. We can filter out repetitive messages by imposing certain spatial thresholds. To do this, we implemented a ROS node that subscribes to raw odometry and publishes throttled odometry only when certain conditions are met. The conditional parameters are the following.

  • Positional change from a previous pose to a current pose (distance traveled)
  • Rotational change from a previous pose to a current pose (yaw angle)

We compare the current pose with the previously published pose to calculate the differences in position and rotation. When either of them exceeds its pre-defined threshold, we publish the throttled odometry message. In our project, we set the thresholds to the following values (a minimal sketch of the node appears after the list).

  • Distance Threshold = 0.2 m
  • Yaw Angle Threshold = 0.262 rad (15 degrees)
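The sketch below shows the general shape of such a throttling node. The topic names and the exact yaw extraction are illustrative assumptions; only the two thresholds above come from our configuration.

    // Sketch of the throttled odometry node: republish raw odometry only when the
    // robot has moved or turned past the thresholds. Topic names are assumptions.
    #include <cmath>
    #include <ros/ros.h>
    #include <nav_msgs/Odometry.h>
    #include <geometry_msgs/Quaternion.h>

    // Yaw (rotation about Z) extracted directly from the quaternion.
    double yawFromQuaternion(const geometry_msgs::Quaternion& q) {
      return std::atan2(2.0 * (q.w * q.z + q.x * q.y),
                        1.0 - 2.0 * (q.y * q.y + q.z * q.z));
    }

    class ThrottledOdomNode {
     public:
      explicit ThrottledOdomNode(ros::NodeHandle& nh)
          : pub_(nh.advertise<nav_msgs::Odometry>("odom_throttled", 10)),
            sub_(nh.subscribe("odom", 50, &ThrottledOdomNode::callback, this)) {}

     private:
      void callback(const nav_msgs::Odometry::ConstPtr& msg) {
        const auto& p = msg->pose.pose.position;
        const double yaw = yawFromQuaternion(msg->pose.pose.orientation);

        const double dist = std::hypot(p.x - last_x_, p.y - last_y_);
        const double dyaw = std::fabs(std::atan2(std::sin(yaw - last_yaw_),
                                                 std::cos(yaw - last_yaw_)));

        // Publish only when the robot has traveled or rotated enough.
        if (first_ || dist > kDistThresh || dyaw > kYawThresh) {
          pub_.publish(*msg);
          last_x_ = p.x;
          last_y_ = p.y;
          last_yaw_ = yaw;
          first_ = false;
        }
      }

      static constexpr double kDistThresh = 0.2;   // meters
      static constexpr double kYawThresh = 0.262;  // radians (~15 degrees)

      ros::Publisher pub_;
      ros::Subscriber sub_;
      double last_x_ = 0.0, last_y_ = 0.0, last_yaw_ = 0.0;
      bool first_ = true;
    };

    int main(int argc, char** argv) {
      ros::init(argc, argv, "throttled_odometry_node");
      ros::NodeHandle nh;
      ThrottledOdomNode node(nh);
      ros::spin();
      return 0;
    }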

Time synchronization

Once we start publishing throttled odometry, we need to obtain the corresponding depth map at each of those poses. We can achieve this using ROS message filters. ROS provides policy-based synchronizers, so multiple topics can be synchronized based on the timestamps in their ROS message headers. We use the ApproximateTime synchronizer because its policy uses an adaptive algorithm. Figure 3 below shows examples of how the ApproximateTime filter synchronizes messages temporally. After enabling the synchronizer, we can inverse project the time-synchronized depth maps into point clouds.

Figure 3: Explanation of ApproximateTime synchronizer for multiple topics. Each line is a topic, and a dot is a message (red dot: a pivot message). Image excerpted from the ROS wiki.
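For reference, a minimal node wiring an ApproximateTime policy between the throttled odometry and the depth images could look like the sketch below. The topic names and queue sizes are assumptions for illustration.

    // Sketch of ApproximateTime synchronization between throttled odometry and
    // depth images (topic names and queue sizes are illustrative assumptions).
    #include <ros/ros.h>
    #include <boost/bind.hpp>
    #include <message_filters/subscriber.h>
    #include <message_filters/synchronizer.h>
    #include <message_filters/sync_policies/approximate_time.h>
    #include <nav_msgs/Odometry.h>
    #include <sensor_msgs/Image.h>

    using SyncPolicy =
        message_filters::sync_policies::ApproximateTime<nav_msgs::Odometry,
                                                        sensor_msgs::Image>;

    void callback(const nav_msgs::Odometry::ConstPtr& odom,
                  const sensor_msgs::Image::ConstPtr& depth) {
      // odom and depth are the closest temporal match found by the policy;
      // inverse projection of the depth map into a point cloud happens here.
      ROS_INFO("Matched odom @ %.3f with depth @ %.3f",
               odom->header.stamp.toSec(), depth->header.stamp.toSec());
    }

    int main(int argc, char** argv) {
      ros::init(argc, argv, "odom_depth_sync_node");
      ros::NodeHandle nh;

      message_filters::Subscriber<nav_msgs::Odometry> odom_sub(nh, "odom_throttled", 10);
      message_filters::Subscriber<sensor_msgs::Image> depth_sub(nh, "oak_d/depth/image_raw", 10);

      message_filters::Synchronizer<SyncPolicy> sync(SyncPolicy(10), odom_sub, depth_sub);
      sync.registerCallback(boost::bind(&callback, _1, _2));

      ros::spin();
      return 0;
    }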

Creating a map using odometry constraints

Figure 4. A room map with point clouds from OAK-D depth maps and odometry. Red lines represent throttled odometry information. Point colors represent the height (we observe that orange-colored points represent the floor).

In this section, we provide an example of a map that can be achieved without performing any pose optimization! The quality of this map will serve as the benchmark for future maps that do use pose optimization. Since odometry is one estimate of the TB3’s pose, we can create a map considering it as the true location of the TB3. Given that odometry is prone to drift and error accumulation, we do not expect this map to look very sharp.

Now that we have a point cloud and an odometry pose available at the same timestamp, we can start building the map. To do this, we created a ROS node called CreateMapNode which continuously concatenates the incoming point clouds and publishes the resulting point cloud map. Since the inverse-projected point clouds are expressed in the OAK-D’s camera reference frame, we first need to transform them into the odom reference frame before concatenating them into a map. This is done using the tf2_ros package.
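A minimal sketch of such a node is shown below. It assumes the inverse-projected clouds are already published as sensor_msgs::PointCloud2 messages; the input and output topic names and the frame names are illustrative assumptions.

    // Sketch of a map-building node: transform each incoming cloud into the odom
    // frame, append it to the running map, and republish (topic and frame names
    // are illustrative assumptions).
    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>
    #include <geometry_msgs/TransformStamped.h>
    #include <tf2_ros/transform_listener.h>
    #include <tf2_sensor_msgs/tf2_sensor_msgs.h>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl_conversions/pcl_conversions.h>

    class CreateMapNode {
     public:
      explicit CreateMapNode(ros::NodeHandle& nh)
          : tf_listener_(tf_buffer_),
            map_pub_(nh.advertise<sensor_msgs::PointCloud2>("map_cloud", 1)),
            cloud_sub_(nh.subscribe("oak_d/points", 5, &CreateMapNode::callback, this)) {}

     private:
      void callback(const sensor_msgs::PointCloud2::ConstPtr& cloud_msg) {
        // Look up the camera-to-odom transform at the cloud's timestamp.
        geometry_msgs::TransformStamped tf;
        try {
          tf = tf_buffer_.lookupTransform("odom", cloud_msg->header.frame_id,
                                          cloud_msg->header.stamp, ros::Duration(0.1));
        } catch (const tf2::TransformException& ex) {
          ROS_WARN("Skipping cloud: %s", ex.what());
          return;
        }

        sensor_msgs::PointCloud2 cloud_odom;
        tf2::doTransform(*cloud_msg, cloud_odom, tf);

        // Append the transformed cloud to the accumulated map and republish it.
        pcl::PointCloud<pcl::PointXYZ> cloud_pcl;
        pcl::fromROSMsg(cloud_odom, cloud_pcl);
        map_ += cloud_pcl;

        sensor_msgs::PointCloud2 map_msg;
        pcl::toROSMsg(map_, map_msg);
        map_msg.header.frame_id = "odom";
        map_msg.header.stamp = cloud_msg->header.stamp;
        map_pub_.publish(map_msg);
      }

      tf2_ros::Buffer tf_buffer_;
      tf2_ros::TransformListener tf_listener_;
      ros::Publisher map_pub_;
      ros::Subscriber cloud_sub_;
      pcl::PointCloud<pcl::PointXYZ> map_;
    };

    int main(int argc, char** argv) {
      ros::init(argc, argv, "create_map_node");
      ros::NodeHandle nh;
      CreateMapNode node(nh);
      ros::spin();
      return 0;
    }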

We took our TB3 on a short trip around the house and recorded raw odometry and depth maps in a rosbag. Then, we visualized the point cloud map in RViz along with the throttled odometry. Figure 4 shows the completed map after the trip. Please note that this map was constructed using only odometry poses, without any pose optimization. We will discuss pose optimization in a future blog post.

We also apply different point cloud filtering algorithms to our raw point clouds. Some of the filters provided by the PCL library are the following:

  1. Statistical Outlier Removal
  2. Approximate Voxel Grid
  3. Voxel Grid
  4. Frustum Culling

For the map creation recorded in the following video, we used the statistical outlier removal and voxel grid filters on the inverse-projected point clouds from the depth maps. We are still running experiments to decide which filters and which hyperparameters give the best result. These filters will be very important since we will run the ICP (Iterative Closest Point) algorithm on consecutive point clouds to obtain pose constraints.
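As a rough sketch, the two filters can be chained as shown below; the setMeanK, setStddevMulThresh, and leaf-size values here are placeholder hyperparameters, not our final choices.

    // Sketch of statistical outlier removal followed by voxel-grid downsampling.
    // Parameter values are illustrative placeholders, not tuned hyperparameters.
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/filters/statistical_outlier_removal.h>
    #include <pcl/filters/voxel_grid.h>

    pcl::PointCloud<pcl::PointXYZ>::Ptr filterCloud(
        const pcl::PointCloud<pcl::PointXYZ>::Ptr& input) {
      // Remove points whose mean distance to their neighbors is an outlier.
      pcl::PointCloud<pcl::PointXYZ>::Ptr denoised(new pcl::PointCloud<pcl::PointXYZ>());
      pcl::StatisticalOutlierRemoval<pcl::PointXYZ> sor;
      sor.setInputCloud(input);
      sor.setMeanK(50);             // neighbors used to estimate mean distance
      sor.setStddevMulThresh(1.0);  // distance threshold in standard deviations
      sor.filter(*denoised);

      // Downsample onto a 5 cm voxel grid to keep the map compact.
      pcl::PointCloud<pcl::PointXYZ>::Ptr downsampled(new pcl::PointCloud<pcl::PointXYZ>());
      pcl::VoxelGrid<pcl::PointXYZ> voxel;
      voxel.setInputCloud(denoised);
      voxel.setLeafSize(0.05f, 0.05f, 0.05f);
      voxel.filter(*downsampled);

      return downsampled;
    }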

Video 1: Live map construction and visualization in RViz.

This is the end of our progress journal #2. Please stay tuned for our updates! Also, feel free to leave comments on this blog post and reach out to us on LinkedIn. In case you missed the introductory part of the journal, please check out our repository!

Reach out to us

  • Deepak Talwar (LinkedIn) is currently a final year Master’s student in Computer Engineering at San José State University focusing on Robotics and AI. He was previously an intern at FarmWise Labs and is currently interning at DeepMap Inc.
  • Seung Won Lee (LinkedIn) recently received his master’s degree in Computer Engineering from San José State University where he completed various projects related to computer vision, especially in autonomous driving. His master’s thesis, “Enhancing Point Cloud Density using Stereo Images”, adopts methods to combine regular point clouds with pseudo-LiDAR point clouds for improving 3D object detection.
  • Sachin Guruswamy (LinkedIn) recently received his master’s degree in Computer Engineering from San José State University with a focus on Robotics and Computer Vision. He currently works at Luxonis on their Depth.ai product.

Disclaimer: Our team member, Sachin Guruswamy, was hired by Luxonis after we submitted our proposal and got selected amongst Phase-I winners. Due to this reason, while we will still be submitting our work for this competition, we will not be in the running for the prize money for this competition.
