Generating Correspondence Ground Truth Using Ground and Aerial Data

Jiangye Yuan
4 min read · Feb 27, 2023


Correspondence ground truth data are crucial for tackling the correspondence problem: they can be used to train learning-based systems and to evaluate performance. However, such data are difficult to obtain, especially for outdoor environments. Autonomous vehicle (AV) datasets, which contain images, camera poses, and well-synced LiDAR points, can be a good option, but due to hardware limitations the LiDAR points only cover objects within a certain range of the data collection vehicle. Meanwhile, large-scale aerial LiDAR data, collected by overflying aircraft, are increasingly available. In this post, we show how to combine aerial LiDAR data with AV datasets to generate dense and more complete correspondence ground truth.

Here we use the nuScenes dataset, which contains 1000 driving scenes in Boston and Singapore. Other similar datasets can also be used. The aerial LiDAR point clouds of Boston are publicly available (downloadable from here).

The nuScenes dataset defines multiple coordinate frames, and each sensor's data live in its own frame. Recall that the camera pose can be derived from the camera extrinsic matrix, which is the transform from the global frame to the camera frame. We read out the transform from the global frame to the ego vehicle frame at the image timestamp, denoted as T1, and the transform from the ego vehicle frame at the image timestamp to the camera frame, denoted as T2. The camera extrinsic matrix is then given by T2*T1.
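As a concrete illustration, here is a minimal sketch using the nuScenes devkit; the dataroot path and sample_data_token are placeholders.

```python
import numpy as np
from pyquaternion import Quaternion
from nuscenes.nuscenes import NuScenes
from nuscenes.utils.geometry_utils import transform_matrix

nusc = NuScenes(version='v1.0-mini', dataroot='/data/nuscenes')  # placeholder path
sd = nusc.get('sample_data', sample_data_token)  # token of a camera image record

# T1: global frame -> ego vehicle frame at the image timestamp
ego = nusc.get('ego_pose', sd['ego_pose_token'])
T1 = transform_matrix(ego['translation'], Quaternion(ego['rotation']), inverse=True)

# T2: ego vehicle frame -> camera frame
cal = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])
T2 = transform_matrix(cal['translation'], Quaternion(cal['rotation']), inverse=True)

# Camera extrinsic matrix: global frame -> camera frame
extrinsic = T2 @ T1
```

The camera intrinsic matrix K used later in this post is available in the same calibrated_sensor record, as cal['camera_intrinsic'].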

Data Registration

In order to use the nuScenes dataset together with aerial LiDAR data, the two datasets need to be co-registered to reduce misalignment, which is especially necessary for data captured by different sensors. The nuScenes dataset provides a global map that shows road areas, as seen in the figure below.

nuScenes global map (Boston)

We need to register this map to the aerial LiDAR data. A common practice is to select ground control points (matching points) and estimate an affine transform between the two datasets. The first step is to convert the aerial LiDAR points into a raster map, as shown below.

Raster map converted from aerial LiDAR points (grey level indicates height)
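A minimal sketch of this rasterization step, assuming the points are an N×3 NumPy array in a projected coordinate system and the cell size is chosen to match the desired map resolution:

```python
import numpy as np

def rasterize_height(points, cell_size=0.5):
    """Convert LiDAR points (N x 3: x, y, z) into a height raster.

    Each cell stores the maximum z of the points falling into it;
    empty cells are left as NaN. cell_size is in point units (e.g., meters).
    """
    xy_min = points[:, :2].min(axis=0)
    cols, rows = (np.ceil((points[:, :2].max(axis=0) - xy_min) / cell_size)
                  .astype(int) + 1)
    raster = np.full((rows, cols), -np.inf)
    ix = ((points[:, 0] - xy_min[0]) / cell_size).astype(int)
    iy = ((points[:, 1] - xy_min[1]) / cell_size).astype(int)
    np.maximum.at(raster, (iy, ix), points[:, 2])  # handles multiple points per cell
    raster[np.isinf(raster)] = np.nan              # mark empty cells
    return raster
```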

Due to the lack of detail on the nuScenes global map, it is difficult to select ground control points. To address this issue, we add ground LiDAR points captured at different vehicle locations to the nuScenes map, which gives much more structural detail to help with matching point selection.

Ground control point selection. Left: nuScenes map with ground LiDAR.
Right: aerial LiDAR
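Once control points are selected on both maps, the affine transform can be estimated by least squares. A minimal sketch, where src and dst are the hypothetical selected control points:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src points to dst points.

    src, dst: (N, 2) arrays of matched control points, N >= 3.
    Returns a 2x3 matrix A such that dst ≈ A @ [x, y, 1].
    """
    G = np.hstack([src, np.ones((len(src), 1))])   # (N, 3) design matrix
    A, *_ = np.linalg.lstsq(G, dst, rcond=None)    # (3, 2) solution
    return A.T

def apply_affine(A, pts):
    """Apply a 2x3 affine transform to (N, 2) points."""
    return pts @ A[:, :2].T + A[:, 2]
```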

Depth Map Generation

After registration, aerial LiDAR points can be projected onto images. As we can see in the top image below, the aerial LiDAR points are well aligned with objects in the image.

Image overlaid with LiDAR points.
(Top: aerial LiDAR. Bottom: ground LiDAR)
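The projection itself is standard pinhole geometry. A sketch, assuming the aerial points have already been transformed into the nuScenes global frame via the estimated registration, and K is the camera intrinsic matrix:

```python
import numpy as np

def project_points(points_world, extrinsic, K):
    """Project world-frame points into an image.

    points_world: (N, 3); extrinsic: 4x4 global -> camera transform
    (T2*T1 from above); K: 3x3 intrinsic matrix.
    Returns (M, 2) pixel coordinates and (M,) depths for the points
    in front of the camera.
    """
    pts_h = np.hstack([points_world, np.ones((len(points_world), 1))])
    cam = (extrinsic @ pts_h.T)[:3]     # points in the camera frame
    keep = cam[2] > 0.1                 # drop points behind the camera
    cam = cam[:, keep]
    uv = (K @ cam) / cam[2]             # perspective division
    return uv[:2].T, cam[2]
```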

The two kinds of LiDAR points have complementary properties: 1) ground LiDAR points measure depth well, but they are only available within a certain distance of the vehicle; 2) aerial LiDAR points are distributed over the whole space with roughly uniform density. Therefore, we can generate a complete depth map by combining both.

To obtain a depth image from aerial LiDAR points, a typical approach is mesh generation followed by rendering. However, meshes generated using standard methods (e.g., TIN) do not have good quality when viewed from ground level, due to missing points and noise. Here we take a simple approach that produces reasonable results. We select aerial LiDAR points beyond a certain distance from the camera location and create a vertical line segment from each point down to the ground. After projecting each segment into the image, we slightly widen it. To obtain correct visibility, we apply the Painter's algorithm, i.e., projecting segments in order from farthest to closest, so that nearer segments overwrite farther ones. The resulting depth map is shown below. After interpolating the ground LiDAR points into a dense depth map, the complete depth map is obtained by merging the two depth maps.
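A rough sketch of this rendering step, reusing the project_points helper sketched earlier; the flat ground elevation ground_z and the fixed segment width are simplifying assumptions.

```python
import cv2
import numpy as np

def render_aerial_depth(points, ground_z, extrinsic, K, img_h, img_w, width=3):
    """Render a depth map from aerial LiDAR via the Painter's algorithm.

    Each point is extended into a vertical segment down to the ground,
    projected, widened, and drawn from farthest to closest, so nearer
    segments overwrite farther ones.
    """
    cam_center = np.linalg.inv(extrinsic)[:3, 3]   # camera position in world frame
    order = np.argsort(-np.linalg.norm(points - cam_center, axis=1))
    depth = np.zeros((img_h, img_w), np.float32)
    for top in points[order]:                              # farthest first
        seg = np.array([top, [top[0], top[1], ground_z]])  # point down to ground
        uv, d = project_points(seg, extrinsic, K)
        if len(d) < 2:                                     # segment behind camera
            continue
        p0, p1 = (tuple(map(int, p)) for p in uv)
        cv2.line(depth, p0, p1, float(d.mean()), thickness=width)
    return depth
```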

Depth map merging. Top: depth from ground LiDAR.
Middle: depth from aerial LiDAR. Bottom: merged depth
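The final merge is then a simple sketch, assuming ground_depth is the dense depth interpolated from the ground LiDAR (zero where invalid) and aerial_depth is the rendering from above:

```python
import numpy as np

# Prefer ground LiDAR depth where it is valid; fall back to aerial depth.
merged_depth = np.where(ground_depth > 0, ground_depth, aerial_depth)
```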

With camera poses and depth maps available, we can easily obtain correspondences between any pair of images by computing the world coordinates of each pixel in image A and projecting them into image B. Such correspondence ground truth is dense and complete. A few examples are shown below; matching pairs are downsampled for better visualization.
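A sketch of this last step, assuming both cameras share the intrinsic matrix K (use each camera's own intrinsics if they differ); occlusion checks against image B's depth map are omitted for brevity.

```python
import numpy as np

def correspondences(depth_a, K, ext_a, ext_b):
    """For each pixel of image A with valid depth, find its location in image B.

    depth_a: (H, W) depth map for image A; K: 3x3 intrinsics;
    ext_a, ext_b: 4x4 global -> camera extrinsics of the two images.
    Returns pixel coordinates in A and the matching coordinates in B.
    """
    h, w = depth_a.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_a > 0
    # Back-project valid pixels of A into the camera A frame
    rays = np.linalg.inv(K) @ np.stack([u[valid], v[valid], np.ones(valid.sum())])
    cam_a = rays * depth_a[valid]
    # Camera A frame -> global frame -> camera B frame
    world = np.linalg.inv(ext_a) @ np.vstack([cam_a, np.ones(cam_a.shape[1])])
    cam_b = (ext_b @ world)[:3]
    uv_b = (K @ cam_b) / cam_b[2]   # matches may fall outside image B's bounds
    return np.stack([u[valid], v[valid]], axis=1), uv_b[:2].T
```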

Correspondence ground truth examples
