Improving Automated Extrinsics Calibration by Unifying Image Specifications

Xtreme1
Multimodal Data Training
5 min read · Jan 21, 2023

1. Intro

It is quite easy for humans to perceive the world with the help of sensory organs.

However, it proves far more difficult for machines to sense the outside world.

To give machines their own sensory powers, a variety of sensors came into being: smoke alarms, infrared sensors for automatic doors, and so on.

In the field of autonomous driving, multiple sensors together make up a vehicle’s data collection system.

Now, let’s talk about the cameras and LiDAR in these collection systems and see how they work together to give a vehicle its vision.

The sensors equipped on a self-driving car

2. Basic Knowledge

Perhaps there is a question mark over your head: why do humans need only one organ, the eyes, to see the world while cars need cameras and LiDAR sensors at the same time?

Imagine a leaf floating in front of your eyes and blocking your view. You will, I presume, simply brush the leaf away and keep moving.

Now imagine the leaf drifting onto the camera of a moving self-driving car and blocking its view. The vehicle takes a dark picture and wonders, “Did I crash?”

“One cloud is enough to eclipse the sun” — Thomas Fuller.

In this scenario, a LiDAR sensor can be instrumental. A car equipped with LiDAR can identify a leaf as a small, thin object and will not misinterpret it as something else, avoiding unnecessary panic.

Now you know that LiDAR has a strong edge in identifying the 3D information of objects.

Let’s go further. It is not enough to simply deploy cameras and LiDAR sensors.

The real challenge for a vehicle is realizing that “the small thin object” in the LiDAR data is the same thing as “the big black object” in the camera image. This calls for integrating the data from the two kinds of sensors so that their information can be matched.

That is the process of mapping points in the LiDAR coordinate system to the image coordinate system (or vice versa).

To be more specific, the LiDAR-to-image mapping is a two-step process: 1) the points in the LiDAR coordinate system are rotated and translated into the camera coordinate system (this first transformation gives their position in the camera’s “eye”); 2) the points are then scaled and translated onto the image coordinate system.

The parameters associated with the first transformation are called Extrinsics, and the parameters in the second transformation are Intrinsics.

The extrinsics depend on the relative pose between the LiDAR and the camera, while the intrinsics depend on the camera itself.
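To make the two transformations concrete, here is a minimal NumPy sketch of the projection described above. The rotation R, translation t, and intrinsic matrix K below are placeholder values for illustration, not real calibration results.

```python
import numpy as np

# Extrinsics: rotation + translation from the LiDAR frame to the camera frame
R = np.eye(3)                      # placeholder rotation
t = np.array([0.1, -0.05, 0.2])    # placeholder translation (metres)

# Intrinsics: focal lengths and principal point of the camera
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])

def lidar_to_pixel(points_lidar):
    """Project Nx3 LiDAR points into pixel coordinates."""
    # Step 1: extrinsics -- rotate and translate into the camera frame
    points_cam = points_lidar @ R.T + t
    # Step 2: intrinsics -- scale and shift onto the image plane
    uvw = points_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]   # perspective division -> (u, v)

print(lidar_to_pixel(np.array([[2.0, 0.5, 10.0]])))
```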

The principle is not complicated, but: where did these parameters come from?

This is exactly the problem at hand: Extrinsics Calibration.

3. Difficulties in Calibration

In the field of autonomous driving, calibration carried out jointly by humans and machines is called Offline Extrinsics Calibration.

A staff member conducting calibration

Offline Extrinsics Calibration has its limitations. It needs human effort and certain markers. We want a more efficient method that can work anytime and anywhere.

Online Extrinsics Calibration might be the answer. With this method, we provide training data from which the machine learns the projection patterns between the 3D point cloud and 2D images. Given such training, the machine can infer the extrinsics on its own, no matter how the relative position of the LiDAR and camera changes.

It is not magic.

Without professional training, we humans can infer the shooting position and angle from a picture.

Looking at the image above, you can tell that the camera is rotated about 10 degrees clockwise and is very close to the pigeon.

Below is the network of LCCNet (LiDAR-Camera Self-Calibration Network). The network takes as input an RGB image and a mis-calibrated depth image (produced by projecting the point cloud with a preliminary extrinsic T_init and the intrinsics), and uses a neural network to predict the deviation T_pred between T_init and the true extrinsics T_LC. The final extrinsics are obtained once T_init is corrected by T_pred.

LCCNet: LiDAR and Camera Self-Calibration using Cost Volume Network
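To make the pipeline concrete, here is a rough NumPy sketch of that correction step. The projection helper and the `predict_deviation` placeholder are illustrative stand-ins, not the authors’ code, and the composition order of the correction depends on how the deviation is parameterised in the actual implementation.

```python
import numpy as np

def project_to_depth(points, T, K, h=360, w=640):
    """Project Nx3 LiDAR points through extrinsics T (4x4) and intrinsics K (3x3) into a sparse depth map."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (pts_h @ T.T)[:, :3]
    cam = cam[cam[:, 2] > 0]                      # keep only points in front of the camera
    uvw = cam @ K.T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    depth = np.zeros((h, w))
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[ok], u[ok]] = cam[ok, 2]              # store depth (z) at each projected pixel
    return depth

def predict_deviation(rgb_image, depth_image):
    """Placeholder for the trained network; a real LCCNet regresses a small SE(3) correction."""
    return np.eye(4)

def calibrate(rgb_image, point_cloud, K, T_init):
    # 1) Project the point cloud with the rough extrinsics T_init to get the
    #    mis-calibrated depth image.
    depth = project_to_depth(point_cloud, T_init, K)
    # 2) The network predicts the deviation T_pred between T_init and the true T_LC.
    T_pred = predict_deviation(rgb_image, depth)
    # 3) Correct T_init with T_pred to obtain the final extrinsics.
    return T_pred @ T_init
```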

Online calibration seems perfect, but there is a problem: the algorithm is tied to the specifications of the particular camera in use. When the camera specifications change, the algorithm must be adjusted, which limits the applicability and practicality of the existing technology.

To human eyes, there is almost no difference between two photographs of the same scene taken at different resolutions. To the machine, however, they are two completely different things, and it cannot apply the rules it has just learned to the new situation.

4. Solution

We are often told to learn by analogy, which means we can transform a new problem into an old one with solutions.

The rules the machine has learned will keep working as long as there is a way to transform images with unknown specifications into ones with known specifications.

This is the solution for dealing with different cameras: define a standard image, then introduce a conversion module that processes non-standard images into standard images with uniform specifications.

A comparison of the extrinsics calibration processes

Of course, image conversion is not simply stretching the non-standard image to the standard resolution. Differences between images taken by two cameras are essentially due to different intrinsics. Hence, the key to image conversion lies in eliminating the influence of the intrinsics.
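The article does not specify how the conversion module is implemented, but one common way to realise this idea is to re-render each image under a shared “standard” intrinsic matrix, for example with OpenCV. The matrices, distortion coefficients, and image size below are illustrative assumptions, not values from the article.

```python
import cv2
import numpy as np

# Intrinsics and distortion of the actual (non-standard) camera
K_src = np.array([[1400.0,    0.0, 960.0],
                  [   0.0, 1400.0, 540.0],
                  [   0.0,    0.0,   1.0]])
dist_src = np.array([-0.10, 0.01, 0.0, 0.0, 0.0])   # k1, k2, p1, p2, k3

# Intrinsics of the agreed-upon "standard" virtual camera
K_std = np.array([[1000.0,    0.0, 640.0],
                  [   0.0, 1000.0, 360.0],
                  [   0.0,    0.0,   1.0]])
std_size = (1280, 720)   # (width, height) of the standard image

def to_standard(image):
    """Re-render the input image as if it had been taken by the standard camera."""
    # Build a pixel-to-pixel map that removes the source distortion and
    # re-projects onto the standard intrinsics, then resample the image.
    map1, map2 = cv2.initUndistortRectifyMap(
        K_src, dist_src, None, K_std, std_size, cv2.CV_32FC1)
    return cv2.remap(image, map1, map2, interpolation=cv2.INTER_LINEAR)
```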

With the help of the conversion module, no matter how the camera specifications change, the image input to the extrinsics calibration algorithm always follows a unified specification, which effectively removes the differences and broadens the applicability of the algorithm.

Reference

  1. LCCNet: LiDAR and Camera Self-Calibration using Cost Volume Network
  2. Understanding ADAS Calibration, https://www.fenderbender.com/articles/14939-understanding-adas-calibration

Xtreme1 - World's 1st Open-Source Platform for Multisensory Training Data. Find us on GitHub: https://github.com/xtreme1-io/xtreme1