How Can You Perform Pixel-to-Point Conversion?

Image from the KITTI dataset

When creating a 3D model of an environment from both a camera and a LiDAR sensor, it is common to convert camera pixels into LiDAR points. The camera provides a 2D image of the environment, while the LiDAR sensor offers a 3D point cloud of the scene. To create a 3D model that is consistent across both data sources, the 2D pixel coordinates of the camera must be converted into 3D point coordinates in the LiDAR frame of reference. This can be done using techniques such as triangulation or perspective projection.
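
For intuition, here is a minimal sketch of perspective projection with a pinhole camera model; the focal lengths and principal point below are illustrative placeholder values, not calibration data from any real sensor.

import numpy as np

fx, fy = 721.5, 721.5          # focal lengths in pixels (placeholder)
cx, cy = 609.5, 172.8          # principal point (placeholder)

X = np.array([2.0, 0.5, 10.0])  # a 3D point (x, y, z) in the camera frame

# Perspective projection: divide by depth, then scale and shift
u = fx * X[0] / X[2] + cx
v = fy * X[1] / X[2] + cy
print(u, v)                     # pixel coordinates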

How can you perform the conversion?

Sensor Calibration

Sensor calibration is the process of determining the intrinsic and extrinsic parameters of a sensor, such as a camera or LiDAR. Intrinsic parameters are properties of the sensor itself, such as the focal length of a camera or the beam angles of a LiDAR, while extrinsic parameters describe the position and orientation of the sensor relative to a reference frame.
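
For a camera, the intrinsic parameters are conventionally packed into a 3x3 matrix K. A minimal sketch with placeholder values:

import numpy as np

# Intrinsic matrix K: focal lengths (fx, fy) on the diagonal and the
# principal point (cx, cy) in the last column. Placeholder values.
K = np.array([
    [721.5,   0.0, 609.5],
    [  0.0, 721.5, 172.8],
    [  0.0,   0.0,   1.0],
])

With K in hand, the projection from the earlier sketch can be written compactly as p = K @ X followed by division by the depth.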

Calibration typically involves collecting data from both sensors while they are mounted in a known configuration and using this data to estimate the intrinsic and extrinsic parameters of each sensor. This can be done with a variety of methods, such as least-squares optimization or bundle adjustment.
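
As a concrete illustration, camera intrinsics are often estimated from checkerboard images with OpenCV. This is only a sketch: the board geometry and image directory below are placeholders.

import glob
import cv2
import numpy as np

pattern = (9, 6)   # inner-corner grid of the checkerboard (placeholder)
square = 0.025     # square size in meters (placeholder)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):  # placeholder image directory
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Least-squares estimation of the intrinsic matrix K and lens distortion
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)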

The camera-LiDAR extrinsic calibration matrix describes the position and orientation of the camera relative to the LiDAR sensor. It is typically represented as a 4x4 homogeneous transformation matrix: the top-left 3x3 block is a rotation describing the orientation of the camera in the LiDAR frame, the top-right 3x1 column is the camera's position, and the bottom row is [0, 0, 0, 1].
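
In code, such a homogeneous transformation can be assembled from a 3x3 rotation R and a translation vector t. A minimal sketch with placeholder values:

import numpy as np

R = np.eye(3)                      # placeholder rotation (axes aligned)
t = np.array([0.27, 0.0, -0.08])   # placeholder camera position in meters

T_lidar_from_cam = np.eye(4)
T_lidar_from_cam[:3, :3] = R       # camera axes expressed in the LiDAR frame
T_lidar_from_cam[:3, 3] = t        # camera origin in LiDAR coordinates
# The bottom row remains [0, 0, 0, 1]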

The matrix can transform points from the camera frame of reference into the LiDAR frame or vice versa. This allows for the creation of a 3D model that is consistent across both data sources and enables the fusion of information from the camera and LiDAR sensors for various applications.
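
A point is moved between frames by multiplying its homogeneous coordinates by the matrix, and moved back with the inverse. Continuing the placeholder matrix from the previous sketch:

# Transform a camera-frame point into the LiDAR frame...
p_cam = np.array([2.0, 0.5, 10.0, 1.0])   # homogeneous (x, y, z, 1)
p_lidar = T_lidar_from_cam @ p_cam

# ...and back. For a rigid transform, the inverse is also rigid:
# inv(T) has rotation R.T and translation -R.T @ t.
p_back = np.linalg.inv(T_lidar_from_cam) @ p_lidar   # recovers p_cam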

The specific form of the matrix will depend on the specific coordinate systems and conventions used for the camera and LiDAR frames of reference. It is essential to ensure that these conventions are consistent across both sensors to obtain accurate results.

Image from hygenie-studynote.tistory.com
Image from 3D LIDAR-Camera Extrinsic Calibration Using an Arbitrary Trihedron

Pixels to Points Conversion

Once the sensors have been calibrated, the intrinsic and extrinsic parameters can be used to transform points between the camera and LiDAR frames of reference, yielding a 3D model that is consistent across both data sources and enabling sensor fusion for various applications.

Image from MathWorks

Using this calibration information, here is a Python example that converts a pixel coordinate to a 3D point.

import numpy as np

def pixel_to_point(pixel_coord, depth, P_rect, R_rect, Tr_velo_to_cam):
    """
    Convert a pixel coordinate to a 3D point in the LiDAR coordinate system.

    Args:
        pixel_coord: A tuple (u, v) representing the pixel coordinates.
        depth: The depth value (in meters) at the given pixel coordinate.
        P_rect: The 3x4 projection matrix for the rectified image.
        R_rect: The 3x3 rectification matrix for the camera.
        Tr_velo_to_cam: The 4x4 transformation matrix from the LiDAR to the
            camera coordinate system.

    Returns:
        A 3D point in the LiDAR coordinate system, as a numpy array of shape (3,).
    """
    # Back-project the pixel into the rectified camera frame by inverting the
    # pinhole projection. P_rect[0, 3] and P_rect[1, 3] hold the (possibly
    # zero) baseline terms of the rectified projection matrix.
    u, v = pixel_coord
    x = ((u - P_rect[0, 2]) * depth - P_rect[0, 3]) / P_rect[0, 0]
    y = ((v - P_rect[1, 2]) * depth - P_rect[1, 3]) / P_rect[1, 1]
    point_rect = np.array([x, y, depth])

    # Undo the rectification. R_rect is a rotation, so its inverse is its
    # transpose.
    point_cam = R_rect.T @ point_rect

    # Transform from the camera to the LiDAR coordinate system.
    # Tr_velo_to_cam maps LiDAR -> camera, so apply its inverse.
    point_cam_hom = np.hstack((point_cam, 1.0))
    point_lidar_hom = np.linalg.inv(Tr_velo_to_cam) @ point_cam_hom

    return point_lidar_hom[:3]
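
As a usage sketch, here is how the function might be called with KITTI-style calibration matrices; the values below are placeholders, not real KITTI calibration data.

# Placeholder KITTI-style calibration (illustrative values only)
P_rect = np.array([[721.5,   0.0, 609.5, 0.0],
                   [  0.0, 721.5, 172.8, 0.0],
                   [  0.0,   0.0,   1.0, 0.0]])
R_rect = np.eye(3)
Tr_velo_to_cam = np.eye(4)
Tr_velo_to_cam[:3, 3] = [0.27, -0.08, -0.06]  # LiDAR -> camera translation

point = pixel_to_point((620, 180), depth=10.0, P_rect=P_rect,
                       R_rect=R_rect, Tr_velo_to_cam=Tr_velo_to_cam)
print(point)  # the corresponding 3D point in the LiDAR frame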

You can convert pixels to points easily if you have three matrices: the camera's intrinsic (projection) matrix, the rectification matrix, and the camera-LiDAR extrinsic matrix. All three can be obtained through calibration.
