Converting 2D image coordinates to 3D coordinates using ROS + Intel Realsense D435/Kinect

2 min readMay 3, 2020

Japanese version: https://ych.hatenablog.com/entry/2020/05/01/012518

Objective

I want to recognize an object with RGB image of RGB-D camera and get the 3D coordinates of the object.

Problem

At first, I was thinking that this isn’t a hard task to solve by using math. However, to increase accuracy, I need to take into account distortions, FOV, etc.., and it will be a painful task.

Solution

Using rs2_deproject_pixel_to_point function provided by liberalsense. (https://github.com/IntelRealSense/librealsense/blob/master/include/librealsense2/rsutil.h#L67)

This function converting 2D coordinate to 3D coordinate take into account information provided by sensor_msgs/CameraInfo.

In this article, I am using Python. (I usepyrealsense2 installed bypip )

rs2_deproject_pixel_to_point takes 3 arguments. (intrinsic, pixel, depth)

intrinsic

All values required by pyrealsense2.intrinsics can be obtained from sensor_msgs/CameraInfo .

_intrinsics = pyrealsense2.intrinsics()
_intrinsics.width = cameraInfo.width
_intrinsics.height = cameraInfo.height
_intrinsics.ppx = cameraInfo.K[2]
_intrinsics.ppy = cameraInfo.K[5]
_intrinsics.fx = cameraInfo.K[0]
_intrinsics.fy = cameraInfo.K[4]
#_intrinsics.model = cameraInfo.distortion_model
_intrinsics.model  = pyrealsense2.distortion.none     
_intrinsics.coeffs = [i for i in cameraInfo.D]

This article and this article will help you understand this code.

pixel

2D coordinate in the 2D RGB image. The coordinate system is defined here. The left up is (0,0), right is x , down is y .

depth

Depth value obtained from the depth image.

Return value

The 3D coordinates (float[3]). The coordinate system is also defined here. If you want to use these values in rviz, you need to change the coordinate.

result = rs2_deproject_pixel_to_point()
x = result[2]
y = -result[0]
z = -resilt[1]

The complete code is

def convert_depth_to_phys_coord_using_realsense(x, y, depth, cameraInfo):  _intrinsics = pyrealsense2.intrinsics()
  _intrinsics.width = cameraInfo.width
  _intrinsics.height = cameraInfo.height
  _intrinsics.ppx = cameraInfo.K[2]
  _intrinsics.ppy = cameraInfo.K[5]
  _intrinsics.fx = cameraInfo.K[0]
  _intrinsics.fy = cameraInfo.K[4]
  #_intrinsics.model = cameraInfo.distortion_model
  _intrinsics.model  = pyrealsense2.distortion.none  _intrinsics.coeffs = [i for i in cameraInfo.D]  result = pyrealsense2.rs2_deproject_pixel_to_point(_intrinsics, [x, y], depth)  #result[0]: right, result[1]: down, result[2]: forward
  return result[2], -result[0], -result[1]