RGBD to 3D Point Cloud

Ramazanilkera
5 min readAug 3, 2023


RGBD

In the context of image processing and computer vision, RGBD refers to a data format that combines both color and depth information for an image.

The acronym RGB stands for the three primary color channels: red, green, and blue. Each channel is represented by an integer value ranging from 0 to 255, which determines the intensity of the corresponding color; 0 indicates the absence of that color and 255 its maximum intensity. The first value represents the red channel, the second green, and the third blue. As a channel's value increases, the overall color shifts toward that channel's hue. To illustrate this, consider the following examples.

(R, G, B)       -> color

(0, 0, 0)       -> black
(255, 0, 0)     -> red
(0, 255, 0)     -> green
(0, 0, 255)     -> blue
(255, 255, 255) -> white
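As a minimal sketch (assuming NumPy), the colors above can be stored as pixel values in an image array:

```python
import numpy as np

# A 1x5 RGB image holding the five example colors (dtype uint8, values 0-255)
colors = np.array([
    [0, 0, 0],        # black
    [255, 0, 0],      # red
    [0, 255, 0],      # green
    [0, 0, 255],      # blue
    [255, 255, 255],  # white
], dtype=np.uint8)

image = colors.reshape(1, 5, 3)  # height=1, width=5, 3 channels
print(image.shape)  # (1, 5, 3)
```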

The last letter of the acronym RGBD stands for the depth channel, whose value encodes the distance from the camera to the object at each pixel. Depth can be obtained with various technologies, such as time-of-flight (ToF) sensors, structured light sensors, or stereo camera setups. ToF sensors measure the time taken for emitted light to travel from the sensor to the object and back, while structured light and stereo setups recover depth by triangulation.

3D Point Cloud

A 3D point cloud is a fundamental data structure comprising a set of data points in a three-dimensional Cartesian coordinate system, where each individual point within the cloud corresponds to a unique position in 3D space, precisely defined by its x, y, and z coordinates. Point clouds serve as crucial entities in diverse software industries, encompassing computer vision, robotics, geographic information systems (GIS), and 3D modeling.
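As a quick illustration of the data structure (assuming NumPy), a point cloud is simply an array of N points, each holding x, y, and z coordinates:

```python
import numpy as np

# A minimal point cloud: N points, each with x, y, z coordinates (shape (N, 3)).
# Here, the eight corners of a unit cube.
points = np.array([
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1],
], dtype=float)
print(points.shape)  # (8, 3)
```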

Creating 3D Point Cloud from 2D images

To create a 3D point cloud from 2D images, the knowledge of focal length and principal points is essential. For this purpose, a camera matrix is given as input.

The following block shows the camera matrix of the standard pinhole camera.

                |  fx   0   cx |
Camera Matrix = |   0   fy  cy |
                |   0   0   1  |


fx: the focal length along the x-axis.
fy: the focal length along the y-axis.
cx: the x-coordinate of the principal point, which is the center of the image plane along the x-axis.
cy: the y-coordinate of the principal point, which is the center of the image plane along the y-axis.

Note that:
- The bottom-right element is a scale factor and is usually set to 1.
- The remaining zero elements indicate that the camera's perspective projection involves no skew or slanting effect between the horizontal (x) and vertical (y) axes.

When provided with an RGBD two-dimensional image and the camera matrix of a standard pinhole camera, the construction of a three-dimensional point cloud can be achieved using fundamental mathematical principles.

Figure 1

According to Figure 1, ‘f’ (the focal length) is given by the camera matrix, ‘y’ (the y-coordinate of the pixel) can be obtained by enumerating the pixels, and ‘z’ (the depth) is given by the depth values. ‘Y’ has to be calculated in order to create the 3D point cloud. By triangle similarity, Y = y*z/f. This explanation covers only the ‘y’ coordinate; the ‘X’ coordinate can be obtained with the same formula, using the pixel’s x-coordinate and the focal length along the x-axis.
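The similar-triangles relation can be checked numerically. The values below are hypothetical, chosen only to make the arithmetic concrete:

```python
# Numeric check of the similar-triangles relation Y = y * z / f
# (hypothetical values: focal length f in pixels, pixel offset y, depth z)
f = 525.0      # focal length
y = 100.0      # pixel offset from the principal point along the image y-axis
z = 2.0        # depth

Y = y * z / f  # world-space Y coordinate

# Projecting back should recover the pixel offset: y = Y * f / z
assert abs(Y * f / z - y) < 1e-9
```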

The following function implements the above step.

import numpy as np

def to_3D(fx, fy, depth, cx, cy, u, v):
    # Back-project each pixel (u, v) with its depth into 3D camera coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    z = depth
    # Stack the coordinates into a single (..., 3) array
    x = np.expand_dims(x, axis=-1)
    y = np.expand_dims(y, axis=-1)
    z = np.expand_dims(z, axis=-1)
    return np.concatenate((x, y, z), axis=-1)

In the function, the ‘u’ and ‘v’ parameters hold the column and row indices of the pixels, respectively. To illustrate these parameters, consider the following block, which was produced for an image with 640 pixel columns and 480 pixel rows.

the shape of u = (480, 640)

the shape of v = (480, 640)

u =

[[ 0 1 2 ... 637 638 639]
[ 0 1 2 ... 637 638 639]
[ 0 1 2 ... 637 638 639]
...
[ 0 1 2 ... 637 638 639]
[ 0 1 2 ... 637 638 639]
[ 0 1 2 ... 637 638 639]]

v =

[[ 0 0 0 ... 0 0 0]
[ 1 1 1 ... 1 1 1]
[ 2 2 2 ... 2 2 2]
...
[477 477 477 ... 477 477 477]
[478 478 478 ... 478 478 478]
[479 479 479 ... 479 479 479]]
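The grids shown above can be built with NumPy's meshgrid, which is how the final function below constructs them:

```python
import numpy as np

# Building the u (column index) and v (row index) grids for a 480x640 image
W, H = 640, 480
u, v = np.meshgrid(np.arange(W), np.arange(H))

print(u.shape, v.shape)  # (480, 640) (480, 640)
print(u[0, :3], u[0, -3:])  # [0 1 2] [637 638 639]
print(v[:3, 0], v[-3:, 0])  # [0 1 2] [477 478 479]
```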

Finally, the 3D point cloud can be created by using the following function.

def make_point_cloud(datapath, fx, fy, cx, cy):
    rgbd = ...  # load the (H, W, 4) RGBD array from datapath
    H = len(rgbd)
    W = len(rgbd[0])
    # Build the pixel index grids
    u = np.arange(W)
    v = np.arange(H)
    u, v = np.meshgrid(u, v)
    # Channel 3 holds the depth values; channels 0-2 hold the RGB values
    xyz = to_3D(fx, fy, rgbd[:,:,3], cx, cy, u, v)
    rgb = rgbd[:,:,:-1]
    # Each point carries its 3D position and its color
    point_cloud = np.concatenate((xyz, rgb), axis=-1)
    return point_cloud
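As an end-to-end sanity check, here is a hypothetical run with a small synthetic RGBD array and made-up intrinsics (the to_3D function from above is repeated so the snippet is self-contained):

```python
import numpy as np

def to_3D(fx, fy, depth, cx, cy, u, v):
    # Back-project each pixel (u, v) with its depth into 3D camera coordinates
    x = np.expand_dims((u - cx) * depth / fx, axis=-1)
    y = np.expand_dims((v - cy) * depth / fy, axis=-1)
    z = np.expand_dims(depth, axis=-1)
    return np.concatenate((x, y, z), axis=-1)

# Synthetic 4x5 RGBD array (channels: R, G, B, depth) and example intrinsics
H, W = 4, 5
rgbd = np.zeros((H, W, 4))
rgbd[..., 3] = 2.0          # constant depth of 2 units everywhere
fx = fy = 10.0
cx, cy = W / 2, H / 2       # 2.5, 2.0

u, v = np.meshgrid(np.arange(W), np.arange(H))
xyz = to_3D(fx, fy, rgbd[:, :, 3], cx, cy, u, v)
point_cloud = np.concatenate((xyz, rgbd[:, :, :3]), axis=-1)

print(point_cloud.shape)  # (4, 5, 6): x, y, z, r, g, b per pixel
# Corner pixel (0, 0): x = (0-2.5)*2/10 = -0.5, y = (0-2.0)*2/10 = -0.4, z = 2.0
```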

The input RGB image:

Figure 2

The input grayscale image (containing the depth values):

Figure 3

Two screenshots of the three-dimensional object created from the point cloud:

Figure 4
Figure 5
