How Are 3D Images Expressed? — About Various Aspects of 3D

Published in Arbeon · 11 min read · Jan 10, 2023

Hello! This is Layton.

I’m in charge of developing 3D image processing technology on the Arbeon AR Team.😉

In this post, I’ll go through what three-dimensional images are and how they’re expressed. I’ll then cover the features, pros and cons, and uses of each format. So, let’s get started right away.

How are 3D Images Expressed?
About the many forms of 3D

Recently, a variety of services built on 3D images have appeared, such as the metaverse, AR, VR, digital twins, and many others. Look closely and you’ll find that each area uses a different kind of 3D image. So the question is: How exactly can we define 3D images?

A 3D image is a way of expressing three-dimensional information, that is, information that has width, length, and height. A typical digital photo shows only one side of the object we see with our eyes and contains 2D information (width and length). A 3D image, on the other hand, must cover all three axes.

And there are myriad ways to express 3D images. Among them, the four most common and widely used data structures are the depth map, point cloud, polygon mesh, and voxel. Now, let’s look at the definition of each 3D image and how it’s created, as well as its advantages and disadvantages and major applications.

1. Depth Map

Definition of Depth Map

A depth map is a data structure that expresses a 3D scene by saving a depth value for each pixel of a 2D image. It has the same form as an ordinary 2D picture, but instead of storing the object’s color at each pixel, it stores the distance between the camera and the object seen at that pixel.

Colored photos and their corresponding depth maps
(Source: Eigen and Fergus, “Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture”, ICCV 2015)

The images above are examples of depth maps estimated from color images. A depth map holds distances rather than colors, so it is hard to read with the naked eye in its raw form. For that reason, it is usually visualized with a color scale: in the images above, the nearest distances are shown in blue, the middle ones in green, and the farthest ones in red (other color schemes exist, too).
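To make the coloring concrete, here is a minimal sketch of a blue-green-red mapping. The function name `depth_to_color`, the linear blend, and the tiny sample depth map are my own assumptions for illustration; real visualization tools use a range of different color maps.

```python
def depth_to_color(depth, near, far):
    """Map a depth value to an (R, G, B) tuple: blue (near) -> green -> red (far)."""
    # Normalize the depth to [0, 1] and clamp anything outside the range.
    t = (depth - near) / (far - near)
    t = max(0.0, min(1.0, t))
    if t < 0.5:
        # First half of the range: fade from blue to green.
        s = t / 0.5
        return (0, round(255 * s), round(255 * (1 - s)))
    # Second half of the range: fade from green to red.
    s = (t - 0.5) / 0.5
    return (round(255 * s), round(255 * (1 - s)), 0)


# A hypothetical 2x3 depth map: one distance (in meters) per pixel.
depth_map = [
    [1.0, 1.0, 3.0],
    [1.0, 2.0, 5.0],
]

color_image = [
    [depth_to_color(d, near=1.0, far=5.0) for d in row]
    for row in depth_map
]
```

The depth map itself is just a 2D grid of numbers, which is why ordinary image tools can work on it.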

How to Create a Depth Map

To create a depth map, we must accurately know the distance between the camera and the subject, that is, the depth information. There are broadly two ways to get it.

The first is direct measurement. A camera equipped with a dedicated depth sensor emits light, and the physical distance is calculated from the time the light takes to return after being reflected. Direct measurement is usually used at industrial sites and in self-driving. More recently, iPhone Pro models starting with the iPhone 12 Pro also come with a light detection and ranging (LiDAR) scanner.
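The time-based calculation is simple enough to write down. This is only a sketch of the principle: the function name is hypothetical, and actual sensors deal with noise and calibration and may measure the round trip in other ways.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_time_s):
    """Distance to the object from the light's round-trip travel time."""
    # The light travels to the object and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# A round trip of 20 nanoseconds corresponds to roughly 3 meters.
distance = tof_distance_m(20e-9)
```

Notice how short the times are: a few meters of distance means only tens of nanoseconds, which is why dedicated sensors are needed.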

The second is indirect measurement. Here, a computer algorithm imitates the human ability to estimate distance from photos. It’s one of the major research areas in image and video processing. There are various methods: some recover distance from several color photos taken from different angles, while others use AI to infer depth from a single image. This approach is highly accessible because it only requires color images. However, it is less accurate than direct measurement, and in many cases, especially with a single image, only relative distance can be found.
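As one simple instance of the multi-photo idea, consider two cameras placed side by side (a stereo pair). This is an illustration I’m adding, not the only approach: it assumes the two images are already aligned (rectified), and the function and parameter names are hypothetical. A nearby object shifts more between the two photos than a faraway one, and that shift (the disparity) gives the depth.

```python
def stereo_depth_m(focal_length_px, baseline_m, disparity_px):
    """Depth from the horizontal pixel shift between two side-by-side cameras.

    focal_length_px: focal length expressed in pixels
    baseline_m:      distance between the two camera centers, in meters
    disparity_px:    how far the same point shifts between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


near_object = stereo_depth_m(700.0, 0.1, 70.0)  # large shift -> close
far_object = stereo_depth_m(700.0, 0.1, 35.0)   # small shift -> far
```

Finding which pixel in one photo matches which pixel in the other is the hard part, and it is where most of the inaccuracy comes from.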

The Pros and Cons of Depth Map

Pros: It’s intuitive, and post-processing and editing are easy. Since it expresses 3D space information as a 2D image, the same image processing techniques used on ordinary images can be applied. Much research has gone into image-processing algorithms that refine depth accuracy by referring to the corresponding color image when one is available.

Cons: Expressing a detailed 3D scene is difficult with just a single depth map. Since it contains only the distances visible from one viewpoint, the depth of hidden areas, like the back side of an object, cannot be acquired. Because of this limitation, some people call the depth map “2.5D.”

Uses of the Depth Map

Since the depth map offers distance information, it can be used in AR to place a virtual object at the desired location. And since it expresses how near or far things are, it can be used in self-driving to identify nearby objects. Moreover, depth maps acquired from various viewpoints can be used to build a digital twin, which transfers a real 3D object into virtual space.

2. Point Cloud

Definition of Point Cloud

The point cloud is one of the methods of directly expressing a 3D object in the computer’s virtual 3D space. Generally, it’s a collection of virtual points, and each point can carry color information.

Point cloud image of a donut and point cloud render of a heritage site (Source: Wikipedia)

The two images above are examples of point clouds. In the first one, you can see the points gathering like a cloud into the shape of a donut.

How to Create Point Clouds

A point cloud is typically created from a depth map, in three stages.

Stage 1: First, the depth map is acquired using a special camera or a depth estimation technique. Here, each pixel value indicates the distance from the camera.

Stage 2: Use the pinhole camera model to calculate where each pixel lies in actual 3D space, given its depth. The camera’s specifications (focal length, principal point, distortion, and skew) must be known for this calculation. Manufacturers provide the specifications for actual cameras, and for virtual cameras, algorithms are used to estimate them.

Stage 3: Calculate the 3D coordinate of every pixel and mark each one as a point. The 3D points corresponding to all the pixels together form the point cloud. If color information exists for a pixel, attach it to the corresponding point.
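The three stages can be sketched in a few lines. Please treat this as a simplified illustration under my own assumptions: the names `fx`, `fy` (focal length in pixels) and `cx`, `cy` (center pixel) are conventional but not from a specific library, lens distortion and skew are ignored, and each depth value is taken as the distance along the camera’s viewing axis.

```python
def depth_map_to_point_cloud(depth_map, fx, fy, cx, cy, color_image=None):
    """Back-project every valid depth pixel into a 3D point (pinhole model).

    Returns a list of ((x, y, z), color) tuples in the camera's coordinates.
    """
    points = []
    for v, row in enumerate(depth_map):        # v: pixel row
        for u, z in enumerate(row):            # u: pixel column
            if z <= 0:
                continue                       # no valid measurement here
            # Pinhole model: the offset from the center pixel scales with depth.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            color = color_image[v][u] if color_image is not None else None
            points.append(((x, y, z), color))
    return points


# Hypothetical 2x2 depth map; 0.0 marks a pixel with no measurement.
depth_map = [
    [2.0, 2.0],
    [0.0, 4.0],
]
cloud = depth_map_to_point_cloud(depth_map, fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

Each valid pixel becomes exactly one point, so a depth map of a given resolution produces at most that many points.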

If we expand this principle a little more, we can merge the point clouds from depth maps acquired at various locations into one large point cloud. This is possible as long as we estimate the position and orientation of the camera for each shot and use them to transform that shot’s points into a shared coordinate system. Once this is done, the point clouds from depth maps shot at various locations combine into a more complete and detailed point cloud.
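Here is a minimal sketch of that merging step. It assumes each shot’s camera pose is already known as a 3x3 rotation matrix plus a translation vector; estimating those poses is a separate problem that this example does not cover, and the function names are hypothetical.

```python
def transform_point(point, rotation, translation):
    """Apply a rotation (3x3 nested list) and a translation to one 3D point."""
    x, y, z = point
    return tuple(
        rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
        for i in range(3)
    )


def merge_point_clouds(clouds_with_poses):
    """Move every cloud into the shared coordinate system and concatenate them.

    clouds_with_poses: list of (points, rotation, translation) tuples.
    """
    merged = []
    for points, rotation, translation in clouds_with_poses:
        merged.extend(transform_point(p, rotation, translation) for p in points)
    return merged


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
quarter_turn_z = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]  # 90 degrees around the z axis

merged = merge_point_clouds([
    ([(1, 0, 0)], identity, (0, 0, 0)),        # first shot defines the shared frame
    ([(1, 0, 0)], quarter_turn_z, (0, 0, 2)),  # second shot: rotated and shifted
])
```

In practice, small pose errors make the merged clouds misalign slightly, so a refinement step is usually applied afterwards.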

Pros and Cons of Point Cloud

Pros: The structure is simple, so it’s easy to express a vast 3D space. While a depth map can only express the 3D information of one view, a point cloud can combine the 3D information visible from various sides.

Cons: It’s hard to define the surface. Since the points simply float in 3D space without any connection to one another, it’s hard to judge whether a stray point really belongs on the surface or is just noise. Moreover, it’s not very suitable for viewing, because an enormous number of points is needed to reach a resolution that looks comfortable to the human eye.

Uses of Point Cloud

Because of the pros and cons above, the point cloud is mostly used to analyze the characteristics of an actual 3D space rather than for human viewing, in areas such as self-driving, land surveying, and digital twins. Since it isn’t meant for people to look at, there’s no need to create a smooth surface. It’s also convenient to remove noise at the point cloud stage before analyzing the characteristics of a shape. Similarly, it’s used as an intermediate stage that defines the object’s shape in the process of creating a mesh for a real-life object.
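As an example of removing noise at the point cloud stage, here is a sketch of one simple filter: drop any point that has too few neighbors nearby. The function name and thresholds are my own choices for illustration, and this brute-force version compares every pair of points, so it only suits small clouds.

```python
import math


def remove_isolated_points(points, radius, min_neighbors):
    """Keep only the points that have at least `min_neighbors` other points
    within `radius`. Isolated points are treated as noise."""
    kept = []
    for i, p in enumerate(points):
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if i != j and math.dist(p, q) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept


points = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),  # a cluster
    (10.0, 10.0, 10.0),                                                  # a stray point
]
cleaned = remove_isolated_points(points, radius=1.5, min_neighbors=2)
```

Libraries built for large clouds use spatial indexes to find neighbors quickly instead of checking every pair.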

3. Polygon Mesh

Definition of Polygon Mesh

A polygon mesh uses polygons to represent the surfaces of a 3D object. Below is an example of a basic polygon mesh.

Image examples of polygon mesh (Source: learn.foundry.com, 123RF)

You can see in the example images above that the kettle is composed of large and small quadrilaterals, while the ball is composed of numerous small triangles. Like these two, most meshes are built from quadrilaterals or triangles. The process of drawing the polygon mesh on the screen is called rendering. Generally, the GPU calculates what image would result if the object were lit and shot by a virtual camera, and then displays that image.
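In data terms, a mesh is commonly stored as a list of points (vertices) plus a list of faces that refer to those points by index. The sketch below uses that layout for a unit square split into two triangles; the variable and function names are my own, and file formats differ in the details.

```python
import math

# Four corner points of a unit square lying flat on the z = 0 plane.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]

# Each face lists three vertex indices. Two triangles make up the square.
faces = [
    (0, 1, 2),
    (0, 2, 3),
]


def triangle_area(a, b, c):
    """Area of a triangle: half the length of the cross product of two edges."""
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    ac = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    cross = (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )
    return 0.5 * math.sqrt(sum(component ** 2 for component in cross))


def mesh_surface_area(vertices, faces):
    """Total surface area: the sum of the areas of all triangles."""
    return sum(triangle_area(*(vertices[i] for i in face)) for face in faces)


area = mesh_surface_area(vertices, faces)
```

Because faces share vertices by index, each corner is stored once even when several polygons touch it.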

How to Create Polygon Meshes

Broadly speaking, there are two ways. One is to build the mesh from a point cloud of a real object, and the other is to design it from scratch.

When building from a point cloud, the spatial characteristics of the point cloud are first analyzed to define the surface, and then neighboring points on that surface are connected to form triangles. Representative algorithms for this include Delaunay triangulation, the ball-pivoting algorithm, and Poisson surface reconstruction.

When designing from scratch, most meshes are created with software made for the purpose. The software is further divided according to its use. For CAD, there are AutoCAD by Autodesk, CATIA by Dassault Systèmes, and many more. For graphics design, there are Blender (open-source software) and Maya and 3ds Max by Autodesk.

Pros and Cons of Polygon Mesh

Pros: Its biggest advantage is how much it can express for the amount of computation it needs. Since it expresses surfaces with polygons, it gives a realistic, three-dimensional feel. And because the computation cost is low, a rich variety of objects and scenes can be expressed. On top of that, an actual object can be portrayed more realistically by applying color information (texture) to the surface.

Cons: The biggest drawback is the difficulty of creating a mesh from a real-life object. To transfer an object from real life into virtual space, you must find a smooth surface and the points that compose it. And since a mesh portrays the surface only, it becomes a bit problematic when the object has a cut area or a hole: the object is hollow inside, so the other side becomes visible through the opening.

But in computer engineering (especially graphics and video processing), a great deal of research is being done to address these downsides and to create higher-quality meshes. Relevant areas include mesh reconstruction, back-face culling, ray tracing, and many others. Some of these technologies are still being developed, and others are already in use today.
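To give a feel for one of these, here is a sketch of the basic test behind back-face culling: a triangle whose front side points away from the camera is skipped when drawing. I’m assuming the common convention that a triangle’s front side is the one from which its corners appear in counter-clockwise order; the function names are hypothetical, and in practice the GPU performs this check.

```python
def face_normal(a, b, c):
    """Normal vector of triangle (a, b, c), following the right-hand rule."""
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    ac = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    return (
        ab[1] * ac[2] - ab[2] * ac[1],
        ab[2] * ac[0] - ab[0] * ac[2],
        ab[0] * ac[1] - ab[1] * ac[0],
    )


def is_back_facing(a, b, c, camera_position):
    """True if the triangle's front side faces away from the camera."""
    normal = face_normal(a, b, c)
    to_camera = (
        camera_position[0] - a[0],
        camera_position[1] - a[1],
        camera_position[2] - a[2],
    )
    dot = sum(n * t for n, t in zip(normal, to_camera))
    return dot <= 0


# A triangle on the z = 0 plane whose front side faces the +z direction.
triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))

seen_from_front = is_back_facing(*triangle, camera_position=(0.0, 0.0, 5.0))
seen_from_behind = is_back_facing(*triangle, camera_position=(0.0, 0.0, -5.0))
```

This is also why holes are a problem: through an opening, the camera sees the inside of the surface, where every triangle is back-facing.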

Uses of Polygon Mesh

Since it needs only the points on the surface and the polygon information to produce a smooth exterior, this method is usually used in games and animations that display 3D models. In addition, it’s used when a more detailed 3D design is needed, such as in CAD modeling.

A car mesh and a house modeled in CAD. (Source: Pixabay)

4. Voxel

Definition of Voxel

A voxel is a unit for expressing 3D information in the form of a cube. Unlike point clouds or polygon meshes, it does more than express the surface: it shows the entire structure, even the inside. You could think of it as a kind of 3D mosaic. The best-known example is Minecraft, and in real life, there are CT, MRI, and ultrasound scans.

Examples of voxels: a voxel building created in Minecraft and an image of a CT-scanned wrist (Source: Wikipedia, Flickr)

How to Create Voxels

In the virtual case, a voxel is defined as a cube, as in Minecraft, and a creator can build any structure they wish by stacking voxels in virtual space. It’s just like freely stacking up blocks.
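Block stacking maps nicely onto a simple data structure. The sketch below stores only the occupied cells in a dictionary keyed by integer coordinates; this is my own simplified illustration with hypothetical names, not how Minecraft actually stores its world.

```python
def place_block(world, position, material):
    """Put one voxel at an integer (x, y, z) position."""
    world[position] = material


def build_pillar(world, x, z, height, material):
    """Stack `height` blocks on top of one another, starting from the ground (y = 0)."""
    for y in range(height):
        place_block(world, (x, y, z), material)


world = {}                                 # empty space: no entry means "air"
build_pillar(world, 0, 0, 3, "stone")      # a three-block stone pillar
build_pillar(world, 1, 0, 2, "wood")       # a shorter wooden pillar next to it
place_block(world, (0, 3, 0), "glass")     # a glass block on top of the stone
```

Storing only the occupied cells keeps mostly empty worlds small, while a dense 3D array is the more natural choice when every cell holds a value.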

In real life, voxels are mostly acquired through volume imaging. Volume imaging relies on a method called “tomography,” which is commonly seen in the medical field as CT and MRI. It works by capturing many images, each one a single cross-section of the 3D space, and combining them. It’s easier to understand if you think of each image pixel as a voxel and picture the images being stacked up like blocks. In the medical field, the tomogram is obtained by sending penetrating rays or waves into the body from different directions: CT measures how much the X-rays are weakened as they pass through, while ultrasound uses the reflected sound waves. MRI instead uses a strong magnetic field together with radio waves to observe how the tissues in the body respond.
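Stacking cross-sections into a volume can be sketched as follows. The tiny 2x2 slices and the intensity values are made up for illustration, and the function names are hypothetical; real scans have hundreds of much larger slices plus spacing information between them.

```python
def stack_slices(slices):
    """Stack equally sized 2D slices into a volume indexed as volume[z][y][x]."""
    height = len(slices[0])
    width = len(slices[0][0])
    for s in slices:
        if len(s) != height or any(len(row) != width for row in s):
            raise ValueError("all slices must have the same size")
    # Copy the rows so later edits to the volume do not change the input slices.
    return [[list(row) for row in s] for s in slices]


def count_voxels_above(volume, threshold):
    """Count voxels whose intensity exceeds a threshold (e.g. dense tissue)."""
    return sum(1 for s in volume for row in s for value in row if value > threshold)


# Three hypothetical 2x2 cross-sections; each number is one pixel's intensity.
slices = [
    [[0, 0], [0, 9]],
    [[0, 8], [0, 9]],
    [[0, 0], [0, 0]],
]
volume = stack_slices(slices)
dense_voxels = count_voxels_above(volume, threshold=5)
```

Once the slices are stacked, any voxel inside the body can be looked up directly, which is exactly what a surface-only format cannot offer.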

Example of a tomogram (Source: Wikipedia)

Pros and Cons of Voxel

Pros: It’s far more detailed than the point cloud or mesh. Unlike the mesh, which expresses only surfaces, voxels can express the inside as well. Not only that, adjusting the color and transparency of the inner voxels lets you observe both the surface and the internal structure to whatever degree you want.

Cons: The biggest drawback of voxels is the amount of data. Whereas other methods need only the information on the surface, voxels need data for the inside as well. As a result, expressing even a small object requires a far larger set of data than the other methods. This is why voxels are used when describing the inside in detail is absolutely necessary, rather than in general situations.
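A quick calculation shows the gap. For a solid cube that is n voxels long on each side, this sketch compares the number of voxels in the whole cube with the number that sit on its outer surface (the function names are my own):

```python
def solid_voxel_count(n):
    """Voxels needed to fill an n x n x n cube, inside included."""
    return n ** 3


def surface_voxel_count(n):
    """Voxels on the outer shell only: the cube minus its hidden interior."""
    if n <= 2:
        return n ** 3  # too small to have an interior
    return n ** 3 - (n - 2) ** 3


solid = solid_voxel_count(100)      # 1,000,000 voxels
shell = surface_voxel_count(100)    # 58,808 voxels
```

At 100 voxels per side, the full cube needs about 17 times more data than its surface alone, and the gap keeps widening as the resolution goes up.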

Uses of Voxel

The most typical fields are games and medicine. In games, voxels can be used for everything, as in Minecraft, and it’s also common to express special objects like smoke with voxels because they make 3D spaces detailed and lifelike. However, they are not widely used because they are inefficient: the amount of computation is simply too large. In medicine, they are used to combine tomographic images (CT, MRI, and ultrasound) into 3D information for analyzing the internal structure of an organism.

Wrapping Up

In this blog post, we looked at the various forms of 3D and how they’re created and used. 3D images may seem difficult to comprehend, but they are actually used in our daily lives more than we imagine. So when you read or watch news about 3D-related services down the road, I humbly suggest that you look closely at which of the four methods discussed here is the main one being used. It will make you feel closer to them. :)