# INSIDE THE LAB: artlabs’ Guide to AI-based 3D Content Generation

Hi! It has been a while since the latest post of ‘Inside the Lab’, our research and engineering blog. This week’s topic is how 3D content is represented and handled by AI methodologies, how AI utilizes these representations for 3D content creation, as well as the pros & cons of these techniques.

Machine learning models are trained using various 3D content representations such as voxels, point clouds, signed distance fields, neural radiance fields (NeRF), polygonal meshes… We will talk about voxel, point cloud, NeRF, and polygon representations in this post. Let’s go over these, one by one.

# Voxels

You know about picture elements (a.k.a. pixels), but have you ever heard about volume elements (a.k.a. voxels)? Now you have! Pixels are represented as red, green, and blue color intensity values, with an additional opacity value, each between 0 and 255, on a 2D grid indexed by x and y coordinates. Voxels, similarly, consist of red, green, blue, and opacity values on a 3D grid. AI models aim to learn these 4 values for each voxel to efficiently represent the scene.
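As a minimal sketch (sizes and names are hypothetical, not from any particular model), a voxel grid can simply be held as a 4-channel array, mirroring how an RGBA image is held in 2D:

```python
import numpy as np

# Hypothetical example: a 64x64x64 voxel grid with RGBA channels,
# the 3D analogue of a 2D RGBA pixel grid.
GRID = 64
voxels = np.zeros((GRID, GRID, GRID, 4), dtype=np.uint8)

# Fill a small solid red cube in one corner (opacity 255 = fully opaque).
voxels[:16, :16, :16] = [255, 0, 0, 255]

# Everything else stays at opacity 0, i.e. "empty" voxels.
occupied = np.count_nonzero(voxels[..., 3])
print(occupied)  # 16*16*16 = 4096 occupied voxels
```

Note that the position of each voxel is implicit in its grid index, which is exactly what makes this representation so simple to learn.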

Machine learning models such as 3D-R2N2 (2016), Pix2Vox/++ (2019/2020), and EVolT (2021) take advantage of the voxel representation’s simplicity and utilize multi-view images of an object to reconstruct that object as a voxel grid.

## Advantages of Voxel Representation

1. Simplest possible representation for 3D content. The representation is simply red, green, blue, and opacity values for each cube within the grid.
2. Easy to build a machine learning model upon. Because the representation is simple, it is easier for a model to learn.

## Disadvantages of Voxel Representation

1. The resolution, and that’s a big one! A voxel grid with an edge length of 512 contains more than 134 million elements, while a 4096x4096 image contains just above 16 million, and we know how large a 4K image’s file size can be. There are many methods to compress voxel files, for example by not storing the empty voxels, and current machine learning models still struggle to keep up with this much information. Unfortunately, such compression cannot be applied during machine learning: it can only be done once a voxel grid is constructed, and what the model is doing is precisely that “construction”.
2. Rendering. GPUs are optimized for rendering polygons, and there is no specialized hardware for efficiently rendering high-resolution voxel grids.
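The arithmetic behind the resolution point is easy to check. The byte figures below assume one uncompressed byte per RGBA channel, purely for illustration:

```python
# Element counts of a dense 512^3 voxel grid vs. a 4096x4096 image.
voxel_count = 512 ** 3        # voxels in the grid
pixel_count = 4096 * 4096     # pixels in the image

print(voxel_count)                # 134217728 -> "more than 134 million"
print(pixel_count)                # 16777216  -> "just above 16 million"
print(voxel_count // pixel_count) # 8 -> eight times as many elements

# At 4 bytes per element (one byte each for R, G, B, and opacity),
# the uncompressed sizes come out to:
print(voxel_count * 4 // 2**20)  # 512 MiB for the voxel grid
print(pixel_count * 4 // 2**20)  # 64 MiB for the image
```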

## Possible Industry Use Cases

Voxels are hella good if you want to represent cubic shapes. As there is pixel art, there is also 3D art based on voxels. Furthermore, who doesn’t want to generate Minecraft-like worlds?! Metaverse platforms like The Sandbox also utilize voxel representations, and AI-based voxel creation can help improve them as well.

# Point Clouds

Well, you guessed it: Point clouds are clouds formed by colored points in 3D space. Unlike voxels, they are not contained within a grid, so you can represent a wider range of objects better with point clouds. However, since there is no grid, you also need to store each point’s position in 3D space. This means you need to keep more data per point than per voxel.
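A quick sketch of that trade-off (array shapes and dtypes are illustrative choices, not a standard): each point carries explicit coordinates on top of its color, unlike a voxel whose position is implied by its grid index.

```python
import numpy as np

# Hypothetical sketch: a point cloud as two parallel arrays,
# three floats for position (x, y, z) plus three bytes of color (r, g, b).
rng = np.random.default_rng(0)
num_points = 1_000
positions = rng.uniform(-1.0, 1.0, size=(num_points, 3)).astype(np.float32)
colors = rng.integers(0, 256, size=(num_points, 3)).astype(np.uint8)

# Per-point payload: 3 * 4 bytes of position + 3 bytes of color = 15 bytes,
# versus 4 bytes (RGBA) for a voxel, whose position costs nothing to store.
bytes_per_point = positions.itemsize * 3 + colors.itemsize * 3
print(bytes_per_point)  # 15
```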

Models such as OpenAI’s Point-E (2022) have demonstrated success in point cloud-based 3D content creation. However, as with everything good in the world, point clouds have their advantages and disadvantages.

## Advantages of point cloud representation

1. Better handling of fine details as compared to voxels. Point clouds can be dense or sparse. When sparse, fine details are easily lost, but when a point cloud is dense, it can represent the original object/shape very well.
2. Great at representing large scenes! LiDARs are a great way of acquiring point clouds, and they are widely used in smart vehicles. There are several examples of a drone scanning a whole area, including forests, factories, stadiums, city squares, etc. There is even a point cloud of the entire city of Düsseldorf!

## Disadvantages of point cloud representation

1. No volume. Even when a point cloud is dense, it consists of discrete points, and points alone cannot represent a surface or a volume.
2. Rendering. GPU rendering pipelines are built around polygons, so point clouds cannot be rendered efficiently without conversion. They cannot be utilized for manufacturing either. Point clouds can be converted to polygonal meshes; however, current surface-reconstruction algorithms tend to produce lumpy outputs.

## Possible Industry Use Cases

Point clouds are actually used widely in several industries. They can be acquired by LiDARs installed on drones or smart cars. One can create point cloud objects and environments with AI to be utilized within simulations to improve the algorithms that are being run for better driverless vehicles. Furthermore, they are also used in medical imaging. AI-based creation of medical point clouds can improve disease and physical trauma detection in patients as well.

# Neural Radiance Fields (NeRF)

Given a set of images and corresponding camera pose information, a NeRF can reconstruct a 3D scene by figuring out where each pixel of an image corresponds to in 3D space. Once the scene is reconstructed, a NeRF can provide a full 3D view of it, even from unseen angles. Furthermore, the representation itself is AI! It is a neural network that contains all the information required to render the 3D scene: when queried with a new camera pose, the network responds with a render of that view. While the original NeRF network had to be trained for hours (days on some occasions), several novel NeRF variants can reconstruct a high-quality 3D scene within mere seconds. Read more about NeRFs in one of our earlier blog posts.

## Advantages of NeRF Representation

1. The scene is represented as we perceive it with a camera, and we can see it from previously unseen angles. It can easily be said that you can retrieve fine details with NeRFs.
2. Rendering. The model’s whole purpose is to render the scene from new viewing angles.

## Disadvantages of NeRF Representation

1. No explicit geometry. A 3D scene reconstructed by a Neural Radiance Field is essentially a render. Hence, it cannot be utilized for physics simulations, manufacturing, etc.
2. Limited editing. NeRFs reconstruct a 3D scene, but they do not readily allow scene editing. There are methods to separate an object from the background, but you still cannot place a NeRF within another NeRF the way you can with polygons, voxels, or point clouds.

## Possible Industry Use Cases

Neural Radiance Fields can render scenes from any angle, and they can potentially be used widely by cinematic arts. It is widely known that camera angle and motion are very important in cinematography, and NeRFs can create renders from angles a camera person might have trouble with.

# Polygonal Mesh

Polygonal meshes consist of points (namely, vertices), lines that connect these points to each other (namely, edges), and polygons that are constructed in between these edges. Vertices are represented by their coordinates; edges are represented by which vertices they connect, and polygons are represented by which edges they are constructed on. Furthermore, there are multiple ways of representing color on meshes, ranging from simply coloring each vertex with red, green, and blue intensity values to deciding how that color will interact with any given light by providing material properties such as diffuse reflectance, specularity, opacity, refractive index, surface normals, etc.
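A minimal sketch of that structure, using the common convention of storing faces as index triplets into the vertex list (edges are then implied by consecutive vertex pairs of each face):

```python
import numpy as np

# A tetrahedron: four vertices and four triangular faces.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
faces = np.array([
    [0, 1, 2],  # triangle lying in the z = 0 plane
    [0, 1, 3],
    [0, 2, 3],
    [1, 2, 3],
])

def face_normal(face):
    """Unit normal of a triangular face via the cross product of two edges."""
    a, b, c = vertices[face]
    n = np.cross(b - a, c - a)
    return n / np.linalg.norm(n)

print(face_normal(faces[0]))  # [0. 0. 1.] -- the z = 0 triangle faces +z
```

Surface normals like the one computed here are exactly the kind of material-adjacent property renderers use to decide how light interacts with each face.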

Methods such as NVDiffrec-MC (2022) can infer a mesh, light, and material triplet from image sets. Lately, many more methods have been developed to reconstruct meshes and textures from text or image inputs: GET3D, DreamFusion, Score Jacobian Chaining, and Magic3D.

## Advantages of Polygonal Mesh Representation

1. GPU hardware is optimized for polygonal representations, hence polygonal meshes are the easiest to render and visualize. They are widely utilized for gaming, CGI, VFX, AR/VR… You name it!
2. Designers can play with different mesh and material parameters to create very unique designs with very fine details.
3. The level of detail can be controlled easily by changing the count of vertices and polygons.
4. There are very advanced tools for mesh editing, and nowadays, meshes can be modified relatively easily.

## Disadvantages of Polygonal Mesh Representation

1. The structure is complex. For an AI model to create meshes, it needs to be able to generate vertices, edges, polygons, materials, and colors.
2. Designing and creating meshes from scratch without AI is especially time-consuming and very difficult to handle at scale.

## Possible Industry Use Cases

Polygonal meshes are already utilized in gaming, cinematic arts, Web3, and XR. Many industries like e-commerce benefit highly from polygonal meshes by visualizing their products in 3D. By creating content with AI, all of these industries can generate content at scale and awe their audience.

At artlabs, we utilize all these representations and AI at different stages of our pipeline. See more of how artlabs utilizes AI to create content at scale in these blog posts:

Thanks for reading! See you in the next post of “Inside the Lab” 👋🏻

Author: Doğancan Kebude, R&D Lead at artlabs
