Exploring the Foundations of Virtual Reality: 3D Modeling and Immersion Techniques.

Mohamed Mellouky
Published in Antaeus AR
Oct 16, 2023 · 6 min read
Mark Zuckerberg: First Interview in the Metaverse | Lex Fridman Podcast #398

A few days ago, MIT research scientist Lex Fridman published a podcast episode with Mark Zuckerberg on the metaverse, in which Mark presented Meta's progress in this area. The episode was actually recorded remotely, yet it looks like an in-person interview, because both participants wore Meta Quest Pro headsets that render photorealistic Codec Avatars.
Creating such a 3D virtual environment requires a great deal of work in terms of 3D modeling and content generation. The environment presented in the podcast also contains virtual humans, so we need to create conversational agents that can detect real facial expressions and body gestures and reproduce them in the virtual world. In this article, we'll discuss how to create 3D content so that it can be rendered and deployed on a VR headset.

We'll start with a brief introduction to VR, then move on to 3D modeling and reconstruction, which is the building block of virtual reality.

We humans, like other species, use our senses to perceive the world. The brain receives and processes these signals to give them meaning, and finally generates a reaction. In other words, our bodies are physically immersed in a world that we perceive and act upon. To create a virtual reality, we therefore need to feed our senses with virtual data, so that they send virtual signals to the brain. In theory, the brain cannot distinguish between virtual and real signals, so it acts according to what it receives.

When it comes to virtual reality, the main sense to work with is sight. That's why VR headsets use a dual-lens design that places our eyes directly in front of the virtual scenes being generated. However, other senses are also involved. If we only fed data to the eyes, our body would not feel totally immersed in the virtual world: our ears would still perceive the world we physically live in and transmit that information to the brain. The brain is then left with two conflicting signals, one from the eyes describing the virtual world and one from the ears describing the real, physical world. It concludes that something is wrong with our senses, which causes nausea, or what we call “cybersickness”.

As we perceive the world in three dimensions (excluding time, which is another area of virtual/augmented reality research), we need to build three-dimensional scenes to deploy on the VR headset. There are a number of techniques for doing this. First, we can create 3D scenes from real-world environments, which means scanning a real environment with a device such as a 360° camera. The scanned data can be either a simple 360° panorama or a point cloud. The first option was adopted by Google with Google Cardboard: a companion Android app lets users capture images of an environment from several sides and stitches them into a 360° panorama, which can then be viewed through the Cardboard headset. This solution is quite limited, however, since such a panorama doesn't let you look straight up or down.

CNET via YouTube
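To make the stitching step described above concrete, here is a minimal sketch that assembles a set of overlapping photos into a panorama using OpenCV's high-level Stitcher. This is an illustration under my own assumptions, not Google's Cardboard pipeline, and the image file names are placeholders.

```python
# Minimal panorama-stitching sketch using OpenCV's high-level Stitcher.
# This is an illustration, not Google's Cardboard pipeline; the image
# file names below are placeholders for your own overlapping photos.
import cv2

# Load a set of overlapping photos taken while turning in place.
paths = ["view_front.jpg", "view_right.jpg", "view_back.jpg", "view_left.jpg"]
images = [cv2.imread(p) for p in paths]

# The Stitcher detects features, matches them across images, estimates the
# camera rotations, then warps and blends everything into one panorama.
stitcher = cv2.Stitcher_create()
status, panorama = stitcher.stitch(images)

if status == 0:  # cv2.Stitcher_OK
    cv2.imwrite("panorama.jpg", panorama)
else:
    print("Stitching failed with status", status)
```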

Another solution is to use 360° cameras, which produce 360° images and videos that can be viewed through a VR headset. This solution is also restricted: a 360° image is static, meaning the user can turn his or her head left, right, up and down, but cannot move forwards or backwards. This is a significant limitation in VR applications, as it reduces immersion. 360° videos don't allow full immersion either: since the footage is pre-recorded, the user cannot move freely through the 3D environment.
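The following sketch shows why only head rotation works: a 360° viewer simply maps the gaze direction to a pixel of an equirectangular image, and there is no depth information that would let the view change when the user moves. The function and variable names are mine, chosen for illustration, not from any particular SDK.

```python
# Sketch: sampling an equirectangular 360° panorama from a gaze direction.
# Function and variable names are illustrative, not from any specific SDK.
import numpy as np

def gaze_to_pixel(direction, width, height):
    """Map a 3D gaze direction (unit vector) to (u, v) pixel coordinates
    in an equirectangular panorama of size width x height."""
    x, y, z = direction / np.linalg.norm(direction)
    longitude = np.arctan2(x, -z)   # -pi..pi, yaw around the vertical axis
    latitude = np.arcsin(y)         # -pi/2..pi/2, pitch up/down
    u = (longitude / (2 * np.pi) + 0.5) * (width - 1)
    v = (0.5 - latitude / np.pi) * (height - 1)
    return int(round(u)), int(round(v))

# Looking straight ahead (-z) lands in the centre of the image.
print(gaze_to_pixel(np.array([0.0, 0.0, -1.0]), 4096, 2048))
# Rotating the head changes the direction and thus the pixel we sample,
# but translating the head changes nothing: the panorama has no depth.
```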

The second option, the point cloud, builds on the idea of obtaining a cloud of points from a 3D scan. A reconstruction algorithm then converts these points into a 3D representation of the space; when the points come from ordinary 2D images rather than a depth sensor, we need at least two views of the scene to recover depth. One issue with this method is that reconstruction becomes very computationally expensive when the number of points is large. For this reason, many methods and algorithms attempt to reduce the number of points without altering the structure of the original scan. These algorithms are not covered in this article, as they fall outside my area of expertise.
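As a rough illustration of one simple point-reduction technique, here is a voxel-grid downsampling sketch in plain NumPy: points falling into the same voxel are averaged into a single representative point, which shrinks the cloud while preserving its overall shape. Dedicated libraries such as Open3D or PCL ship more sophisticated versions of this idea.

```python
# Sketch: voxel-grid downsampling of a point cloud with plain NumPy.
# Points that fall in the same cubic voxel are replaced by their centroid,
# reducing the point count while keeping the overall structure of the scan.
import numpy as np

def voxel_downsample(points, voxel_size):
    """points: (N, 3) array of XYZ coordinates; returns a smaller (M, 3) array."""
    # Index of the voxel each point falls into.
    voxel_ids = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel and average each group.
    _, inverse, counts = np.unique(voxel_ids, axis=0,
                                   return_inverse=True, return_counts=True)
    sums = np.zeros((len(counts), 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]

# 100,000 random points reduced to one representative point per 0.05-unit voxel.
cloud = np.random.rand(100_000, 3)
small = voxel_downsample(cloud, voxel_size=0.05)
print(cloud.shape, "->", small.shape)
```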

Both of these 3D reconstruction methods lack the flexibility to produce 3D animations. That's why we use 3D modeling software, which lets you model 3D objects and animate them. For example, we can use Blender, which is free and open source; commercial packages such as 3ds Max, Maya and Cinema 4D are also available. The downside is that the realism of the resulting 3D scenes depends entirely on the skills of the 3D modeler. As far as virtual reality is concerned, there are already VR applications that let you model 3D objects in a virtual environment using a VR headset; Gravity Sketch is just one example.

Gravity Sketch via YouTube
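To give a flavour of script-based modeling and animation in such software, here is a small sketch using Blender's built-in Python API (bpy). It assumes it is run inside Blender, and the object name, sizes and frame numbers are arbitrary choices of mine.

```python
# Sketch: creating and animating a simple object with Blender's Python API.
# This must be run inside Blender (e.g. from the Scripting workspace);
# the object name, size and frame numbers are arbitrary example values.
import bpy

# Add a cube primitive at the origin.
bpy.ops.mesh.primitive_cube_add(size=1.0, location=(0.0, 0.0, 0.0))
cube = bpy.context.active_object
cube.name = "DemoCube"

# Keyframe its position at frame 1...
cube.location = (0.0, 0.0, 0.0)
cube.keyframe_insert(data_path="location", frame=1)

# ...and at frame 60, so Blender interpolates a simple animation between them.
cube.location = (0.0, 0.0, 3.0)
cube.keyframe_insert(data_path="location", frame=60)
```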

However, studies have shown that VR modeling software lacks precision in terms of tools and functionality, and is not necessarily comfortable for the designer. An experiment carried out with a group of students showed that many of them found it difficult to model sketches in 3D space, that they often got lost in that space, and that they were not necessarily comfortable with the six degrees of freedom (6 DOF). Moreover, modeling in a 3D space is cognitively completely new to us, so our brains have a hard time adapting to such an environment. Nevertheless, the same group of students found the VR modeling application helpful for visualization.

The 3D modeling and reconstruction techniques mentioned above are mesh-based. This means that they construct 3D objects using a set of vertices and edges, and that the combination of the resulting facets produces the final rendered object.
There are, however, other types of modeling. In particular, 3D objects can be modeled using primitive shapes. This type of modeling is based on the idea of combining several primitive shapes and applying Boolean operations to them to build the final model. In many cases, though, it cannot produce realistic-looking 3D objects, because the available primitives are limited to the cube, sphere, cylinder, tube, torus and cone. As you can imagine, we can't create very complex shapes from these primitives alone.
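As a rough illustration of combining primitives with Boolean operations (often called constructive solid geometry), the sketch below uses signed distance functions, where union, intersection and difference become simple min/max operations. This is just one way to express Booleans; mesh-based modelers implement them differently.

```python
# Sketch: Boolean combination of primitive shapes via signed distance functions.
# An SDF is negative inside a shape and positive outside; union, intersection
# and difference then reduce to min/max. This illustrates the idea of CSG,
# not how mesh-based modelers implement Boolean operations internally.
import numpy as np

def sphere(p, center, radius):
    return np.linalg.norm(p - center) - radius

def box(p, center, half_size):
    q = np.abs(p - center) - half_size
    return np.linalg.norm(np.maximum(q, 0.0)) + min(q.max(), 0.0)

def union(d1, d2):        return min(d1, d2)
def intersection(d1, d2): return max(d1, d2)
def difference(d1, d2):   return max(d1, -d2)   # d1 minus d2

# A cube with a spherical bite taken out of one corner.
p = np.array([0.4, 0.4, 0.4])                      # query point near that corner
d_cube   = box(p, np.zeros(3), np.full(3, 0.5))
d_sphere = sphere(p, np.array([0.5, 0.5, 0.5]), 0.4)
# Prints False: the point lies in the region carved out by the sphere.
print("inside the carved shape:", difference(d_cube, d_sphere) < 0)
```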

Using meshes consists in constructing objects from a set of polygons, in particular triangles. The building block of a 3D model is a triangle made of three points called vertices, the connections between those points, called edges, and the face (facet) formed by the three vertices. The normal vector of a facet determines whether or not the facet is visible when the image is rendered.

Source: Wikipedia
Source: Autodesk Help
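To make the role of the normal vector concrete, here is a short sketch that computes a triangle's normal from its three vertices and uses it for a back-face culling test, i.e. deciding whether the facet faces the camera. The vertex coordinates and camera position are arbitrary examples.

```python
# Sketch: computing a triangle facet's normal and testing its visibility.
# The normal comes from the cross product of two edge vectors; a facet is
# considered front-facing (visible) when its normal points towards the camera.
# Vertex coordinates and camera position below are arbitrary examples.
import numpy as np

def facet_normal(v0, v1, v2):
    """Unit normal of the triangle (v0, v1, v2); the winding order sets its direction."""
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n)

def is_front_facing(v0, v1, v2, camera_pos):
    """True if the facet's normal points towards the camera (back-face culling test)."""
    n = facet_normal(v0, v1, v2)
    to_camera = camera_pos - v0
    return np.dot(n, to_camera) > 0

v0, v1, v2 = np.array([0., 0., 0.]), np.array([1., 0., 0.]), np.array([0., 1., 0.])
camera = np.array([0.0, 0.0, 5.0])           # camera on the +z side
print(is_front_facing(v0, v1, v2, camera))   # True: the normal (+z) faces the camera
```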

However, this technique, also known as a polygon mesh, only represents the surface of an object. If we also want to represent volume, we can use a volumetric mesh.

This article is not exhaustive and does not cover all 3D modeling techniques, but we have covered the building blocks. The other techniques are essentially optimizations of those mentioned above.
