Decoding NeRFs: Navigating the Future of 3D Modeling and Synthesis

By Gabriela Padilla

Insights of Nature
9 min read · Jan 28, 2024


In 3D modeling and the metaverse, there is a growing emphasis on replicating our world with a high degree of realism. Some things also become clearer when we see them in 3D: a different viewpoint helps us understand them in a new way.

One approach to digitizing real-world objects involves the application of Neural Radiance Fields (NeRFs).

Breaking Down NeRFs

A neural radiance field (NeRF) is a fully connected neural network that generates novel views of intricate 3D scenes from a partial set of 2D images. It is trained with a rendering loss to reproduce the input views of a scene, and it interpolates between those views to render the scene from viewpoints it has never seen. This makes NeRF a useful method for synthesizing images, including for synthetic data generation.

Let’s break down the term “Neural Radiance Field”:

  • “Neural”: the scene is represented by a neural network, specifically a multilayer perceptron (MLP), one of the oldest neural network architectures.
  • “Radiance”: indicates that this neural network models the brightness and color of light rays from various perspectives.
  • “Field”: a mathematical term for a function that assigns a value to every point in space. Here, the network assigns a color and a density to each 3D location and viewing direction.

What sets NeRFs apart is their approach: unlike most deep learning techniques, a NeRF trains a single fully connected neural network on a series of images of one specific object or scene.

This network is then employed to generate new views of that particular object. In contrast, conventional deep learning typically begins with labelled data to train neural networks that can provide generalized responses across similar types of data.

Why NeRFs are Important in Real-Life

Healthcare

NeRF has the potential to play a crucial role in medical applications by enabling object segmentation and 3D reconstruction in medical imaging. It can automate the segmentation of organs and organ-based structures, such as the brain, heart, and lungs, from medical images. This could contribute significantly to diagnosis and treatment planning.

3D reconstructions of the brain from MRI scans using NeRFs

Computer Vision

In the domain of computer vision, NeRF holds promise for image synthesis, object detection, and image segmentation. It can excel in generating realistic-looking images from low-resolution inputs, detecting objects, and segmenting images, potentially facilitating the detection and classification of objects in visual data.

Robotics

NeRF shows promise for application in robotics, particularly in autonomous navigation and obstacle avoidance. It has the potential to generate 3D maps of the environment and identify obstacles, offering assistance to robots in navigating and circumventing obstacles in their path.

Entertainment

Within the entertainment industry, NeRF can contribute to the creation of realistic images for movies, television shows, and video games. It can also assist in generating 3D models of objects and scenes, enhancing the development of realistic virtual worlds in video games.

Satellite Imagery and Planning

NeRFs have the potential to combine a variety of satellite images into comprehensive models of the Earth’s surface. This could prove useful in Reality Capture (RC) scenarios, where real-world environments need to be digitized, because NeRF can turn spatial location data into highly detailed 3D models. For instance, landscape renderings reconstructed from aerial images could become a common reference in urban planning when designing real-world areas.

Food Industry

In the food industry, Neural Radiance Fields (NeRFs) could support both operations and customer engagement. Restaurants could use NeRFs to create immersive 3D representations of menu items for online platforms, giving customers visually appealing and interactive menus. Quality control could also benefit, since NeRFs enable detailed visual inspections of food products that help ensure consistency and reveal imperfections.

Decoding NeRFs: An Explanation of the Functioning Process

Imagine you’re an artist creating a detailed sculpture of a complex scene. However, you can only see this scene from a few specific viewpoints, and you want to capture its essence from every angle. This is where Neural Radiance Fields (NeRFs) come into play.

A Neural Radiance Field (NeRF) uses a sparse set of input views to optimize a continuous volumetric scene function, which can then generate novel views of a complex scene. The input is a set of images of a static scene, each taken from a known camera position.

The continuous scene is characterized as a 5D vector-valued function with the following features:

  • Input parameters include a 3D location (x, y, z) and a 2D viewing direction (θ, φ).
  • Output consists of an emitted color (r, g, b) and a volume density (σ).

The optimization process involves a deep fully connected neural network, commonly known as a multilayer perceptron (MLP), devoid of convolutional layers. The function representation within this network involves regression from a single 5D coordinate (x, y, z, θ, φ) to a volume density and view-dependent RGB color.

Think of NeRF as your artistic tool, taking inspiration from a sparse set of images you’ve taken of the scene. The tool is like a magical sculpting pen that, when moved through the scene, not only carves out the intricate details but also captures the colors and shadows from every perspective. Just like an artist creating a sculpture, you provide the tool with a static set of images, and it learns to intricately carve out the scene’s features.

Now, picture the scene as a magical 5D canvas with each point having coordinates (x, y, z, θ, φ). The artist (NeRF) is equipped with a special paintbrush (neural network) that, for each point on the canvas, precisely mixes colors and densities based on its location and viewing direction. This paintbrush has no convolutional layers; instead, it’s like a multilayer perceptron (MLP), working its magic to represent the scene in a continuous, fluid manner.
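To make the 5D input and the color-plus-density output concrete, here is a minimal Python sketch. It is not a trained network: `toy_radiance_field` is a hypothetical, hand-written stand-in that only shows the shape of the function, and the scene (a fuzzy sphere with a view-dependent red highlight) is invented for illustration.

```python
import math

def direction_from_angles(theta, phi):
    """Convert viewing angles (theta = polar, phi = azimuth) to a unit vector."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )

def toy_radiance_field(x, y, z, theta, phi):
    """Hand-written stand-in for the trained MLP: 5D input -> (rgb, density).

    Hypothetical scene: a unit sphere at the origin whose red channel
    brightens when it is viewed along the +z direction.
    """
    # Density depends on position only: dense inside the sphere, empty outside.
    radius = math.sqrt(x * x + y * y + z * z)
    sigma = 10.0 if radius <= 1.0 else 0.0

    # Color may depend on the viewing direction as well as the position.
    _, _, dz = direction_from_angles(theta, phi)
    red = 0.5 + 0.5 * max(0.0, dz)  # view-dependent highlight
    return (red, 0.3, 0.3), sigma

rgb, sigma = toy_radiance_field(0.2, 0.0, 0.1, theta=0.0, phi=0.0)
```

In a real NeRF, the body of this function is replaced by the MLP, and its weights are what training adjusts.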

Rendering the Neural Radiance Field (NeRF) from a specific viewpoint follows a sequence of steps:
1) Marching camera rays through the scene to generate a sampled set of 3D points.
2) Utilizing these points and their corresponding 2D viewing directions as inputs to the neural network to generate an output set of colors and densities.
3) Applying classical volume rendering techniques to accumulate these colors and densities into a 2D image.

To bring this magical canvas to life, you follow a process. You march rays through the scene, collecting 3D points. These points, along with their corresponding 2D viewing directions, are fed into the artist’s paintbrush, resulting in an output of colors and densities. Finally, classical rendering techniques act as a masterstroke, blending these colors and densities into a 2D image — your masterpiece.
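The three rendering steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: samples are evenly spaced (the original method uses a more careful sampling scheme), the viewing direction is passed as a unit vector rather than as two angles, and `red_sphere` is a hypothetical stand-in for the trained network.

```python
import math

def render_ray(origin, direction, field, near=0.0, far=4.0, n_samples=64):
    """Composite colors along one camera ray into a single pixel color."""
    delta = (far - near) / n_samples            # spacing between samples
    transmittance = 1.0                         # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for i in range(n_samples):
        t = near + (i + 0.5) * delta            # step 1: sample a point on the ray
        point = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, sigma = field(point, direction)    # step 2: query color and density
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # step 3: accumulate into the pixel
        for c in range(3):
            pixel[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance

def red_sphere(point, direction):
    """Hypothetical stand-in for the network: a dense red unit sphere."""
    inside = sum(p * p for p in point) <= 1.0
    return (1.0, 0.0, 0.0), (50.0 if inside else 0.0)

# One ray that passes through the sphere and one that misses it.
hit, _ = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), red_sphere)
miss, _ = render_ray((3.0, 0.0, -2.0), (0.0, 0.0, 1.0), red_sphere)
```

The ray that crosses the sphere comes out almost pure red, while the ray that misses it stays black. Repeating this for one ray per pixel produces the full 2D image.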

This optimization, spanning multiple views, facilitates the network in predicting a coherent scene model by attributing high volume densities and accurate colors to locations containing the genuine underlying scene content.

Just as an artist refines their sculpture, NeRF undergoes optimization through a unique artistic process. By minimizing the difference between observed images and views generated by the tool, the artist (NeRF) learns to predict a cohesive representation of the scene. This iterative process ensures that the final artwork depicts the true essence of the complex, multidimensional scene from various viewpoints. It’s like sculpting with a magical pen, creating a captivating masterpiece from limited perspectives.
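The "difference between observed images and generated views" is typically measured as a squared error between pixel colors. The sketch below shows only that measurement, on made-up pixel values; the function name `photometric_loss` and the sample data are illustrative, and the gradient-based update of the network weights is left out.

```python
def photometric_loss(rendered, observed):
    """Mean squared error over all pixels and color channels."""
    total = 0.0
    count = 0
    for pred_px, true_px in zip(rendered, observed):
        for pred_c, true_c in zip(pred_px, true_px):
            total += (pred_c - true_c) ** 2
            count += 1
    return total / count

# Two pixels from a photograph, and two attempts at reproducing them.
observed = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
good_guess = [(0.9, 0.1, 0.0), (0.0, 0.1, 0.9)]
bad_guess = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]

good_loss = photometric_loss(good_guess, observed)
bad_loss = photometric_loss(bad_guess, observed)
```

Training repeatedly renders pixels, computes this loss against the photographs, and nudges the network weights so the loss shrinks.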

Research on NeRF Performance Improvement

One significant challenge within Neural Radiance Fields (NeRFs) pertains to their computational demands. The model is computationally expensive, requiring substantial amounts of data for training and leading to time-intensive training processes. Additionally, the model’s complexity is compounded by a considerable number of parameters and layers, posing implementation challenges.

Since 2020, there has been notable research dedicated to enhancing the performance of Neural Radiance Fields. Researchers are exploring various strategies to address the computational costs associated with training, aiming to streamline the model’s requirements while preserving or enhancing its effectiveness.

KiloNeRF: Accelerating Neural Radiance Fields with Numerous Small MLPs

In KiloNeRF, the scene is no longer represented by a single, high-capacity MLP. Instead, it is represented by thousands of small MLPs, each responsible for a small part of the scene. The reported result is rendering that is about 2,548 times faster than the original NeRF, while maintaining visual quality.
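One way to picture "thousands of small MLPs" is a regular 3D grid in which every cell owns its own tiny network. The grid-based assignment follows my understanding of the KiloNeRF paper rather than anything stated in this article, and the function name `cell_index`, the scene bounds, and the 16×16×16 resolution are illustrative assumptions.

```python
def cell_index(point, scene_min, scene_max, resolution):
    """Map a 3D point to the flat index of the grid cell (and tiny MLP) that owns it."""
    idx = []
    for p, lo, hi, res in zip(point, scene_min, scene_max, resolution):
        frac = (p - lo) / (hi - lo)          # position inside the bounding box, 0..1
        i = int(frac * res)                  # cell number along this axis
        idx.append(min(max(i, 0), res - 1))  # clamp points on or beyond the boundary
    ix, iy, iz = idx
    _, ry, rz = resolution
    return (ix * ry + iy) * rz + iz

resolution = (16, 16, 16)  # 16^3 = 4096 cells, i.e. thousands of small networks
n_networks = resolution[0] * resolution[1] * resolution[2]

first = cell_index((-1.0, -1.0, -1.0), (-1, -1, -1), (1, 1, 1), resolution)
center = cell_index((0.0, 0.0, 0.0), (-1, -1, -1), (1, 1, 1), resolution)
last = cell_index((1.0, 1.0, 1.0), (-1, -1, -1), (1, 1, 1), resolution)
```

During rendering, each sampled point is sent only to the small network at its index, so no single query has to pass through a large model.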

Network Architecture

KiloNeRF adopts a downscaled version of NeRF’s fully connected architecture and, like NeRF, keeps the predicted density independent of the view direction. NeRF uses 10 hidden layers: the first nine each output a 256-dimensional feature vector, and the last outputs a 128-dimensional one. Each KiloNeRF network has only 4 hidden layers with 32 hidden units each.

Consider NeRF’s architecture as a complex recipe for baking a cake. In the original NeRF recipe, it’s like having ten intricate layers, each contributing a unique flavor to the final cake. Each layer represents a hidden layer in the neural network, and the output of the recipe becomes more refined as you progress through these layers.

Now, envision the downscaled version of NeRF’s architecture as a simplified recipe. Instead of ten complex layers, this recipe only has four layers, making it more accessible and quicker to execute. Each layer is simpler, with only 32 specific ingredients (hidden units) compared to the richness of the original ten-layered recipe.
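A rough parameter count shows how much smaller the downscaled recipe is. The sketch below assumes a plain stack of fully connected layers with a raw 3D position as input and four output values; it ignores the view-direction input and any other architectural details, so the real numbers will differ. The function name `count_parameters` is illustrative.

```python
def count_parameters(layer_widths):
    """Weights plus biases of a plain fully connected stack."""
    total = 0
    for fan_in, fan_out in zip(layer_widths, layer_widths[1:]):
        total += fan_in * fan_out + fan_out
    return total

# Assumed for illustration: 3D position in, 4 values (r, g, b, density) out.
large = [3] + [256] * 9 + [128] + [4]  # ten hidden layers, as described above
small = [3] + [32] * 4 + [4]           # four hidden layers of 32 units

large_params = count_parameters(large)
small_params = count_parameters(small)
ratio = large_params / small_params
```

Under these assumptions, one small network has well over a hundred times fewer parameters than the large one, which is why evaluating it for a single sample point is so much cheaper.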

Similar to NeRF, the viewing direction is provided as an additional input to the last hidden layer. Affine layers are followed by ReLU activation, with two exceptions: the output layer computing RGB color utilizes a sigmoid activation, and no activation is applied to the feature vector input to the penultimate hidden layer.

Just like in the original NeRF cake, both recipes maintain a key principle: ensuring that the taste of the cake (predicted density) remains consistent, regardless of the perspective you view it from. In this analogy, the viewing direction is akin to the specific way you observe or slice the cake — the recipe takes that into account to ensure a consistent taste experience.

In the cooking process, the activation functions, like seasoning added to each layer, ensure that the overall flavor (representation) is enhanced and well-balanced. However, there are two twists in the recipe: the very last step, responsible for determining the cake’s color (RGB), uses a special ingredient (sigmoid activation) to keep every color value within a valid range, and the feature vector passed to the penultimate hidden layer is left unseasoned (no activation).
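Here is a minimal sketch of a forward pass through one such small network, written in plain Python with random, untrained weights. The exact wiring (which layer produces the density, where the feature vector sits) is my reading of the description above and should be treated as an assumption; all names are illustrative.

```python
import math
import random

def affine(x, weights, biases):
    """Dense layer without activation: y = W x + b."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(x):
    return [max(0.0, v) for v in x]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def init_layer(rng, n_in, n_out):
    """Random weights for illustration only; a real network would be trained."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
H = 32                                 # hidden units per layer
layer1 = init_layer(rng, 3, H)         # position -> hidden
layer2 = init_layer(rng, H, H)
density_head = init_layer(rng, H, 1)   # density, computed before the direction is seen
feature_head = init_layer(rng, H, H)   # feature vector, no activation
layer_dir = init_layer(rng, H + 3, H)  # last hidden layer, also receives the direction
rgb_head = init_layer(rng, H, 3)

def tiny_mlp(position, direction):
    h = relu(affine(position, *layer1))
    h = relu(affine(h, *layer2))
    sigma = relu(affine(h, *density_head))[0]  # non-negative density
    feature = affine(h, *feature_head)         # exception: no activation here
    h = relu(affine(feature + list(direction), *layer_dir))
    rgb = sigmoid(affine(h, *rgb_head))        # exception: sigmoid bounds the color
    return rgb, sigma

rgb_a, sigma_a = tiny_mlp((0.1, 0.2, 0.3), (0.0, 0.0, 1.0))
rgb_b, sigma_b = tiny_mlp((0.1, 0.2, 0.3), (1.0, 0.0, 0.0))
```

Because the density is computed before the viewing direction enters the network, `sigma_a` and `sigma_b` are identical; only the color is allowed to change with the direction.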

Nerfstudio: Simplifying NeRF Creation

Nerfstudio provides a user-friendly API that streamlines the entire process of creating, training, and testing NeRFs. The library emphasizes a more interpretable implementation of NeRFs by breaking each component down into modular units, which makes the technology easier to explore. The project was started in 2022 by students at Berkeley Artificial Intelligence Research (BAIR) and is sponsored by Luma AI and BAIR.

Key Methods Included

  • Nerfacto: A recommended method that consolidates multiple approaches into one.
  • Instant-NGP: Instant Neural Graphics Primitives with Multiresolution Hash Encoding.
  • NeRF: Original Neural Radiance Fields.
  • Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields.
  • TensoRF: Tensorial Radiance Fields.
  • Splatfacto: Nerfstudio’s implementation of Gaussian Splatting.

Future Developments

Looking ahead, we can anticipate ongoing advancements in NeRFs and related technologies. Efforts are being made to enhance NeRFs’ performance and address potential challenges. As the development of this technology progresses, there is a likelihood of incorporating NeRFs alongside 3D GANs or VAEs to generate realistic 3D models applicable in various industries, including video games, metaverse development, and immersive technologies like VR, AR, and XR.

Personal Projects in the Future

I am currently engaged in developing a Neural Radiance Field (NeRF) to extract a colored mesh from an object. Furthermore, I am delving into the potential impacts of NeRFs on enhancing digital twins and contributing to educational advancements. Additionally, I am exploring the possibility of integrating 3D reconstruction with olfaction.

Thank you for reading this! If you want to see more of my work, connect with me on LinkedIn!
