NeRF : Machine learning model to generate and render 3D models from multiple viewpoint images

Takehiko TERADA
axinc-ai
Jan 10, 2023

This is an introduction to "NeRF", a machine learning model that can be used with the ailia SDK. You can easily use this model to create AI applications with the ailia SDK, along with many other ready-to-use models in ailia MODELS.

Overview

NeRF (Neural Radiance Fields) is a technique released in March 2020. NeRF can generate a 3D model from multiple viewpoint images and render video from any viewpoint. Instead of polygons, NeRF represents the 3D model with a machine learning model.

Source: https://github.com/bmild/nerf

Architecture

The 3D model generated by NeRF is represented by the parameters of a machine learning model: one model is trained per 3D model, so the network itself is a functional representation of the scene.

During training, the machine learning model is optimized using images from multiple viewpoints, together with the camera parameters of each image, as input.
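The training objective itself is simple: the model is optimized so that rendering each training view reproduces the photographed pixels. A minimal sketch of this photometric loss (the function name is illustrative, not from the NeRF codebase):

```python
import numpy as np

def photometric_loss(rendered_rgb, target_rgb):
    """NeRF's training objective: mean squared error between pixel colors
    rendered from the model and the corresponding photographed pixels."""
    return np.mean((rendered_rgb - target_rgb) ** 2)

# Each training step samples a batch of rays from the input images,
# renders them through the model, and backpropagates this loss.
```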

Source: https://github.com/bmild/nerf

Inference takes the spatial coordinates (x, y, z) of the point to be sampled and the viewing direction (θ, φ) as input, and outputs (R, G, B, σ), where σ is the density. As in ray tracing, the RGB value of each pixel is computed by accumulating samples along the ray.
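The per-pixel accumulation follows the volume rendering formula from the NeRF paper: each sample's color is weighted by its alpha (derived from the density σ) and by the transmittance of the samples in front of it. A NumPy sketch of compositing one ray:

```python
import numpy as np

def render_ray(rgb, sigma, t_vals):
    """Composite per-sample colors along one ray into a pixel color.

    rgb:    (N, 3) colors predicted by the model at N sample points
    sigma:  (N,)   densities predicted at the same points
    t_vals: (N,)   distances of the sample points along the ray
    """
    # Distance between adjacent samples (last interval treated as very large)
    deltas = np.append(np.diff(t_vals), 1e10)
    # Alpha: probability that the ray is absorbed within each interval
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = alpha * trans
    return (weights[:, None] * rgb).sum(axis=0)  # final pixel color
```

A sample with very high density close to the camera dominates the pixel, while zero density everywhere yields a black (empty) pixel.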

The model architecture takes a 90-element 1D input vector, obtained by projecting (x, y, z) and (θ, φ) into a higher-dimensional space with positional encoding, and processes it with a series of Linear layers.
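The 90 elements come from the frequency encoding in the NeRF paper: the position uses 10 frequency bands and the direction uses 4, and each band contributes a sine and a cosine per coordinate, alongside the raw coordinate itself. A sketch that reproduces the count:

```python
import numpy as np

def positional_encoding(x, num_freqs):
    """Map each coordinate to [x, sin(2^k x), cos(2^k x)] for k < num_freqs."""
    out = [x]
    for k in range(num_freqs):
        out.append(np.sin((2.0 ** k) * x))
        out.append(np.cos((2.0 ** k) * x))
    return np.concatenate(out)

# Position (x, y, z), 10 frequencies: 3 + 3*2*10 = 63 values
# View direction (unit 3-vector from θ, φ), 4 frequencies: 3 + 3*2*4 = 27 values
pos_feat = positional_encoding(np.zeros(3), 10)
dir_feat = positional_encoding(np.zeros(3), 4)
print(pos_feat.size + dir_feat.size)  # 90
```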

https://netron.app/?url=https://storage.googleapis.com/ailia-models/nerf/nerf.opt.onnx.prototxt

The training data is supplied in LLFF format, which consists of image files and a NumPy file containing the camera poses.
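In the LLFF tooling, the camera data is stored in a file named poses_bounds.npy, where each row holds a flattened 3x5 pose matrix (a 3x4 camera-to-world transform plus an [H, W, focal] column) followed by near/far depth bounds. A sketch of parsing it, assuming that layout:

```python
import numpy as np

def load_llff_poses(path):
    """Parse an LLFF-style poses_bounds.npy file into poses and depth bounds."""
    data = np.load(path)                     # shape (N_images, 17)
    poses = data[:, :15].reshape(-1, 3, 5)   # 3x4 extrinsics + [H, W, focal]
    bounds = data[:, 15:]                    # near/far depth bounds per image
    return poses, bounds
```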

When building a 3D model from video, the camera position of each frame must be annotated. One way to detect the camera position automatically is COLMAP, a tool based on SfM (Structure from Motion) technology that is used, for example, to estimate camera positions from drone footage. COLMAP is particularly good at estimating camera positions in videos where the camera orbits a stationary target object.

NeRF can output images with finer detail than conventional methods.

Source: https://arxiv.org/pdf/2003.08934.pdf

Usage

With the ailia SDK, you can load the trained model converted to ONNX, run inference on it, and generate an image.

Render one frame with the following command.

python3 nerf.py -s output.png

You can change the viewpoint to render with the angle option.

python3 nerf.py --angle 100

You can change the rendering resolution with the render_factor option. By default, rendering is done at 1/4 resolution; the setting below renders at 1/2 resolution.

python3 nerf.py --render_factor 2

Related information

3D reconstruction using COLMAP and NeRF

Convert NeRF to volume texture and render in Unity


ax Inc. has developed ailia SDK, which is a self-contained cross-platform high speed inference SDK for rapid AI application development.

ax Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.
