Instant NeRF on Google Compute Engine via Chrome Remote Desktop

Rendering a 3D NERF Toy Gun with Neural Radiance Fields (NeRF) on a Google Cloud VM

Takuma Yamaguchi (Kumon)
7 min read · Aug 14, 2022
NERF Toy Gun Generated with Instant-NeRF

Introduction

Neural radiance fields (NeRF) synthesize novel views of complex scenes with a simple fully connected neural network trained on a collection of 2D images.

The paper, NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis, was presented at ECCV 2020 and won a best paper honorable mention. The project web page is https://www.matthewtancik.com/nerf.

NeRF produces impressive view synthesis, but it is slow: training takes 1 to 2 days per scene, and synthesizing a single frame takes tens of seconds on an NVIDIA V100 GPU. Several follow-up studies have therefore worked on reducing these computation times.
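To get an intuition for what the network learns: NeRF maps a 3D position and view direction to a color and a density, then composites the samples along each camera ray into a pixel. A toy NumPy sketch of that compositing step (my own simplification for illustration, not the paper's code):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Toy volume rendering along one ray: alpha-composite the densities
    (sigmas) and RGB colors predicted at each sample point.
    deltas are the distances between consecutive samples."""
    alphas = 1.0 - np.exp(-sigmas * deltas)         # per-sample opacity
    trans = np.cumprod(1.0 - alphas + 1e-10)        # accumulated transmittance
    trans = np.concatenate([[1.0], trans[:-1]])     # light surviving to sample i
    weights = alphas * trans
    return (weights[:, None] * colors).sum(axis=0)  # composited pixel color
```

A very dense first sample occludes everything behind it, which is exactly the behavior that makes the rendered surfaces opaque.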

A SIGGRAPH 2022 paper, Instant Neural Graphics Primitives with a Multiresolution Hash Encoding, reduced both times dramatically: a few seconds for training and a few milliseconds per rendered frame. The dramatic improvement attracted a lot of attention. The project web page is https://nvlabs.github.io/instant-ngp/, and a GUI tool is available at https://github.com/NVlabs/instant-ngp.
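The key idea of the paper is to replace most of the big MLP with learned feature tables indexed by a spatial hash at several grid resolutions. A heavily simplified 2D sketch of the lookup (random tables instead of learned ones, nearest-vertex lookup instead of the paper's d-linear interpolation):

```python
import numpy as np

rng = np.random.default_rng(0)
T, F, LEVELS = 2**14, 2, 4                 # table size, features per entry, levels
tables = rng.normal(size=(LEVELS, T, F))   # one (trainable) feature table per level

def hash2d(ix, iy):
    """Spatial hash in the style of the instant-ngp paper, 2D toy version."""
    return (ix ^ (iy * 2654435761)) % T

def encode(x, y):
    """Concatenate features looked up from grids of doubling resolution."""
    feats = []
    for level in range(LEVELS):
        res = 16 * 2**level                # base resolution 16, doubling per level
        ix, iy = int(x * res), int(y * res)
        feats.append(tables[level, hash2d(ix, iy)])
    return np.concatenate(feats)           # shape: (LEVELS * F,)
```

The encoded vector, not the raw coordinate, is what feeds the (now tiny) MLP, which is a large part of why training is so fast.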

I got interested in using instant-ngp/Instant-NeRF because it is fast, but I didn't have a development environment with a GUI and GPUs on my local machine, so I built one on Google Cloud (GCP).

Instant-NGP can also run on Google Colab, and an example notebook is available in the repository, but using the GUI tool is more fun and makes the behavior easier to understand.

Build a GUI environment on Google Cloud

Create a VM instance

The first step in creating a VM is selecting a machine type. Building some of the packages requires a fair amount of RAM, so a machine with 4 vCPUs and 26 GB of memory is used. A GPU is also needed, so the cheapest option, an NVIDIA T4, is selected. Even with a T4, you don't have to wait long for scenes to train.

Machine Type Selection

As for the machine image, the Debian 10 based Deep Learning VM for TensorFlow Enterprise 2.9 with CUDA 11.3 is used. The simpler image, the Debian 10 based Deep Learning VM with CUDA 11.0, should work too, but I ran into some errors with it while trying instant-ngp.

Machine Image Selection

With a preemptible VM, the hourly cost of the instance was $0.17 in us-central1.

VM Instance Cost
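For reproducibility, roughly the same VM can also be created from the gcloud CLI instead of the console. The instance name, zone, and image family below are my assumptions; list the available Deep Learning VM images with gcloud compute images list --project deeplearning-platform-release and adjust to match your console selection.

```shell
# Sketch: a preemptible 4-vCPU / 26 GB custom VM with one NVIDIA T4.
# Image family is an assumption -- verify it before running.
gcloud compute instances create instant-ngp-vm \
    --zone=us-central1-a \
    --machine-type=custom-4-26624 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --preemptible \
    --image-family=tf-ent-2-9-cu113 \
    --image-project=deeplearning-platform-release \
    --boot-disk-size=100GB
```

Note that GPU instances require the TERMINATE maintenance policy, since VMs with attached GPUs cannot live-migrate.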

Setup a GUI environment

When you SSH into the instance, you will see the following message. Type y to install the NVIDIA drivers automatically.

This VM requires Nvidia drivers to function correctly. Installation takes ~1 minute.
Would you like to install the Nvidia driver? [y/n] y

Install Chrome Remote Desktop: the next step is to install Chrome Remote Desktop on the instance. Here is the official documentation: https://cloud.google.com/architecture/chrome-desktop-remote-on-compute-engine

sudo apt update
sudo apt install --assume-yes wget tasksel
wget https://dl.google.com/linux/direct/chrome-remote-desktop_current_amd64.deb
sudo apt-get install --assume-yes ./chrome-remote-desktop_current_amd64.deb
sudo DEBIAN_FRONTEND=noninteractive apt install --assume-yes xfce4 desktop-base dbus-x11 xscreensaver
sudo bash -c 'echo "exec /etc/X11/Xsession /usr/bin/xfce4-session" > /etc/chrome-remote-desktop-session'
sudo systemctl disable lightdm.service

Go to the Chrome Remote Desktop setup page, https://remotedesktop.google.com/headless, from your local machine. Then go through: Set up another computer > Begin > Next > Authorize.

Copy the command for Debian Linux.

DISPLAY= /opt/google/chrome-remote-desktop/start-host --code="xxxxxxxxxx" --redirect-url="https://remotedesktop.google.com/_/oauthredirect" --name=$(hostname)

Paste the command onto the VM instance and enter your PIN.

On the remote access page, you will see your VM. Click the link and enter your PIN.

Setup Instant-NGP

Install Dependent Packages

sudo apt install -y \
build-essential libatlas-base-dev libboost-filesystem-dev \
libboost-graph-dev libboost-program-options-dev \
libboost-system-dev libboost-test-dev libcgal-dev \
libeigen3-dev libfreeimage-dev libgflags-dev libglew-dev \
libglfw3-dev libgoogle-glog-dev libmetis-dev libomp-dev \
libopenexr-dev libqt5opengl5-dev libsuitesparse-dev \
libxcursor-dev libxi-dev libxinerama-dev qtbase5-dev

Upgrade cmake

sudo apt remove --purge cmake
pip install cmake
hash -r
cmake --version
cmake version 3.24.0

Install Vulkan

Here is the official document, https://vulkan.lunarg.com/doc/sdk/1.3.216.0/linux/getting_started.html.

cd ~
mkdir vulkan
cd vulkan
wget https://sdk.lunarg.com/sdk/download/latest/linux/vulkan-sdk.tar.gz
tar xf vulkan-sdk.tar.gz
source $(ls|grep 1.)/setup-env.sh

Copy files to system directories

sudo cp -r $VULKAN_SDK/include/vulkan/ /usr/local/include/
sudo cp -P $VULKAN_SDK/lib/libvulkan.so* /usr/local/lib/
sudo cp $VULKAN_SDK/lib/libVkLayer_*.so /usr/local/lib/
sudo mkdir -p /usr/local/share/vulkan/explicit_layer.d
sudo cp $VULKAN_SDK/etc/vulkan/explicit_layer.d/VkLayer_*.json /usr/local/share/vulkan/explicit_layer.d
sudo ldconfig # You can ignore some warnings for now

Build Instant-NGP

cd ~
git clone --recursive https://github.com/nvlabs/instant-ngp
cd instant-ngp
cmake . -B build
cmake --build build --config RelWithDebInfo -j

Test Instant-NGP with the Fox Images

On the remote desktop, you can run instant-ngp on the bundled fox images. You can get higher-resolution output by setting the target FPS to 2.0; the renderer scales resolution to hit the target frame rate, so a lower target means more detail per frame.

cd ~/instant-ngp
./build/testbed --scene data/nerf/fox
Instant-NGP for the Fox Images

It works, but our goal is to render a 3D NERF toy gun or our own images. NeRF requires the camera pose of each input image. For the fox images, the camera poses are included in the transforms.json file in data/nerf/fox. The next section describes how to estimate camera poses for your own images.
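In the standard instant-ngp format, transforms.json pairs each image path with a 4x4 camera-to-world matrix and intrinsics such as camera_angle_x. A small helper for sanity-checking such a file (my own hypothetical utility, not part of instant-ngp):

```python
import json

def summarize_transforms(path):
    """Print a one-line summary of an instant-ngp transforms.json file
    and return the number of registered frames."""
    with open(path) as f:
        meta = json.load(f)
    frames = meta.get("frames", [])
    print(f"{len(frames)} frames, camera_angle_x={meta.get('camera_angle_x')}")
    return len(frames)
```

If the frame count is much lower than the number of photos you took, COLMAP failed to register some images and the reconstruction will likely have holes.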

Setup Instant-NGP for Any Images

COLMAP is a widely used general-purpose Structure-from-Motion (SfM) tool. We can estimate camera poses with it.

Install Ceres Solver

COLMAP depends on Ceres Solver.

cd ~
git clone --depth 1 -b 2.1.0 https://github.com/ceres-solver/ceres-solver.git
cd ceres-solver
mkdir build
cd build
cmake .. -DBUILD_TESTING=OFF -DBUILD_EXAMPLES=OFF
make -j
sudo make install

Install COLMAP

cd ~
git clone --depth 1 -b 3.7 https://github.com/colmap/colmap
cd colmap
mkdir build
cd build
cmake ..
make -j3 # updated to -j3 from -j as 26GB RAM is not enough
sudo make install
pip install opencv-python
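opencv-python is needed by instant-ngp's scripts/colmap2nerf.py; as I understand it, the script uses a variance-of-Laplacian measure to score frame sharpness when extracting frames from video. A NumPy-only sketch of that metric (my own toy version, not the script's code):

```python
import numpy as np

def sharpness(img):
    """Variance of the Laplacian, a common blur metric: a hypothetical
    NumPy-only stand-in for the OpenCV-based score. Higher = sharper."""
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())
```

A perfectly flat image scores 0, while high-frequency detail drives the score up, so blurry frames can be filtered out before reconstruction.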

Test Instant-NGP with the Fox Images From Scratch

First, move or remove the original transforms.json so that we can regenerate it from scratch.

cd ~/instant-ngp/data/nerf/fox
# Move or remove transforms.json
mkdir backup
mv transforms.json backup/
# Output directory
mkdir colmap_text

Launch COLMAP via the remote desktop.

colmap gui
COLMAP GUI

Create a new project through the menu: File > New project. Then:

  • Extract feature points of the images with Processing > Feature extraction > Extract
  • Match feature points with Processing > Feature matching > Run
  • Estimate camera poses with Reconstruction > Start reconstruction
  • Save the result with File > Export model as text, selecting the colmap_text directory created a moment ago
  • Quit COLMAP
Camera Pose Estimation with COLMAP
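If you prefer to script this instead of clicking through the GUI, COLMAP's command-line interface covers the same steps. The paths below are illustrative:

```shell
DATASET=~/instant-ngp/data/nerf/fox
# Feature extraction, matching, sparse reconstruction, then text export
colmap feature_extractor --database_path $DATASET/colmap.db --image_path $DATASET/images
colmap exhaustive_matcher --database_path $DATASET/colmap.db
mkdir -p $DATASET/sparse $DATASET/colmap_text
colmap mapper --database_path $DATASET/colmap.db --image_path $DATASET/images --output_path $DATASET/sparse
colmap model_converter --input_path $DATASET/sparse/0 --output_path $DATASET/colmap_text --output_type TXT
```

If I recall the script correctly, scripts/colmap2nerf.py can also invoke COLMAP for you via its --run_colmap flag, which bundles these steps.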

Generate transforms.json by running the following script. The aabb_scale parameter, a power of two, tells instant-ngp how far the scene extends beyond the unit cube; use larger values for scenes with more distant background.

cd ~/instant-ngp/data/nerf/fox
python ~/instant-ngp/scripts/colmap2nerf.py --colmap_matcher exhaustive --aabb_scale 4

Run instant-ngp the same way as before

cd ~/instant-ngp
./build/testbed --scene data/nerf/fox
Instant-NeRF with the Fox Images from Scratch

Instant-NGP for Any Images

Conference Room

Some datasets for NeRF are available from the NeRF project page: https://drive.google.com/drive/folders/128yBriW1IG_3NJ5Rp7APSTZsJqdJdfc1?usp=sharing. Let's use nerf_llff_data/room/images, which consists of 41 images.

Conference Room Images

Estimate camera poses with COLMAP, and save the COLMAP outputs in ~/instant-ngp/data/room/colmap_text.

Camera Pose Estimation for the Conference Room Images

Generate transforms.json

cd ~/instant-ngp/data/room
python ~/instant-ngp/scripts/colmap2nerf.py --colmap_matcher exhaustive --aabb_scale 2

Run instant-ngp

cd ~/instant-ngp
./build/testbed --scene data/room
Instant-NeRF with the Conference Room Images

As we can see, impressively, the light reflection on the display and ambient occlusion are rendered very well.

NERF Toy Gun

The next target is a NERF toy gun. I borrowed it from my son and took 26 photos using my cellphone.

NERF Toy Gun Images Taken with a Cellphone

COLMAP result

Final result

The output is not perfect, but it’s still amazing as it’s generated based on only 26 images.

Conclusion

NeRF is an impressive technique for generating 3D scenes from a collection of 2D images, and Instant-NGP/Instant-NeRF makes both model training and novel-view rendering very fast. A GUI development environment with a GPU makes it much easier to try, and setting up a remote desktop on a cloud VM lets cloud users enjoy NeRF easily.
