Capture Images for Gaussian Splatting

Exploring 3D Gaussian Splatting: A Journey from Concept to Real-world Online Exhibition — Part One

Yulei He
11 min read · Mar 10, 2024
Screen recording of this project

Handcrafted 3D vs Radiance Field

In 2022, the COVID outbreak led to a lockdown in Shanghai. The Levant Art Gallery in Shanghai reached out to a project manager to explore the possibility of creating an online VR exhibition as an alternative to their traditional offline show.

As a full-stack developer, I spent almost seven months on tasks such as 3D modeling, texture building, and web application programming for this project.

Screen recording of the handcrafted 3D exhibition

Handcrafted 3D exhibitions showcase exceptional creativity, but they require a significant time investment and depend heavily on the creator’s vision, making them less conducive to standardization and efficiency.

During that same period, I found myself captivated by Twitter posts from Luma AI. The transformation from captured photos to rendered video was mesmerizing and truly astounding. The technique behind this approach is called Radiance Fields.

I began studying the original NeRF paper from UC Berkeley and, over the following two years, explored other research conducted worldwide. In fact, since 2020 Radiance Fields have garnered significant attention for their potential applications in computer graphics and content creation.

Most notable is Nerfstudio, an open-source framework that combines many radiance field methods and supports both research and commercial use.

This exploration inspired me to use Radiance Fields for presenting 3D scenes online, enabling physical exhibitions to transition to digital twins while maintaining photorealism. However, at that time, the outputs of radiance fields were not usable online and could only be processed on high-end computers with powerful GPUs.

In 2023, a paper and accompanying source code on 3D Gaussian Splatting were published. This work showcased both high-performance rendering and excellent image quality. At that point, I believed it was the opportune moment to integrate it into real-world projects.

Opportunity

Before capturing the gallery scene, I had conducted numerous radiance field captures and training sessions using Luma AI and Nerfstudio. Initially, neither supported Gaussian Splatting, so I used the official Gaussian Splatting source code instead.

Subjects ranged from bicycles and cars to courtyards. Leveraging my photogrammetry experience, I aimed to optimize the results obtained with radiance fields.

In late 2023, I showcased radiance field results to Laboratorio 31 Art Gallery in Bergamo, proposing to capture and publish an online exhibition. Fortunately, they granted me the opportunity.

Challenge

Introductions or tutorials on capturing images for NeRF or Gaussian Splatting are hard to find. However, established photogrammetry capture methods fulfill the requirements.

Environmental factors such as changes in lighting and scene composition can introduce noise or errors, affecting the quality of results.

To mitigate these challenges, an interchangeable-lens camera, standardized settings, and a streamlined capture process were used. Challenges persisted, including the limited interior space and the need to balance the quantity of captures without compromising quality.

A targeted approach, focusing on eye-level captures in three rows, helps achieve comprehensive coverage while minimizing noise and simplifying data processing.

Challenges of Capture:

  1. Limited space: The confined interior makes it hard to keep adequate distance during capture. The small room restricts where the camera can be positioned, resulting in insufficient distance between the camera and the walls.
  2. Texture-less surfaces: Surfaces such as the gallery's white walls present challenges for software like COLMAP, which relies on texture for accurate alignment. Without distinct textures to reference, the software may struggle to achieve precise alignment, leading to inaccuracies in the reconstruction.
  3. Environmental changes: Fluctuations in lighting, such as changing sunlight, and factors like people moving within the scene pose challenges for consistent image capture. Additionally, time constraints may limit the opportunity to capture the scene under ideal conditions.
  4. High-contrast environment: The gallery features bright lighting focused on the works, while areas such as the ceiling, hallway, and street view remain dark, creating a stark contrast with the gallery lights. Capturing this type of scene can result in overexposed or underexposed images.
  5. Minimizing noise: Keeping noise down during the capture process is essential for high-quality results. This requires careful management of exposure settings, camera positioning, and the number of images captured. Both too many and too few images can lead to artifacts or distortions in the final reconstruction, so finding the right balance is crucial.

Camera and Lens

For this project, a Sony a6100 with the 16–50mm kit lens was used, and a total of 527 still images were captured.

Before using this camera, I tried taking pictures and recording videos with a smartphone, using ffmpeg to convert the video recordings into images. However, this approach did not yield satisfactory results.
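
For anyone trying the video route, the conversion step can be scripted. Below is a minimal sketch that shells out to ffmpeg; the file name and the 2 fps sampling rate are illustrative assumptions, not the exact command used for this project.

```python
# Extract still frames from a walkthrough video with ffmpeg.
# Assumptions: ffmpeg is on PATH; input name and fps value are illustrative.
import subprocess
from pathlib import Path

video = "gallery_walkthrough.mp4"      # hypothetical input clip
out_dir = Path("frames")
out_dir.mkdir(exist_ok=True)

subprocess.run([
    "ffmpeg",
    "-i", video,
    "-vf", "fps=2",                    # sample 2 frames per second
    "-qscale:v", "2",                  # high JPEG quality (lower = better)
    str(out_dir / "%05d.jpg"),
], check=True)
```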

Smartphones tend to automatically enhance each capture, aiming for the best result. However, these optimizations, such as sharpening edges, adjusting exposure, and optimizing colors, can introduce noise or errors that are not suitable for Gaussian Splatting training.

I believe most interchangeable-lens cameras can meet the capture needs. Whether it's a DSLR or a mirrorless camera, the only requirements are a wide-angle lens and full manual control of camera settings, such as exposure, shutter speed, and white balance.

Camera Settings

To tackle the challenge of the high-contrast environment, I tried settings like “DRO” (Dynamic Range Optimizer), “HDR” (High Dynamic Range), and “Bracketing” to extend the exposure range. However, the outcomes were disappointing. Despite attempts to capture HDR images, both the in-camera and post-processing methods introduced noise, resulting in less than optimal results.

The ideal capture is essentially a screenshot of a computer graphics world: no distortion, no vignetting, no chromatic aberration, and so on. This is why the synthetic Blender dataset always produces the best results.

A real-world scene can never achieve that perfectly, so the capture process must reduce these real-world effects as much as possible.

The capture process worked as follows: using manual exposure, I metered the lightest and darkest parts of the scene. Once the final settings were found, they were fixed for all image captures. Why not use auto exposure? Because the scene is captured from many different angles, and auto exposure would introduce differences in brightness and color between images.

The following settings were fixed for the scene:

  • Aperture: f/11
  • Exposure time: 1/13 sec
  • ISO: 100
  • Focal length: 16mm
  • Focus Mode: Manual Focus
  • White Balance: C.Temp./Filter
  • DRO / Auto HDR: Off
  • Lens Compensation: Shading Comp. Off, Chromatic Aberration Comp. Off, Distortion Comp. Auto (the 16–50mm kit lens does not support Off)
  • File Format: RAW

Avoid:

  • Auto Aperture: May lead to variations in depth of field.
  • Auto Exposure: While it may appear adequate, stepped exposure times result in exposure variations between shots.
  • Auto ISO: Similar to auto exposure, stepped ISO values can lead to exposure discrepancies.
  • Auto Focal Length: Can result in variations in the field of view.
  • Auto White Balance: May cause variations in color.
  • DRO / Auto HDR: May introduce exposure differences between captures.
  • Lens Compensation: Auto settings may differ between captures.
  • JPEG Format: Lens compensation cannot be adjusted later.
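
After a session, it is worth verifying that none of these settings drifted between shots. Below is a minimal sketch that compares the relevant EXIF tags across the developed JPEGs using Pillow; the folder path is an assumption, and RAW files would need a dedicated reader instead.

```python
# Check that exposure-related EXIF tags are identical across all captures.
# Assumption: captures were developed to JPEG in a "captures" folder.
from pathlib import Path
from PIL import Image

# Standard EXIF tag IDs for the settings fixed during capture.
CHECKED = {0x829D: "FNumber", 0x829A: "ExposureTime",
           0x8827: "ISO", 0x920A: "FocalLength"}

reference = None
for path in sorted(Path("captures").glob("*.jpg")):
    exif = Image.open(path).getexif().get_ifd(0x8769)  # Exif sub-IFD
    values = {name: exif.get(tag) for tag, name in CHECKED.items()}
    if reference is None:
        reference = values            # first image sets the baseline
    elif values != reference:
        print(f"{path.name}: settings drifted -> {values}")
```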

Tips:

  • Higher aperture number: This increases the depth of field, but avoid setting it too high, as diffraction can reduce image quality. Each camera and lens combination has its own sweet spot.
  • Lower ISO: Higher ISO settings introduce noise into the images.
  • Wider angle of view: A shorter focal length reduces the number of captures required while increasing overlap.

Capture

To determine the capture position, a predetermined plan was followed, taking into account factors such as the composition of the scene, desired angles, and lighting conditions. This plan was developed based on an understanding of the scene’s layout and the desired perspective for the captures. The guideline is to maintain wide viewing angles. This approach helps in reducing the number of pictures required while increasing overlap.

The focus was on capturing images at eye level, dividing the scene into three rows. The first row involved capturing images straight ahead at eye level. For the second row, the camera was tilted upwards towards the ceiling, while for the third row, it was tilted downwards towards the floor. Each row was captured at slightly varied heights, aiding in camera alignment during processing.

This targeted approach ensured sufficient coverage of the scene while minimizing the number of captures required. By organizing the capture process in this manner, efficient data processing was facilitated, reducing the potential for noise and complexity in the final result.
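
For planning purposes, this three-row scheme can be sketched in code. Below is one way to parameterize it, with illustrative pitch angles, eye level, and stop positions; it is not the exact plan used for the gallery.

```python
# One way to parameterize a three-row, eye-level capture plan: each stop
# gets three shots at different pitches with slight height variation.
import numpy as np

rng = np.random.default_rng(0)

def three_row_plan(stops_xy, eye_level=1.6, pitch_deg=(0.0, 25.0, -25.0)):
    """stops_xy: (N, 2) floor positions. Returns (x, y, z, pitch) per shot."""
    shots = []
    for x, y in stops_xy:
        for pitch in pitch_deg:                       # ahead, ceiling, floor
            z = eye_level + rng.uniform(-0.05, 0.05)  # slight height jitter
            shots.append((x, y, z, pitch))
    return np.array(shots)

stops = np.array([[0.0, 0.0], [0.8, 0.0], [1.6, 0.0]])  # example positions
print(three_row_plan(stops).shape)                       # (9, 4)
```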

With the camera secured on a tripod and connected to a remote control, the process simply involved positioning the camera according to the predetermined plan and then pressing the button to capture the image.

All 527 captures were completed within a timeframe of 2 hours, coinciding with the gallery’s lunch break when it was closed.

Recommendations for enhancing capture quality:

  1. Sharpness and Focus: Ensure that your images are well focused and sharp. Blurry or out-of-focus images hinder feature detection algorithms’ ability to find distinct keypoints accurately (a simple sharpness check is sketched after this list).
  2. Sufficient Lighting: Adequate lighting helps in capturing clear and detailed images, which is crucial for feature detection. Avoid overly bright or harsh lighting conditions as they can cause glare or wash out details.
  3. Texture and Contrast: Look for scenes with rich texture and high contrast. Features like edges, corners, and distinct patterns are easier for feature detection algorithms to identify.
  4. Overlap: When capturing images for later reconstruction, ensure there is sufficient overlap between consecutive images. This helps in establishing correspondences between features in different views, leading to better 3D reconstruction.
  5. Variety of Views: Capture images from different viewpoints, angles, and distances to cover the scene comprehensively. This diversity aids in detecting features from various perspectives, improving the robustness of the reconstruction.
  6. Consistent Illumination: Try to maintain consistent lighting conditions across all images in the dataset. Sudden changes in lighting can cause inconsistencies in feature detection and matching.
  7. Stability: Minimize camera shake and motion blur by using a tripod or stabilizing the camera. This ensures that the features remain consistent between images, facilitating accurate matching.
  8. Avoid Overexposure and Underexposure: Properly expose your images to retain detail in both bright and dark regions. Overexposed or underexposed areas may lack the necessary information for feature detection.
  9. Clean Lens and Sensor: Keep your camera lens and sensor clean to prevent dust or smudges from affecting image quality. Dirty optics can introduce artifacts that interfere with feature detection.
  10. Calibration: If possible, calibrate your camera to correct for lens distortion and other optical aberrations. A calibrated camera helps in accurately estimating camera poses and improves the quality of feature matching.
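
As a concrete aid for point 1, an automated pass can flag soft frames before they reach alignment. The sketch below uses the common variance-of-Laplacian heuristic with OpenCV; the threshold is scene-dependent and purely illustrative.

```python
# Flag potentially blurry captures using the variance of the Laplacian,
# a common sharpness heuristic: low variance suggests few sharp edges.
import cv2
from pathlib import Path

THRESHOLD = 100.0  # tune per camera and scene; this value is illustrative

for path in sorted(Path("captures").glob("*.jpg")):
    gray = cv2.imread(str(path), cv2.IMREAD_GRAYSCALE)
    score = cv2.Laplacian(gray, cv2.CV_64F).var()
    if score < THRESHOLD:
        print(f"{path.name}: low sharpness ({score:.1f}), consider recapturing")
```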

Parallax

Parallax refers to the apparent shift in the position of an object when viewed from different angles. In photogrammetry, parallax is crucial because it allows for the triangulation of points from multiple images, which is essential for accurate 3D reconstruction of objects or scenes. By capturing images from different perspectives, photogrammetry software can analyze the parallax between corresponding points in the images to determine their 3D coordinates, thus enabling the creation of detailed 3D models.
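
To make the triangulation idea concrete, here is a minimal two-view sketch using the standard linear (DLT) method; the camera matrices and the observed point are made-up illustrative values, not data from this project.

```python
# Triangulate one 3D point from two views via linear (DLT) triangulation.
import numpy as np

def triangulate(P1, P2, x1, x2):
    """P1, P2: 3x4 projection matrices; x1, x2: (u, v) pixel coordinates."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]                                 # homogeneous -> 3D

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], float)
P1 = K @ np.hstack([np.eye(3), [[0], [0], [0]]])        # camera at origin
P2 = K @ np.hstack([np.eye(3), [[-0.5], [0], [0]]])     # 0.5 m baseline

# A point 4 m in front of the cameras shows ~100 px of parallax:
print(triangulate(P1, P2, (320, 240), (220, 240)))      # -> [0. 0. 4.]
```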

To ensure that your capture planning meets the needs of parallax for photogrammetry, consider the following:

  1. Coverage from Multiple Angles: Plan to capture the subject from multiple angles to maximize parallax. This means moving around the subject and capturing images from various viewpoints to provide ample information for accurate triangulation.
  2. Overlap Optimization: Strategically plan the overlap between consecutive images to maximize the amount of parallax information available for reconstruction. Aim for a 60–80% overlap between images to ensure sufficient matching points (a spacing sketch follows this list).
  3. Diverse Camera Positions: Experiment with different camera positions, heights, and distances from the subject to capture a variety of perspectives. This diversity helps in capturing parallax from various angles, improving the accuracy of the reconstruction.
  4. Include Depth Variations: Incorporate depth variations in the scene by capturing images from different distances from the subject. This helps in capturing parallax not only horizontally but also vertically, leading to a more detailed 3D reconstruction.
  5. Plan for Complex Surfaces: If the subject has complex surfaces or intricate details, plan to capture images from multiple angles to ensure thorough coverage and capture as much parallax information as possible.
  6. Avoid Symmetrical Shooting: Try to avoid shooting the subject from purely symmetrical angles, as this can limit the amount of parallax available for reconstruction. Instead, aim for diverse and asymmetrical viewpoints.
  7. Consistent Lighting: Ensure consistent lighting conditions throughout the capture process to minimize variations in brightness and shadows, which can affect parallax and the accuracy of reconstruction.
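
As promised in point 2, overlap can be planned numerically. The sketch below estimates the maximum distance between consecutive shots of a roughly flat subject for a target overlap; the lens, sensor, and distance values are illustrative assumptions.

```python
# Estimate the maximum step between consecutive shots for a target overlap,
# assuming a roughly flat subject at a known distance.
import math

def max_step(distance_m, focal_mm, sensor_width_mm, overlap=0.7):
    fov = 2 * math.atan(sensor_width_mm / (2 * focal_mm))  # horizontal FOV
    footprint = 2 * distance_m * math.tan(fov / 2)         # width on subject
    return footprint * (1 - overlap)

# Illustrative: 16mm lens on an APS-C sensor (~23.5mm wide), wall 2m away,
# 70% overlap -> roughly 0.88m between shots.
print(f"{max_step(2.0, 16, 23.5):.2f} m")
```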

By incorporating these considerations into the capture planning process, you can optimize your photogrammetry workflow to effectively leverage parallax for accurate 3D reconstruction.

Capture Patterns

Grid Pattern

Fern, NeRF LLFF data (mildenhall2019llff)

One common example of a capture pattern is the grid pattern. In this pattern, photographs are captured systematically in a grid-like arrangement, with each photograph overlapping adjacent images to ensure comprehensive coverage of the target area. The grid pattern is particularly useful for documenting large, flat surfaces such as building facades or archaeological sites. The grid pattern offers several advantages, including ease of implementation, uniform coverage, and scalability to different project sizes.

Circular Pattern

Lego, NeRF synthetic data (mildenhall2020nerf)

When capturing images of an object, a car for example, a common approach is the circular pattern, which involves capturing photographs from multiple viewpoints arranged in a circular or semi-circular configuration around the vehicle.
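
A circular plan is easy to generate programmatically. The sketch below produces evenly spaced positions on a ring around the subject, each yawed to face the center; the view count, radius, and height are illustrative values.

```python
# Generate a circular capture plan: n positions on a ring around the subject,
# each facing the center. All numbers are illustrative.
import numpy as np

def circular_plan(n_views=36, radius=3.0, height=1.6):
    angles = np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False)
    positions = np.stack([radius * np.cos(angles),
                          radius * np.sin(angles),
                          np.full(n_views, height)], axis=1)
    # Yaw (degrees) pointing each camera back at the ring's center.
    yaws = np.degrees(np.arctan2(-positions[:, 1], -positions[:, 0]))
    return positions, yaws

positions, yaws = circular_plan()
print(positions[0], yaws[0])   # first stop: [3. 0. 1.6], facing 180 degrees
```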

Capture Pattern for Interior

This project

When capturing images of interior spaces, various challenges arise. As mentioned previously, limited space poses one of the most significant challenges for interior capture. The common grid pattern, where photographs are taken at regular intervals along the length and width of the room, often presents drawbacks. This is because the camera tends to get too close to the walls, which are often texture-less, resulting in poor camera alignment.

To address this issue, I developed a pattern for interior capture that combines elements of the grid and circular patterns. The approach keeps the camera position and viewpoint as far from the target as possible, ensuring a wider field of view for better coverage and alignment.

Post-processing

All images were captured in RAW format and subsequently processed using RawTherapee, an open-source raw photo processor. In the post-processing stage, adjustments such as white balance and exposure were applied with RawTherapee's Batch Edit functionality, ensuring consistent adjustments across all captures and uniform image quality.
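
This batch step can also be run headlessly with RawTherapee's command-line tool. Below is a minimal sketch; the folder names and the .pp3 profile are placeholders, and the flags follow the rawtherapee-cli documentation.

```python
# Batch-develop RAW captures with rawtherapee-cli, applying one shared
# .pp3 profile so every frame receives identical adjustments.
import subprocess
from pathlib import Path

raw_files = sorted(str(p) for p in Path("raw").glob("*.ARW"))  # Sony RAW files

subprocess.run([
    "rawtherapee-cli",
    "-o", "jpeg_out",        # output directory
    "-p", "gallery.pp3",     # shared processing profile (WB, exposure, ...)
    "-j95",                  # write JPEG at quality 95
    "-Y",                    # overwrite existing output files
    "-c", *raw_files,        # input files; -c must come last
], check=True)
```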

The RAW images for this scene are 6024 x 4024 pixels. During Gaussian Splatting training, inputs wider than 1.6k are automatically rescaled to 1.6k. The output image size was therefore set to 2000 x 1335 pixels, close to but larger than 1.6k. For the next step in training, the resolution parameter will be set to 1 to use the original image resolution.

Conclusion

In conclusion, the journey from exploring 3D Gaussian Splatting to implementing it in a VR exhibition has been both challenging and rewarding. The project evolved from traditional handcrafted 3D exhibitions to leveraging cutting-edge techniques like Radiance Fields and Gaussian Splatting.

Through meticulous capture sessions, I successfully integrated Gaussian Splatting into a real-world project, culminating in the creation of an online exhibition for Laboratorio 31 Art Gallery.

I encountered numerous challenges, from limited space and texture-less surfaces to fluctuating lighting conditions and the need for consistency in image capture. However, by adhering to standardized capture settings, employing targeted capture strategies, and post-processing with precision, I overcame these obstacles and achieved high-quality results.

Looking ahead, there are still opportunities for further refinement and enhancement in capture quality, post-processing techniques, and integration of advanced technologies. By continually pushing the boundaries of innovation and collaboration, we can unlock new possibilities in the realm of virtual exhibitions and immersive experiences.

Next: Camera Alignment and Gaussian Splatting Training

Once the scene has been captured and all the photos have been converted from RAW to JPEG, we are ready to move on to the next step: Camera Alignment and Gaussian Splatting Training (not yet published).

Support Me

If you enjoyed this article, consider supporting me by buying me a cup of coffee. Alternatively, if you're interested in the captures, all 527 images are available for download here 👇
