Real-time Background Segmentation (without machine learning) using Intel® RealSense™ D405 Depth Camera and Python3

JITHIN MATHEW
5 min read · Jul 21, 2023


Ever wondered how to achieve real-time background segmentation without the need for complex machine learning algorithms? Step into the world of real-time background segmentation with Intel® RealSense™ D405 Depth Camera and Python3, and experience the magic of dynamic background separation!

Indeed, it has been some time since I last published an article. In a previous article, I discussed the utilization of the Intel® RealSense™ D435i camera for detecting and measuring the distance of objects from the camera. In this new article, our focus shifts towards exploring the capabilities of the depth camera Intel® RealSense™ D405 for background segmentation in images. This topic promises to be straightforward, enjoyable, and simple to understand.

It’s worth mentioning that while this article centers on the D405 depth camera, the code has also been tested with the D435i and is likely compatible with other depth cameras in the D400 series.

The Python 3 code used in this project is available on my GitHub; the link is provided in the Sources section at the end of this article.

Photo by Nik on Unsplash

Prerequisites: Before we dive into the implementation, make sure you have the following prerequisites:

  • Intel® RealSense™ SDK installed.
  • Depth camera (D405 or D435i).
  • Basic knowledge of Python.

Requirements (Python 3 packages):

  1. pyrealsense2
  2. opencv-python (for displaying and processing the output)
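
Both packages are available from PyPI. A minimal install, assuming a standard Python 3 environment (numpy is pulled in automatically as a dependency of opencv-python):

pip install pyrealsense2 opencv-python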

A little about Depth Camera D405

The Intel® RealSense™ D405 is a depth camera in the RealSense 400 series. It streams stereo depth and RGB at up to 1280 x 720 resolution and has a diagonal Field of View (FOV) of 87°, with dual global shutter sensors supporting up to 90 FPS for RGB and depth streaming. It is important to note that the camera operates at close range, from 7 cm to 50 cm (0.07 m to 0.5 m). Within that range it offers high-quality depth data, with ±2% deviation from ground truth at 50 cm.
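
Since the code in this article is meant to run on the D435i as well, it can be useful to confirm which camera is actually connected before streaming. Here is a small sketch using pyrealsense2’s device enumeration (this check is not part of the original script):

import pyrealsense2 as rs

# List every RealSense device currently connected
ctx = rs.context()
for dev in ctx.query_devices():
    name = dev.get_info(rs.camera_info.name)
    serial = dev.get_info(rs.camera_info.serial_number)
    print(f'Found {name} (S/N {serial})')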

Image Segmentation

Image segmentation is an essential step in many computer vision tasks, such as object detection, recognition, and tracking, image compression, and medical image analysis. It allows us to extract meaningful information from an image and reduce the complexity of the image representation, making it easier to analyze and process.

If you frequently delve into image processing or computer vision, you’ve likely encountered a variety of techniques for separating image foreground from background. From classic methods like thresholding, edge detection (e.g., Canny edge detection), and clustering (grouping pixels into clusters) to region-based approaches, image segmentation has evolved significantly over the years. While image segmentation began in the early days of image processing with the techniques above, most background segmentation today relies on deep learning. Two of the classic approaches are sketched below for context.
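
A minimal sketch of those classic baselines using OpenCV, assuming a hypothetical input file scene.png; the depth-based method in this article replaces these intensity-based cues with actual distance measurements:

import cv2

# Load a test image and convert it to grayscale ('scene.png' is a placeholder path)
image = cv2.imread('scene.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Classic global thresholding: pixels brighter than 127 are kept as foreground
_, mask = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Canny edge detection: outlines object boundaries instead of filling regions
edges = cv2.Canny(gray, 100, 200)

cv2.imshow('Threshold mask', mask)
cv2.imshow('Canny edges', edges)
cv2.waitKey(0)
cv2.destroyAllWindows()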

Advantages of Depth Camera based segmentation:

  1. It operates in real-time.
  2. Very low computational requirements (could be run on a CPU effortlessly).
  3. No Machine Learning Training Required: Unlike many other segmentation techniques, depth camera-based segmentation does not rely on pre-trained machine learning models, eliminating the laborious and resource-intensive process of training such models.

Disadvantages:

  1. Need an external depth camera (which may add to the equipment and setup costs).
  2. The foreground object, or object of interest, needs to be present within the camera’s depth range (e.g., between 7 and 50 cm for the D405), making it unsuitable for background removal in scenes beyond this range.

Let’s dive deeper into the code

Python 3 libraries used:

import pyrealsense2 as rs
import os
import cv2
import numpy as np

First, we set up the camera by initializing a RealSense pipeline. For this project we will use the maximum image size of 1280 x 720 for both the color and depth streams. Finally, streaming is enabled with pipeline.start, passing in the predefined configuration.

pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 1280, 720, rs.format.z16, 30)
print("[INFO] Starting streaming...")
pipeline.start(config)
print("[INFO] Camera ready.")
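
The segmentation threshold used later in this article depends on the camera’s depth units, so it is worth confirming the depth scale reported by the attached device. A short sketch using the SDK’s get_depth_scale call (not part of the original script):

# Query the depth scale (meters per raw depth unit) of the active device
profile = pipeline.get_active_profile()
depth_sensor = profile.get_device().first_depth_sensor()
depth_scale = depth_sensor.get_depth_scale()
print(f'[INFO] Depth scale: {depth_scale} m/unit')

The thresholding step below (range_value * 100) assumes a scale of 0.0001 m, i.e., 0.1 mm per depth unit; if your device reports a different scale, adjust the multiplier accordingly.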

Here we use OpenCV to display the stream from the camera with a predefined window size using the following code:

cv2.namedWindow('D405', cv2.WINDOW_NORMAL)
cv2.resizeWindow('D405', 600, 800)

To make this project more fun, let’s add a trackbar to change the distance at which the background is segmented (within the range of 7 to 50 cm).

range_value = 40  # segmentation distance in cm; updated by the trackbar callback

def update(val):
    global range_value
    range_value = val

cv2.createTrackbar('Depth', 'D405', 7, 50, update)

The next step is to start an infinite loop that continuously collects frames from the RealSense pipeline. Once the frames are collected, the color (RGB) frame and the depth frame are extracted into separate variables and converted into NumPy arrays. The depth frame can also be colorized with the RealSense colorizer for visualization, although the segmentation itself operates on the raw depth values. Every depth value beyond the distance selected on the trackbar (range_value, in cm) is zeroed out, and the corresponding pixels in the color image are painted white, removing the background. The result is rotated and displayed in the ‘D405’ window using cv2.imshow(). The script continuously checks for a ‘q’ key press; when ‘q’ is pressed, the loop terminates and the OpenCV windows are closed with cv2.destroyAllWindows(). This allows for real-time display of, and interaction with, the depth segmentation process.

while True:
    frames = pipeline.wait_for_frames()
    color_frame = frames.get_color_frame()
    depth_frame = frames.get_depth_frame()

    color_image = np.asanyarray(color_frame.get_data())
    # Optional: colorized depth view for visualization (not used below)
    depth_color_frame = rs.colorizer().colorize(depth_frame)
    temp_depth = np.asanyarray(depth_frame.get_data())

    # Zero out every pixel farther than the trackbar distance.
    # range_value is in cm; multiplying by 100 converts it to depth units,
    # assuming the default depth scale of 0.1 mm per unit.
    temp_depth[temp_depth > (range_value * 100)] = 0
    # Paint the background white (this also whitens pixels with no depth reading)
    color_image[temp_depth == 0] = [255, 255, 255]
    color_image = cv2.rotate(color_image, cv2.ROTATE_90_CLOCKWISE)
    cv2.imshow('D405', color_image)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
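
One caveat: masking the color image with the depth array assumes the two streams are pixel-aligned. On the D405 the RGB stream is derived from the same sensor as the depth, but on cameras with a separate RGB sensor, such as the D435i, the two can be offset. A sketch of how the SDK’s align processing block could be inserted before the masking step (not part of the original script):

# Create once, before the loop: align depth frames to the color stream
align = rs.align(rs.stream.color)

# Inside the loop, replace the frame extraction with:
frames = pipeline.wait_for_frames()
aligned_frames = align.process(frames)
color_frame = aligned_frames.get_color_frame()
depth_frame = aligned_frames.get_depth_frame()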

A Sample Use Case Application with Background Segmentation and Object Detection

In my previous publication, I explored an innovative approach for high-throughput phenotyping in soybean yield estimation. The study combined background segmentation using the Intel® RealSense™ D405 Depth Camera with object detection techniques to accurately count soybean pods. By employing background segmentation, we successfully removed irrelevant elements from the images, focusing solely on the soybean pods of interest. This preprocessing step proved essential in enhancing the performance of the subsequent object detection model.

This study conducted a comparative analysis of the model’s performance by training the deep learning model with and without background removal from the images. The results showed that using a depth camera to remove the background significantly improved YOLOv7’s pod detection performance, increasing precision, recall, mAP@0.5, and mAP@0.5:0.95 scores compared to when the background was present.

Using the depth camera and YOLOv7 algorithm for pod detection and counting achieved high mAP@0.5 and mAP@0.5:0.95 scores. The findings indicated a significant enhancement in the deep learning model’s performance when the background was segmented and a reasonably larger dataset was used to train YOLOv7. Explore the results and further details here.

Conclusion

In conclusion, real-time background segmentation is a critical task for various computer vision applications. With the advent of depth cameras such as the D405 and powerful programming languages such as Python3, it is now possible to perform real-time background segmentation without relying on machine learning algorithms. By utilizing the depth information from the camera, we can segment foreground objects accurately and efficiently. The primary rationale behind adopting this approach is that it provides a straightforward and lightweight substitute that can be integrated into real-time applications without requiring significant computational resources.

Hope you find this article useful. What fun applications would you use it for? Let me know in the comments section.

Sources

https://github.com/jithin8mathew/Depth-segmentation
