Semantic Segmentation with Open3D-ML, PyTorch Backend, and a Custom Dataset

Carlos Argueta
6 min read · Sep 23, 2022


Note: Instructions to download, run, and troubleshoot the code introduced in this article are provided at the end.

As part of my experimentation with Open3D-ML for Point Clouds, I wrote articles explaining how to install this library with TensorFlow and PyTorch support. To test the installation, I explained how to run a simple Python script to visualize a labeled Semantic Segmentation dataset called SemanticKITTI. In this article, I go over the steps I followed to run inference on any Point Cloud, including the test portion of SemanticKITTI as well as my own private dataset.

The rest of this article assumes that you have successfully installed and tested Open3D-ML with the PyTorch backend by following my previous article. Having done so also means you have downloaded the SemanticKITTI dataset. To run a Semantic Segmentation model on unlabeled data, you need to load an Open3D-ML pipeline. The pipeline consists of a Semantic Segmentation model, a dataset, and possibly other pre- and post-processing steps. Open3D-ML comes with modules and configuration files to easily load and run popular pipelines.
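The snippets below assume the usual Open3D-ML imports for the PyTorch backend, plus a few standard-library modules used later; adjust them if your environment differs.

# Imports assumed by the snippets in this article
import os
import glob

import numpy as np
import open3d as o3d
import open3d.ml as _ml3d          # backend-agnostic helpers such as Config
import open3d.ml.torch as ml3d     # PyTorch-specific models, datasets, and pipelines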

To do inference on new Point Clouds, we will use a popular model called RandLA-Net, presented in the 2019 paper RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. Conveniently, Open3D-ML includes an implementation of this model, along with configurations to load and run it on the SemanticKITTI dataset without much effort.

To load the configuration file, we need the following code, making sure to replace /path/to/Open3D/ with the path where you cloned the Open3D repository when installing.

# Load an ML configuration file
cfg_file = "/path/to/Open3D/build/Open3D-ML/ml3d/configs/randlanet_semantickitti.yml"
cfg = _ml3d.utils.Config.load_from_file(cfg_file)

Next, we will create a RandLANet model using the configuration object, and we will add the paths to the SemanticKITTI dataset as well as to our custom dataset. Make sure to replace /path/to/save/dataset/SemanticKitti/ with the path where you saved the SemanticKITTI data when installing Open3D-ML. For now, the custom dataset is pointing to some of my personal Point Clouds collected with my robot and provided in the repo accompanying this article.

# Load the RandLANet model
model = ml3d.models.RandLANet(**cfg.model)
# Add path to the SemanticKitti dataset and your own custom dataset
cfg.dataset['dataset_path'] = '/path/to/save/dataset/SemanticKitti/'
cfg.dataset['custom_dataset_path'] = './pcds'

The next step is to load the datasets. To load the SemanticKITTI dataset, Open3D-ML provides convenient helper classes and methods.

# Load the datasets
dataset = ml3d.datasets.SemanticKITTI(cfg.dataset.pop('dataset_path', None), **cfg.dataset)
custom_dataset = load_custom_dataset(cfg.dataset.pop('custom_dataset_path', None))

A simple custom function is added to load the custom dataset. Notice that this dataset has to be in the PCD format.

def load_custom_dataset(dataset_path):
    print("Loading custom dataset")
    pcd_paths = glob.glob(dataset_path + "/*.pcd")
    pcds = []
    for pcd_path in pcd_paths:
        pcds.append(o3d.io.read_point_cloud(pcd_path))
    return pcds

Next, a pipeline is created using the configuration, model, and dataset objects. If not already available locally, the model parameters (checkpoint) are downloaded before being loaded into the pipeline.

# Create the ML pipeline
pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=dataset, device="gpu", **cfg.pipeline)
# Download the weights.
ckpt_folder = "./logs/"
os.makedirs(ckpt_folder, exist_ok=True)
ckpt_path = ckpt_folder + "randlanet_semantickitti_202201071330utc.pth"
randlanet_url = "https://storage.googleapis.com/open3d-releases/model-zoo/randlanet_semantickitti_202201071330utc.pth"
if not os.path.exists(ckpt_path):
    cmd = "wget {} -O {}".format(randlanet_url, ckpt_path)
    os.system(cmd)
# Load the parameters of the model.
pipeline.load_ckpt(ckpt_path=ckpt_path)

To run the model on an unlabeled Point Cloud from the SemanticKITTI test set, we first pick a given data point by its index, then we run the inference action from the pipeline. You can change the value of the variable pc_idx to select another Point Cloud.

# Get one test point cloud from the SemanticKitti dataset
pc_idx = 58 # change the index to get a different point cloud
test_split = dataset.get_split("test")
data = test_split.get_data(pc_idx)
# run inference on a single example.
# returns dict with 'predict_labels' and 'predict_scores'.
result = pipeline.run_inference(data)

A Point Cloud data instance in the SemanticKITTI dataset is loaded as a Python dictionary containing the keys “point”, “feat”, and “label”. The “feat” value is None and the “label” value is a NumPy array filled with zeros; neither is used during inference. The “point” key is associated with a NumPy array containing the x, y, and z coordinates of the LiDAR points. To visualize the result of the inference with the Open3D visualizer, we need to create a Point Cloud object from the “point” part of the dictionary and then colorize the points with the labels returned by the inference.
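If you are curious, you can verify this structure yourself with a quick, optional inspection; the shapes shown in the comments are illustrative.

# Optional: inspect the structure of the loaded data instance
print(data.keys())             # 'point', 'feat', and 'label'
print(data["point"].shape)     # (N, 3) array of x, y, z coordinates
print(data["feat"])            # None, not used during inference
print(data["label"].max())     # 0, since the test labels are all zeros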

# Create a pcd to be visualized 
pcd = o3d.geometry.PointCloud()
xyz = data["point"] # Get the points
pcd.points = o3d.utility.Vector3dVector(xyz)
# Get the color associated with each predicted label
colors = [COLOR_MAP[clr] for clr in list(result['predict_labels'])]
pcd.colors = o3d.utility.Vector3dVector(colors) # Add color data to the point cloud
# Create visualization
custom_draw_geometry(pcd)

The SemanticKITTI dataset has 19 classes plus a background class. A color mapping from class label to point color has to be provided. For readability, the RGB colors are defined as integers, but the visualizer expects doubles from 0.0 to 1.0, so some code to do the conversion is provided.

# Class colors, RGB values as ints for easy reading
COLOR_MAP = {
    0: (0, 0, 0),
    1: (245, 150, 100),
    2: (245, 230, 100),
    3: (150, 60, 30),
    4: (180, 30, 80),
    5: (255, 0, 0),
    6: (30, 30, 255),
    7: (200, 40, 255),
    8: (90, 30, 150),
    9: (255, 0, 255),
    10: (255, 150, 255),
    11: (75, 0, 75),
    12: (75, 0, 175),
    13: (0, 200, 255),
    14: (50, 120, 255),
    15: (0, 175, 0),
    16: (0, 60, 135),
    17: (80, 240, 150),
    18: (150, 240, 255),
    19: (0, 0, 255),
}
# Convert class colors to doubles from 0 to 1, as expected by the visualizer
for label in COLOR_MAP:
    COLOR_MAP[label] = tuple(val / 255 for val in COLOR_MAP[label])

The custom function that draws the Point Cloud with the result of the semantic segmentation is as follows.

def custom_draw_geometry(pcd):
    vis = o3d.visualization.Visualizer()
    vis.create_window()
    vis.get_render_option().point_size = 2.0
    vis.get_render_option().background_color = np.asarray([1.0, 1.0, 1.0])
    vis.add_geometry(pcd)
    vis.run()
    vis.destroy_window()

To run the inference on our private data, we follow a similar process. An index for the desired data point is provided, and a custom function pre-processes the corresponding PCD before passing the resulting dictionary to the pipeline. Then, we colorize and display the segmented Point Cloud.

# Get one test point cloud from the custom dataset
pc_idx = 2 # change the index to get a different point cloud
data, pcd = prepare_point_cloud_for_inference(custom_dataset[pc_idx])
# Run inference
result = pipeline.run_inference(data)
# Colorize the point cloud with predicted labels
colors = [COLOR_MAP[clr] for clr in list(result['predict_labels'])]
pcd.colors = o3d.utility.Vector3dVector(colors)
# Create visualization
custom_draw_geometry(pcd)

The custom function that prepares the data receives a PCD from the list of PCDs, removes non-finite points (NaN and +/-inf values), extracts the xyz points from the PCD, and builds a dictionary in the format the pipeline expects. It then returns the dictionary and the PCD.

def prepare_point_cloud_for_inference(pcd):
    # Remove NaNs and infinity values
    pcd.remove_non_finite_points()
    # Extract the xyz points
    xyz = np.asarray(pcd.points)
    # Set the points to the correct format for inference
    data = {"point": xyz, 'feat': None, 'label': np.zeros((len(xyz),), dtype=np.int32)}
    return data, pcd

To run the code and see the results yourself, activate your Conda environment and follow these steps.

Step 1: Clone the repo

git clone https://github.com/carlos-argueta/open3d_experiments.git

Step 2: Run the code

cd open3d_experiments
python3 semantic_torch.py

Step 3: Troubleshooting

If you get the error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_gather)

Open the file /path/to/your/conda-env/lib/python3.9/site-packages/open3d/_ml3d/torch/modules/losses/semseg_loss.py, making sure to replace /path/to/your/conda-env with the path of your Conda environment and python3.9 with your version of Python.

semseg_loss.py code before the fix

Next, find line 9 and add .to(device) at the end of it.

semseg_loss.py code after the fix
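In case the screenshots are not visible, the change looks roughly like the sketch below. The exact contents of line 9 depend on your Open3D version, so treat the variable names here as illustrative rather than exact.

# Illustrative sketch of the fix in semseg_loss.py (line contents may vary by version)
# Before: the reshaped scores tensor can end up on the CPU while other tensors are on the GPU
valid_scores = scores.reshape(-1, num_classes)
# After: appending .to(device) moves it to the same device as the rest of the pipeline
valid_scores = scores.reshape(-1, num_classes).to(device)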

Save and close the file, and that should fix the problem.

If you get the error: ModuleNotFoundError: No module named ‘tensorboard’, then run:

pip install tensorboard

Step 4: Enjoy!

Semantic Segmentation on SemanticKITTI and private data.
Selecting other Point Clouds from SemanticKITTI and private dataset.

