Euclidean Clustering for Lidar point cloud data

AJith RaJ
5 min read · Jun 28, 2020


In this article you will learn how to cluster point cloud data to locate objects, which can later be classified as obstacles, traffic signs, vehicles, pedestrians, etc.

Before clustering, the point cloud data is first filtered and segmented, typically using voxel filtering and RANSAC-based segmentation. To learn more about these steps, you can check my previous article here.

Data Structure for Storing 3D PCD points — KDTree

The first thing that comes to a developer's mind before building a clustering algorithm is to create an optimized data structure for storing the points of the point cloud data (pcd), one that allows fast search and retrieval.

Although Python offers a lot of data structures, such as dictionaries, tuples, lists, and collections, a KDTree is preferred because of its speed when handling large amounts of data and its inherent partitioning of the data, which is very useful when searching for nearest-neighbor points.

You can choose to use the sklearn implementation of KDTree, but for this application I chose to implement my own KDtree structure: I could build a highly customized version of it, and it was a great learning experience as well.

The most rudimentary definition of a KDtree I can think of is this: it is a structure of nodes interlinked sequentially based on their properties, arranged like a tree structure in K dimensions, which makes it efficient to search for neighboring points. A more technical definition of a KDtree can be found here. The example below shows the definition of a KDtree node: the node contains the point coordinates and the point id, while left_node and right_node point to the other interlinked nodes.
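The embedded gist with the node definition does not render here, so here is a minimal sketch of such a node, using the field names described above (`point`, `id`, `left_node`, `right_node`); it is my reconstruction, not the repository's exact code:

```python
class Node:
    """One KD-tree node: a 3D point plus links to its two subtrees."""

    def __init__(self, point, point_id):
        self.point = point        # (x, y, z) coordinates of the point
        self.id = point_id        # index of the point in the pcd data
        self.left_node = None     # link to the left child node
        self.right_node = None    # link to the right child node
```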

Now that we know what nodes are, the underlying building blocks, let's go ahead and build the KDTree.

Lines 10:16 — insert_points method

The constructor variable root refers to the first node. The inputs are a pcd_dataframe (x, y, z points arranged as columns) and a display_output_flag, which enables storing all the points with their depth information in kdtree_display_dict. We iterate through each row, passing the point it contains to the build_kdtree method. If display_output_flag is enabled, the method also stores the nodes in kdtree_display_dict and displays the data inside the KDtree (preorder traversal: root, left, right).

Lines 28:50 — build_kdtree method

This method is quite straightforward:
Step 1. Check whether the current node being passed is empty.
Step 2. If the current node is empty, create and return a new node.
Step 3. If the current node is not empty, check whether the current point is less than the point stored in the current node, comparing on the axis for the current depth level (X → Y → Z).
Step 4. Call the build_kdtree method again, adding the current point under the right node with an incremented depth level.
Step 5. Step 4 should create a new node on the right side of the current node.
Step 6. If the condition in Step 3 is not satisfied, repeat Step 4 but with the left node instead of the right one.
Step 7. Step 6 should create a new node on the left side of the current node.

When this method is run iteratively over the points, it recursively builds the KDtree structure from the point data.
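Since the embedded listing does not render here, the insertion logic in the steps above can be sketched as follows. The function and field names follow the article, but the bodies are my reconstruction; I also use the common smaller-goes-left convention, whereas the steps above route smaller points to the right node (either works, as long as the search mirrors the same rule):

```python
class Node:
    # Same node shape as defined earlier in the article.
    def __init__(self, point, point_id):
        self.point, self.id = point, point_id
        self.left_node = self.right_node = None

def build_kdtree(node, point, point_id, depth=0):
    # Steps 1-2: an empty slot becomes a new leaf node.
    if node is None:
        return Node(point, point_id)
    # Step 3: compare on the splitting axis for this depth (X -> Y -> Z).
    axis = depth % 3
    if point[axis] < node.point[axis]:
        node.left_node = build_kdtree(node.left_node, point, point_id, depth + 1)
    else:
        node.right_node = build_kdtree(node.right_node, point, point_id, depth + 1)
    return node

def insert_points(pcd_rows):
    # pcd_rows: an iterable of (x, y, z) tuples; the article feeds in the
    # rows of a pandas DataFrame with x, y, z columns instead.
    root = None
    for point_id, point in enumerate(pcd_rows):
        root = build_kdtree(root, point, point_id)
    return root
```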

Lines 52:86 — search_elements method

This method finds the nodes that are near each other based on their Euclidean distance (the straight-line distance between two points). We ensure that the depth always stays in the range (0, 1, 2) by applying the modulo operator in line 80. To find the nodes near our search_point, we use a distance_threshold input parameter. In lines 84–86 we check whether the current node lies within the limits, using the following conditions.

Checking if the current node point (x, y, z) falls within the limits

If the conditions are satisfied then the Euclidean distance between the search_point and the current_node point is calculated.

Distance between search_point and current_node point

If the Euclidean distance is within the distance_threshold limit, we add this point as a near point in kdtree_search_results. We then recursively iterate through the left and right nodes of the current node to find all the related points, which is implemented by the search conditions in lines 97–104. When using this method in our application, we always provide the root node as the input; this ensures that we traverse all the nodes and cover every point within the search limits much faster than scanning all the points in a conventional data structure.
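Putting the box check, the distance check, and the recursive descent together, a radius search along these lines (a sketch under the same smaller-goes-left convention, not the repository's exact code) could look like:

```python
import math

class Node:
    # Same node shape as defined earlier in the article.
    def __init__(self, point, point_id):
        self.point, self.id = point, point_id
        self.left_node = self.right_node = None

def search_elements(node, search_point, distance_threshold, depth=0, results=None):
    """Collect the ids of all points within distance_threshold of search_point."""
    if results is None:
        results = []
    if node is None:
        return results
    axis = depth % 3  # modulo keeps the depth cycling through axes 0, 1, 2
    # Cheap box check first: does the node fall inside the axis-aligned box
    # of side 2 * distance_threshold centred on search_point?
    if all(abs(node.point[i] - search_point[i]) <= distance_threshold for i in range(3)):
        # Exact check: straight-line (Euclidean) distance to the search point.
        if math.dist(node.point, search_point) <= distance_threshold:
            results.append(node.id)
    # Recurse only into subtrees that can still contain candidate points.
    if search_point[axis] - distance_threshold < node.point[axis]:
        search_elements(node.left_node, search_point, distance_threshold, depth + 1, results)
    if search_point[axis] + distance_threshold > node.point[axis]:
        search_elements(node.right_node, search_point, distance_threshold, depth + 1, results)
    return results
```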

Lines 88–121 — kdtree_display_dict (optional)

Stores the depthwise KDtree elements. Useful while debugging.

Euclidean Cluster Application Code

Lines 66–96 — find_clusters method

Before the eponymous method runs, a KDtree is built in the __init__ method. Initially every point is assigned a processed_flag_ with a default value of False; when the point is processed, it is changed to True. If a point has not been processed before, it is assigned to a new cluster, and all the points near it are found with the search_elements method. All the returned points are then recursively passed through find_clusters again, to check whether they belong to another cluster and to verify that the nearby points have not been processed before.

Lines 31–54 — euclidean_clustering method

In this method we iterate through all the points in the pcd file and call the find_clusters method to create new clusters, then check whether each cluster satisfies the min_size parameter. (max_size is not considered, as it doesn't make sense when clustering to segregate large objects.)
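The two methods above can be condensed into one self-contained sketch. To keep it short I replace the KDtree lookup with a brute-force neighbour scan (the article uses search_elements here), but the flow, a processed flag per point, recursive cluster growth, and a min_size filter, is the same:

```python
import math

def euclidean_clustering(points, distance_threshold, min_size):
    """Group points into clusters of mutually reachable neighbours."""
    processed = [False] * len(points)  # one processed flag per point
    clusters = []

    def near_points(idx):
        # Stand-in for KDtree.search_elements: brute-force radius scan.
        return [j for j, q in enumerate(points)
                if math.dist(points[idx], q) <= distance_threshold]

    def find_clusters(idx, cluster):
        # Grow one cluster outwards from a seed point.
        processed[idx] = True
        cluster.append(idx)
        for j in near_points(idx):
            if not processed[j]:
                find_clusters(j, cluster)

    for idx in range(len(points)):
        if not processed[idx]:
            cluster = []
            find_clusters(idx, cluster)
            if len(cluster) >= min_size:  # drop clusters that are too small
                clusters.append(cluster)
    return clusters
```

Running it on five points, two tight pairs plus one stray point, with a threshold of 1 and min_size of 2, returns the two pairs and drops the stray.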

Lines 98–129 — visualize_clusters method

Matplotlib and seaborn work well for 2D plots, but for 3D visualization I would personally choose plotly over any other library: the plots have a very clean UI and intuitive interactivity features. In the final plot we draw all the points in the pcd file and then overlay the cluster points on top of them.

Final Results

3D visualization of the clusters identified (marked with different colors)

This is how the final result looks. We have a 3D point cloud scan with a limited number of scan points because we haven't segmented the ground plane; we have also limited the number of points, since Python raises a maximum recursion depth exception, which can be handled in the code but is not required for explaining the core concepts of Euclidean clustering. Feel free to tinker with the code and modify the parameters to get different results. For a real-time implementation I would strongly recommend C++, which you can find here.

If you liked my explanation and found it useful for understanding Euclidean clustering and its core concepts, please don't be shy: star my repository on GitHub and generously add claps.

References

www.udacity.com — Sensor Fusion Course

www.plotly.com — 3D Scatter plot visualizations

https://github.com/ajith3530/Python_EUCLIDEAN_clustering
