Pointnet++
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space: Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas
Pointnet learns a spatial encoding of each point and then aggregates all individual point features into a global point cloud signature. By design, Pointnet does not capture the local structure induced by the metric.
Pointnet++ builds on Pointnet to address this limitation, which restricts Pointnet's ability to capture local structure and to generalize to complex scenes.
The intuition behind Pointnet++ comes from the basic CNN structure, where lower-level neurons have smaller receptive fields and higher-level neurons have larger ones. The ability to abstract local patterns along the hierarchy allows better generalization to unseen cases.
Pointnet++ is a hierarchical network that applies Pointnet recursively on a nested partitioning of the input point cloud. It proposes novel set learning layers that adaptively combine features from multiple scales to cope with varying point densities. Similar to CNNs, Pointnet++ extracts local features from small neighborhoods, then groups them into larger units that are processed to produce higher-level features. This process repeats until the feature of the whole point set is obtained.
There are two issues addressed by Pointnet++:
- How to generate the partitioning of the point set
- How to abstract sets of points or local features through a local feature learner
The two issues are correlated: the partitioning of the point set has to produce common structures across partitions, so that the weights of the local feature learner can be shared. Pointnet++ uses Pointnet as the local feature learner.
The hierarchical structure is composed of a number of set abstraction levels. Each set abstraction level consists of three layers: a sampling layer, a grouping layer and a Pointnet layer.
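To make the stacking of levels concrete, the sketch below traces array shapes through a series of set abstraction levels. It assumes an input of N points with d coordinates and C feature channels per point, and that each level picks a number of centroids, a group size K and an output feature width. The function name `set_abstraction_shapes` and the sizes used are hypothetical, chosen only for illustration.

```python
def set_abstraction_shapes(n, d, c, levels):
    """Trace array shapes through a stack of set abstraction levels.

    n, d, c: number of input points, coordinate dims, feature channels.
    levels: list of (n_centroids, k, c_out) tuples, one per level.
    """
    trace = []
    for n_out, k, c_out in levels:
        trace.append({
            "input": (n, d + c),
            "centroids": (n_out, d),       # sampling layer
            "groups": (n_out, k, d + c),   # grouping layer
            "output": (n_out, d + c_out),  # Pointnet layer
        })
        n, c = n_out, c_out  # the output becomes the next level's input
    return trace

# Hypothetical sizes, for illustration only.
trace = set_abstraction_shapes(n=1024, d=3, c=0,
                               levels=[(512, 32, 128), (128, 64, 256)])
for level in trace:
    print(level)
```

Each level reduces the number of points while widening the feature vector, which is how the receptive field grows along the hierarchy.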

Sampling layer:
The sampling layer selects a set of points from the input points, which define the centroids of the local regions. Given the input points {x1, x2, …, xn}, it uses iterative farthest point sampling (FPS) to choose a subset of points {s1, s2, …, sm}, such that sk is the point most distant from the already selected points {s1, s2, …, sk-1}. FPS gives better coverage of the whole point set than random sampling for the same number of centroids.
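A minimal sketch of farthest point sampling in plain Python might look like the following. It assumes Euclidean distance and an arbitrarily chosen first centroid (index 0 by default). The function name and the toy point cloud are illustrative and not taken from the paper's code.

```python
import math

def farthest_point_sampling(points, m, start=0):
    """Return the indices of m points chosen by farthest point sampling.

    points: list of coordinate tuples; m: number of centroids to pick;
    start: index of the first centroid (an arbitrary choice in this sketch).
    """
    n = len(points)
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] is the distance from point i to its nearest chosen centroid.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        # The next centroid is the point farthest from everything chosen so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

cloud = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(farthest_point_sampling(cloud, 3))  # [0, 4, 2]
```

In this toy cloud the near-duplicate point at index 1 is picked last, which illustrates why FPS spreads centroids more evenly than random sampling.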
Grouping layer:
The grouping layer constructs local region sets by finding the neighboring points around each centroid. The input to this layer is a point set of size N x (d + C), where N is the number of points, d the coordinate dimension and C the feature dimension, together with the coordinates of the centroids from the sampling layer, of size N' x d. The output is a set of point groups of size N' x K x (d + C), where each group corresponds to a local region and K is the number of points in the neighborhood of the centroid.
There are two methods to find the neighboring points:
- Ball query: finds all points within a given radius of the query point, up to an upper limit of K points.
- K nearest neighbor (kNN): finds a fixed number of neighboring points according to the distance metric.
Compared with kNN, ball query guarantees a fixed region scale, which makes the local region features more generalizable across space.
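The difference between the two neighborhood queries can be sketched in a few lines of plain Python. This assumes Euclidean distance and that ball query keeps the first K matches in index order, which is a simplification for illustration. The function names are hypothetical.

```python
import math

def ball_query(points, centroid, radius, k):
    """Indices of up to k points lying within `radius` of `centroid`."""
    inside = [i for i, p in enumerate(points)
              if math.dist(p, centroid) <= radius]
    return inside[:k]  # cap the group size at k

def knn_query(points, centroid, k):
    """Indices of the k points closest to `centroid`."""
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(points[i], centroid))
    return order[:k]

cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (2.0, 2.0), (3.0, 3.0)]
center = (0.0, 0.0)
print(ball_query(cloud, center, radius=0.5, k=4))  # [0, 1, 2]
print(knn_query(cloud, center, k=4))               # [0, 1, 2, 3]
```

With the same K, kNN reaches out to the distant point at index 3 to fill its quota, while ball query stays inside the fixed radius. This is the fixed region scale mentioned above.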
Pointnet layer:
The Pointnet layer uses a mini-Pointnet to encode local region patterns into feature vectors. The input is the N' local regions from the grouping layer, with size N' x K x (d + C). Each local region is abstracted by its centroid and a local feature that encodes the centroid's neighborhood, so the output size is N' x (d + C').
The coordinates of the points in a local region are first translated into a local frame relative to the centroid: xij = xij - Xi, where xij is the j-th point of the i-th region and Xi is the coordinate of that region's centroid.
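The following sketch illustrates how one local region could be encoded: the points are translated into the centroid's local frame, a shared function is applied to every point, and the results are max-pooled into one feature vector. It is a simplification under stated assumptions: points carry coordinates only (C = 0), the shared function is a single linear layer with ReLU, and the weights `W` and `b` are hypothetical toy values standing in for learned parameters.

```python
def to_local_frame(group, centroid):
    """Translate each point so that the centroid becomes the origin."""
    return [tuple(x - c for x, c in zip(p, centroid)) for p in group]

def shared_linear_relu(point, weights, bias):
    """One shared layer applied to a single point: relu(W . point + b)."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b_i)
            for row, b_i in zip(weights, bias)]

def mini_pointnet(group, centroid, weights, bias):
    """Encode one local region of K points into a single feature vector."""
    local = to_local_frame(group, centroid)
    per_point = [shared_linear_relu(p, weights, bias) for p in local]
    # Max pooling over the K points makes the result independent of point order.
    return [max(channel) for channel in zip(*per_point)]

# Hypothetical toy weights (d = 2 inputs, C' = 3 outputs); real ones are learned.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
region = [(1.0, 1.0), (1.5, 1.0), (1.0, 0.5)]
feature = mini_pointnet(region, centroid=(1.0, 1.0), weights=W, bias=b)
print(feature)  # [0.5, 0.0, 0.5]
```

Because the same weights are applied to every point and the pooling is symmetric, reordering the points in `region` leaves `feature` unchanged.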
Analysis
Pointnet++ is a powerful network architecture for processing point sets sampled in a metric space. It operates recursively on a nested partitioning of the input point set and is effective at learning hierarchical features with respect to the distance metric.
Reference
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space (Paper)
