Mastering Point Clouds: A Deep Dive into the PointNet Architecture

Nikita Malviya
4 min readDec 7, 2023

PointNet is a point-based deep learning architecture for processing and analyzing 3D point cloud data. It directly addresses the challenges of representing and learning from unstructured point cloud inputs.

Introduction to PointNet

  • Let’s assume we have a set of 3D points {Pi | i = 1, 2, …, n} forming a point cloud.
  • The network aims to learn the relationships and features from these points.
  • It learns a spatial encoding of each point in the cloud and then aggregates the individual point features into a single global representation of the point cloud.
  • It processes unordered point clouds directly, i.e. it does not convert the unstructured data into a regular 3D voxel grid (a structured format) before feeding it to the input layer.
  • Unordered: since the points carry no canonical order, the network must handle this randomness and produce the same result no matter how the points are arranged.
  • It also needs to recognize that the geometry of the point set does not change even if the whole set is shifted or rotated.
  • The model is permutation invariant: its output remains unchanged even if the order of its input elements is altered. This allows PointNet to process point clouds regardless of the order of the input points, and it is achieved with shared Multi-Layer Perceptron networks combined with a symmetric pooling function (with the Input Transform network handling geometric alignment).
  • It uses a symmetric function, max pooling, to aggregate information across all points.
  • This network can be used for classification and segmentation tasks.
Figure: Qualitative results for semantic segmentation. The top row is an input point cloud with color; the bottom row is the output semantic segmentation result (on points), displayed from the same camera viewpoint as the input.
  • The network can handle variations and noise in the input.
Figure: PointNet vs. VoxNet on incomplete input data. The metric is overall classification accuracy on the ModelNet40 test set. Note that VoxNet averages over 12 viewpoints while PointNet uses only a single view of the point cloud; PointNet shows much stronger robustness to missing points.
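The permutation-invariance idea above can be seen in a few lines: a symmetric function such as element-wise max gives the same result no matter how the points are ordered. A minimal NumPy sketch (the random points are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((5, 3))       # 5 points with xyz coordinates
shuffled = points[rng.permutation(5)]      # same cloud, different point order

# Max pooling over the point axis is a symmetric function:
# the aggregated feature is identical for any ordering of the points.
assert np.allclose(points.max(axis=0), shuffled.max(axis=0))
```

Any symmetric function (max, sum, mean) would give order invariance; PointNet uses max pooling because it empirically works best for selecting the most informative per-dimension responses.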

Task and Purpose:

1. Handling Unordered Point Clouds:

Problem: Traditional neural networks operate on grid-like structures and struggle with the inherently unordered nature of point clouds.

Solution: PointNet is designed to handle unordered point clouds directly, without relying on a predefined order. This is crucial for tasks where the arrangement of points in the input is arbitrary, such as in 3D object recognition.

2. Permutation Invariance:

Problem: The same point cloud can be presented in many different orders; without a mechanism for permutation invariance, each ordering would produce a different network output.

Solution: PointNet achieves permutation invariance through shared multilayer perceptrons (MLPs) followed by a symmetric aggregation function, ensuring that the network’s predictions remain consistent regardless of the order in which points are presented.

  • The network must also account for rigid motions: the geometry of the point set does not change when the whole set is shifted or rotated. (Handled by the Input Transform network, T-Net.)

Together, these mechanisms let PointNet handle the unstructured representation while maintaining permutation invariance.

3. Local and Global Features:

Problem: Extracting both local and global features from point clouds is challenging.

Solution: PointNet captures local features from individual points using shared MLPs and aggregates them to obtain a global feature representation. This hierarchical approach helps the network understand both local and global structures within the point cloud.

  • It uses a symmetric function, max pooling, to aggregate information across all points.

PointNet Architecture Workflow

Layer 1. Input Transform Net:

  • The network learns a transformation matrix, via a mini-network called T-Net, that is applied to the input points to achieve invariance to rotations and translations.
  • The transformed points are obtained by multiplying the original points by the learned transformation matrix.
  • This step helps the network to be robust to different orientations of the input.
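The effect of the input transform can be sketched as follows. A real T-Net regresses the 3x3 matrix from the cloud itself and is initialized to predict the identity; here a small random residual stands in for the T-Net's output:

```python
import numpy as np

rng = np.random.default_rng(2)
points = rng.standard_normal((100, 3))

# Stand-in for the T-Net output: identity plus a small residual.
# (The real T-Net is a learned mini-network whose output is
# initialized to the identity matrix.)
residual = 0.01 * rng.standard_normal((3, 3))
transform = np.eye(3) + residual

# Canonicalize the input by multiplying every point by the
# predicted transformation matrix.
aligned = points @ transform
```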

Layer 2. Shared Multi-layer Perceptrons (MLPs):

  • A shared MLP applies the same perceptron to multiple inputs or branches of a model (here, to every point). It extracts features from each input and generates intermediate representations that can be fed into later parts of the model.
  • It captures the local features from each point individually.
  • It consists of a sequence of fully connected layers with ReLU activations.
  • The same set of weights is used for each point in the input set, promoting weight sharing and reducing the number of parameters, which keeps it computationally inexpensive.
  • Because the same weights are used for each point, reordering the input simply reorders the per-point features; combined with the symmetric pooling later, this makes the network invariant to point order.
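The order-equivariance of weight sharing can be checked directly: permuting the points just permutes the per-point features. A NumPy sketch with illustrative random weights:

```python
import numpy as np

rng = np.random.default_rng(3)
points = rng.standard_normal((10, 3))
perm = rng.permutation(10)

# Illustrative weights for one fully connected layer + ReLU.
W, b = rng.standard_normal((3, 64)), rng.standard_normal(64)

def shared_mlp(x):
    # The SAME weights are applied to every point in the set.
    return np.maximum(x @ W + b, 0.0)

# Because weights are shared, permuting the points then applying the
# MLP equals applying the MLP then permuting: no point's feature
# depends on its position in the input order.
assert np.allclose(shared_mlp(points)[perm], shared_mlp(points[perm]))
```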

Layer 3. Feature Transform Net:

  • Similar to the Input Transform Network, the feature transform network learns another transformation matrix.
  • This matrix is applied to the features obtained from the shared MLP for each point.
  • This transformation helps align the per-point features into a canonical feature space, so that learned local patterns are comparable across different inputs.
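Because the feature transform is a much larger matrix (64x64 in the paper) than the 3x3 input transform, the paper adds a regularization term L_reg = ||I − AA^T||_F^2 that pushes the predicted matrix A toward an orthogonal transform. A NumPy sketch of that penalty, with a near-orthogonal stand-in for A:

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for the feature T-Net's predicted 64x64 matrix A
# (illustrative: identity plus a tiny random perturbation).
A = np.eye(64) + 0.001 * rng.standard_normal((64, 64))

# Regularizer from the paper: squared Frobenius norm of (I - A A^T).
# An orthogonal A gives zero penalty, so training keeps the
# feature-space transform close to a rotation.
reg_loss = np.sum((np.eye(64) - A @ A.T) ** 2)
```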

Layer 4. Shared Multi-Layer Perceptron (Again):

  • Another Shared MLP is applied to the points after the feature transformation.
  • This layer further refines the features by incorporating global information.

Layer 5. Max Pooling:

  • PointNet uses max pooling across all points to obtain a global feature vector.
  • This step allows the network to aggregate information from all points and create a representation of the entire point cloud.

Layer 6. Fully Connected Layers:

  • These layers interpret the global feature for the final classification or segmentation task and produce the network’s output.
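Layers 5 and 6 together can be sketched in a few lines of NumPy: max pool the per-point features into one global vector, then pass it through fully connected layers to get class scores. The weights are illustrative stand-ins; the 512 → 256 → 40 sizes follow the paper's classification head (40 classes for ModelNet40):

```python
import numpy as np

rng = np.random.default_rng(5)
local_feats = rng.standard_normal((100, 1024))   # per-point features

# Layer 5: max pool over the point axis -> one 1024-d global feature.
global_feat = local_feats.max(axis=0)

# Layer 6: fully connected head mapping the global feature to class
# scores. (Illustrative random weights, no learned parameters here.)
W1 = rng.standard_normal((1024, 512))
W2 = rng.standard_normal((512, 256))
W3 = rng.standard_normal((256, 40))
h = np.maximum(global_feat @ W1, 0.0)            # FC + ReLU
h = np.maximum(h @ W2, 0.0)                      # FC + ReLU
scores = h @ W3                                  # one score per class
```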

Here is the link to the research paper on PointNet architecture:

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation.

Hope this blog helped you get a better understanding of PointNet Architecture. If you like it support it with a clap. Happy Learning… :)
