Introducing KPConv

Kernel Point Convolution for Point Clouds

Isaac Berrios
May 1, 2023

This post presents KPConv: Flexible and Deformable Convolution for Point Clouds. KPConv is inspired by image convolution, but instead of discrete image kernels with fixed positions, a set of continuous kernel points is used to define the area of convolution. KPConv can be used as a basic building block for deep 3D neural network architectures that consume entire point clouds for classification and segmentation tasks.

Introduction

A Point Cloud is a collection of points in 3D space, where each one is defined by a set of Cartesian coordinates (x, y, z). Point clouds are sparse and unordered, which differentiates them from the grid-like structure of images.

Each point in a point cloud is spatially localized by its coordinates, which means that we can identify similar points by looking at the neighbors of a point at (x, y, z). We could add some offsets (x ± r, y ± r, z ± r), which essentially defines a sphere or ball around the original point, with its nearby neighbors contained in the ball. Intuitively, we could perform some operation within the ball that results in some lower dimensional structure that is representative of the original structure. This notion is essential to building a convolutional network, since it allows us to propagate and downsample the underlying structure of the point cloud through the network [1, 2].

Each point can have associated features such as intensity or color. For example, a LiDAR can generate point clouds of an object with color plus an intensity that is characteristic of the reflectivity of the object [3]. Utilizing these features in a Neural Network Architecture in addition to the structural features would be a major benefit, and KPConv provides a way to do this. Figure 1 shows a Point Cloud obtained from a drone.

Figure 1. Point Cloud from drone based LiDAR over Fountain Creek Colorado. Source.

Motivation

Intuitively, a point cloud is an approximation of some real 3D shape and therefore has structure that represents a real object. The structure of a point cloud, along with its features, can be used to learn about the object(s) that it represents.

We may want to ask: what exactly do we want to learn? This depends on the application, but typically we want to know what the object is (classification) or what sub-components make up the object (semantic segmentation).

In geology, LiDAR generated point clouds are used to learn more about the surface of the earth [4]. In autonomous driving, LiDAR is a useful sensor that helps facilitate 3D object detection and, in some cases, segmentation. It is not only useful to know that an object is present; information on the type of object (car vs bike vs pedestrian) can impact how the vehicle reacts to its presence. See figure 2 for an example of classification.

Figure 2. 3D object classification for autonomous driving. Source.

KPConv Overview

An Informal Introduction

Before we dive into the details, let’s do a quick overview of KPConv. The purpose of this section is just to introduce the main concepts at a high level, so that we can build an in depth understanding when we formally introduce KPConv.

For images, the convolution area is defined by a discrete kernel with strictly defined kernel pixels. KPConv, on the other hand, operates in continuous 3D space on points contained within a radius; these points are called kernel points.

A 2D example is shown in figure 3, where the kernel points are aligned in specific locations inside of a circle. This is called a rigid kernel; we can also define a deformable kernel by learning shifts for each kernel point, which allows the kernel to conform to the local structure.

Figure 3. Rigid and Deformable Kernels. Source: Author.

For now, let's focus on the rigid kernel. Notice that there is a single point in the center of the rigid kernel: KPConv actually centers the kernel on a single point x before performing the operation. The operation is only defined on points that fall inside the radius of the kernel; these points are known as neighbors.

The reason for using a radius to define neighbors, as opposed to K Nearest Neighbors (KNN), is that a radius neighborhood is more resilient to varying sampling densities. In sparse regions, KNN would include points that are too far away, while in dense regions it would only include points that are too close. With a radius, we get a more consistent representation of the local structures.
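To make the distinction concrete, here is a minimal NumPy sketch of the two neighborhood definitions (the helper names radius_neighbors and knn_neighbors are illustrative, not from any library):

```python
import numpy as np

def radius_neighbors(points, center, r):
    """Return all points within radius r of center (the KPConv choice)."""
    dists = np.linalg.norm(points - center, axis=1)
    return points[dists <= r]

def knn_neighbors(points, center, k):
    """Return the k nearest points to center, regardless of distance."""
    dists = np.linalg.norm(points - center, axis=1)
    return points[np.argsort(dists)[:k]]
```

The radius query returns a variable number of points but a fixed spatial extent, while KNN returns a fixed count whose spatial extent depends entirely on the local density.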

Each kernel point has an associated weight matrix which enables a feature mapping from Dᵢₙ to Dₒᵤₜ, where Dᵢₙ is the number of features for each input point. This mapping enables both the features and the 3D structure to be used. An example in 2D is shown below, where the weights give us filter values, in a similar manner to image convolution.

Figure 4. 2D example of KPConv. Source.

In this example, we can see that the kernel points are able to map the neighbors to a single point in the output. Now that we have a general idea of what’s going on, let’s take a closer look.

For clarification: the terms neighbors and points always refer to point cloud points, while kernel points will be used to explicitly refer to kernel points.

KPConv

A Formal Introduction

KPConv is defined as a point convolution of features F with a kernel g at point x
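Following the definition in [1], where fᵢ denotes the features of point xᵢ:

```latex
(\mathcal{F} * g)(x) = \sum_{x_i \in \mathcal{N}_x} g(x_i - x)\, f_i
```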

The neighbors Nₓ are formally defined as all points xᵢ centered on x that fall within the radius r
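With 𝒫 denoting the input point cloud, this is (per [1]):

```latex
\mathcal{N}_x = \{\, x_i \in \mathcal{P} \;\mid\; \lVert x_i - x \rVert \le r \,\}
```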

We will now refer to the neighbors, re-centered on x, as yᵢ = xᵢ − x

We define the domain of the kernel function g, as a ball with radius r
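As in [1], this domain is:

```latex
\mathcal{B}_r^3 = \{\, y \in \mathbb{R}^3 \;\mid\; \lVert y \rVert \le r \,\}
```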

The kernel points are constrained to exist only within the ball, and we define them along with their weight matrices as
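In the paper's notation, the kernel points x̃ₖ and their weight matrices Wₖ are:

```latex
\{\tilde{x}_k \mid k < K\} \subset \mathcal{B}_r^3, \qquad \{W_k \mid k < K\} \subset \mathbb{R}^{D_{in} \times D_{out}}
```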

Where K is the defined number of kernel points. Now we define the kernel g as a function of any neighbor yᵢ
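Per [1], the kernel function is:

```latex
g(y_i) = \sum_{k < K} h(y_i, \tilde{x}_k)\, W_k
```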

Where h is the linear correlation between the neighbor yᵢ and each kernel point xₖ
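The linear correlation from [1], where σ is the influence distance of the kernel points, is:

```latex
h(y_i, \tilde{x}_k) = \max\!\left(0,\; 1 - \frac{\lVert y_i - \tilde{x}_k \rVert}{\sigma}\right)
```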

Linear Correlation

This correlation is very important, but why?

We have a variable number of neighbors at any given point, so there is no 1-to-1 correspondence between neighbors and kernel points. This makes associating neighbors with kernel points non-trivial. To alleviate this difficulty, a correlation measure is used to quantify the association between each neighbor and each kernel point. The association is dictated by local structure: if a kernel point is spatially close to a neighbor, the two are highly correlated, and the inverse is also true; looking at the linear correlation function h, far-away neighbors have a correlation of 0. In essence, this is simply a spatial correlation that relates the positions of kernel points to those of neighbors. It enables all neighbors and kernel points to interact with each other, with the spatial correlation determining the strength of the interactions.

All neighbors and kernel points interact with each other, and the spatial correlation determines the strength of these interactions.

Weight Matrix

After the correlation, we multiply by the learnable weight matrix (or kernel weight) Wₖ. In addition to the correlation, the weight matrix adds another realm of influence by changing the effect that each kernel point xₖ has on each neighbor yᵢ.

Putting the pieces together

To recap, we have seen that the spatial correlation can measure how structurally similar the neighbors are to the kernel points. We have now seen how the weight matrix adds a new realm of influence to the KPConv operation which increases the expressiveness of the operation via learnable parameters.

Now let’s view a 2D example for clarification.

Figure 5. 2D example of a KPConv operation. Source.

Here we have a kernel centered on point x, and we consider an operation on neighbor yᵢ, where each neighbor has Dᵢₙ features. We correlate this yᵢ with all kernel points and then multiply the correlation values with the associated weight matrices. These products are then summed together, which gives us a single new point with Dₒᵤₜ features.

We can see from the expanded KPConv definition below that the features are multiplied by the sum of all weight matrices scaled by the correlation strength h. This allows us to map the features from Dᵢₙ to Dₒᵤₜ.
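Expanding the definitions above (with the transpose arranged so that each Dᵢₙ feature vector fᵢ is mapped to Dₒᵤₜ), the full operation can be written as:

```latex
(\mathcal{F} * g)(x) = \sum_{x_i \in \mathcal{N}_x} \Bigg(\sum_{k < K} h(y_i, \tilde{x}_k)\, W_k\Bigg)^{\!\top} f_i
```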

Expanded KPConv Definition, where terms have been arranged to match dimensions.

Since the correlation applies a weighting to each kernel weight matrix, it in turn dictates how the features are propagated through the network. For example, if a kernel point has a low correlation, then its corresponding weight matrix will have little impact on the output features.
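Putting the correlation and the weight matrices together, a minimal NumPy sketch of a rigid KPConv at a single location might look like the following (kpconv_point is an illustrative name, not the authors' implementation; it assumes the linear correlation h described above):

```python
import numpy as np

def kpconv_point(neighbors, feats, kernel_pts, weights, sigma):
    """Rigid KPConv at one location.
    neighbors:  (N, 3) neighbor coordinates already centered on x (the y_i)
    feats:      (N, D_in) neighbor features f_i
    kernel_pts: (K, 3) rigid kernel point positions
    weights:    (K, D_in, D_out) one weight matrix W_k per kernel point
    """
    # Linear correlation h(y_i, x_k) = max(0, 1 - d / sigma), shape (N, K)
    dists = np.linalg.norm(neighbors[:, None, :] - kernel_pts[None, :, :], axis=-1)
    h = np.maximum(0.0, 1.0 - dists / sigma)
    # Per-neighbor effective weight matrix: sum over kernel points of h * W_k
    per_point_W = np.einsum('nk,kio->nio', h, weights)   # (N, D_in, D_out)
    # Apply to features and sum over all neighbors -> a single D_out vector
    return np.einsum('nio,ni->o', per_point_W, feats)
```

Note that a neighbor sitting exactly on a kernel point gets correlation 1 with it, so that kernel point's weight matrix is applied at full strength, while neighbors farther than σ from every kernel point contribute nothing.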

Rigid and Deformable Kernels

The kernel point positions determine how the features are propagated through the network via convolution. This means that their positions are important to the convolution operation and the overall network performance.

Rigid Kernels

Up to now, we have only talked about rigid kernels, where each point is fixed in a predefined position in the domain defined by a ball. How can we choose K kernel point positions that will optimize performance?

It turns out that these positions can be found by solving an optimization problem where the points apply repulsive forces on each other, while all points are constrained to stay within the ball by an attractive force.

One point is also constrained to be in the middle of the ball. The points are randomly initialized and the solution is found by minimizing the total energy with gradient descent.

This provides points that are regularly placed within the ball, as shown in figure 6.

Figure 6. Rigid Kernel Point Positions. Source.

The surrounding points can then be re-scaled to an average radius of 1.5σ, which provides a small overlap between each kernel point's area of influence and good overall space coverage of the kernel.
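The exact force expressions are not spelled out here, so the following NumPy sketch assumes a simple inverse-square pairwise repulsion and a linear attraction toward the center; optimize_kernel_points and its parameters are illustrative only:

```python
import numpy as np

def optimize_kernel_points(K, n_iter=2000, lr=0.01, seed=0):
    """Sketch: spread K points via pairwise repulsion while a linear
    attraction toward the center keeps them bounded; the first point is
    fixed at the center, as in the rigid kernel layout."""
    rng = np.random.default_rng(seed)
    pts = rng.normal(scale=0.3, size=(K, 3))
    pts[0] = 0.0
    for _ in range(n_iter):
        diff = pts[:, None, :] - pts[None, :, :]           # (K, K, 3)
        d = np.linalg.norm(diff, axis=-1) + np.eye(K)      # +eye avoids /0 on diagonal
        rep = (diff / d[..., None] ** 3).sum(axis=1)       # inverse-square repulsion
        att = -pts                                         # attraction toward center
        step = np.clip(lr * (rep + att), -0.05, 0.05)      # clip steps for stability
        pts += step
        pts[0] = 0.0                                       # center point stays fixed
    return pts
```

The resulting layout can then be rescaled so that the surrounding points sit at the desired average radius (1.5σ in the paper).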

Deformable Kernels

Rigid Kernels have demonstrated efficient and effective performance on segmentation and classification tasks, but it turns out that we can increase the capacity by learning optimal positions for each kernel point. This is known as deformable KP Convolution. Intuitively we would like the kernels to adapt to the local structures of each convolution location.

Since the kernel g is differentiable with respect to the kernel points, we can actually learn the kernel points as parameters. In deformable KP Convolution, we start with a rigid kernel and learn K shifts Δ(x) for each convolution location x.

Where the deformable kernel is defined as:
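Following [1], with Δₖ(x) the learned shift of kernel point x̃ₖ:

```latex
g_{\mathrm{deform}}(y_i, \Delta(x)) = \sum_{k < K} h\big(y_i,\; \tilde{x}_k + \Delta_k(x)\big)\, W_k
```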

The shifts (or offsets) Δ(x) are the output of a rigid KPConv mapping the Dᵢₙ input features to 3K values: one offset per spatial dimension for each of the K kernel points. During training, the rigid kernel learns the shifts Δ(x) via back propagation, and the deformable kernel generates the new features.

Figure 7. 2D example of Deformable Kernel Point Convolution. Source.

So it is the kernel point shifts that are actually learned.

In order to help prevent the shifts from escaping the kernel region, the learning rate of the kernel point shifts Δ(x) is set to 0.1 times the global learning rate. However, this alone is ineffective and leads to poor performance. To overcome this, a special regularization is used to constrain the deformable kernel points to the kernel region. The regularization consists of a fitting term and a repulsive term.

The fitting term ensures that each kernel point stays sufficiently close to the input neighbors. The repulsive term ensures that the kernel points do not collapse onto each other and that their areas of influence do not overlap (i.e., we would like a relatively uniform coverage of kernel points).
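As given in [1], the regularization sums a fitting and a repulsive loss over all convolution locations:

```latex
\mathcal{L}_{\mathrm{reg}} = \sum_{x} \mathcal{L}_{\mathrm{fit}}(x) + \mathcal{L}_{\mathrm{rep}}(x), \qquad
\mathcal{L}_{\mathrm{fit}}(x) = \sum_{k < K} \min_{y_i} \left(\frac{\lVert y_i - \tilde{x}_k \rVert}{\sigma}\right)^{2}, \qquad
\mathcal{L}_{\mathrm{rep}}(x) = \sum_{k < K} \sum_{l \ne k} h(\tilde{x}_k, \tilde{x}_l)^{2}
```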

It can be shown that deformable kernels are able to conform to the local geometry of the input point cloud.

Figure 8. Difference Between Rigid (left) and Deformable Kernels with no regularization (middle) and regularization (right). Source.

We can see in figure 8 that the rigid kernel on the left does not conform to the local geometry. In the middle, the deformable kernel with no regularization has many points scattered far away from the local geometry; these are called lost points. On the right, we can see that the regularization strategy not only prevents lost points but also allows the kernel to conform to the local geometry, which maximizes the number of active kernel points, where active refers to kernel points with non-zero spatial correlation values.

Conclusion

In this post we have learned about KPConv, which is a novel convolution designed for point clouds. It utilizes a kernel area defined by a radius to achieve high performance in regions of varying density. KPConv uses a simple linear spatial correlation that enables it to work with a variable number of local input points. It is also able to learn weight matrices for each kernel point that increase the expressiveness of KPConv in a way that is analogous to image based Convolutional Neural Networks.

KPConv has two flavors that depend on kernel type: rigid kernel and deformable kernel. Rigid kernels have predefined structure and are computationally efficient. Deformable Kernels are able to conform to local structures and tend to exhibit higher performance on more complex tasks.

KPConv is a modular operation that can be used to create deep Neural Network Architectures that directly consume point clouds with minimal data preprocessing. These architectures can be catered toward both point cloud classification and segmentation.

References

[1] Thomas, H., Qi, C. R., Deschaud, J.-E., Marcotegui, B., Goulette, F., & Guibas, L. (2019). KPCONV: Flexible and deformable convolution for point clouds. 2019 IEEE/CVF International Conference on Computer Vision (ICCV). https://doi.org/10.1109/iccv.2019.00651

[2] Atzmon, M., Maron, H., & Lipman, Y. (2018). Point convolutional neural networks by extension operators. ACM Transactions on Graphics, 37(4), 1–12. https://doi.org/10.1145/3197517.3201301

[3] MicroImages, Inc. (2010, October). Lidar: Select points by class, return type, intensity — microimages, Inc.. www.microimages.com. Retrieved April 30, 2023, from https://www.microimages.com/documentation/TechGuides/77LASptSelect.pdf

[4] Admin. (n.d.). Lidar and Radar Information. lidarradar.com. Retrieved April 30, 2023, from https://lidarradar.com/apps/lidar-application-in-geology

Thanks for Reading! If you found this useful please consider clapping 👏
