Learning to connect the dots…

Yujia Liu
Published in EcoVisionETH
Jul 29, 2021

Many practical 3D sensing systems, like stereo cameras or laser scanners, produce unstructured 3D point clouds. That choice of output format is really just a “lowest common denominator”, the least committal representation that can be reliably generated with low-level signal processing. Most users would prefer a more efficient and more intuitive representation that describes the scanned object’s geometry as a compact collection of geometric primitives, together with their topological relations. A wireframe is exactly such a representation.

Our approach is able to convert a 3D point cloud into a wireframe model using an end-to-end trainable deep network.

What is a wireframe?

A wireframe is a graph representation of an object’s shape, where vertices correspond to corner points that are linked by edges.
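To make this concrete, here is a minimal sketch of such a graph in Python (the class and field names are our own illustration, not PC2WF’s actual data format):

```python
import numpy as np

# A wireframe is simply a graph: 3D corner positions plus an edge list.
class Wireframe:
    def __init__(self, vertices, edges):
        self.vertices = np.asarray(vertices, dtype=float)  # (V, 3) corner coordinates
        self.edges = [tuple(e) for e in edges]             # pairs of vertex indices

# Example: a unit cube has 8 corners linked by 12 edges.
corners = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
edges = [(i, j) for i in range(8) for j in range(i + 1, 8)
         if sum(a != b for a, b in zip(corners[i], corners[j])) == 1]
cube = Wireframe(corners, edges)
print(len(cube.vertices), "vertices,", len(cube.edges), "edges")  # 8 vertices, 12 edges
```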

Wireframes are a good match for polyhedral structures like mechanical parts, furniture or building interiors. In particular, since wireframes focus on the edge structure, they are best suited for piecewise smooth objects whose shape is captured by a few pronounced crease edges (whereas they are less suitable for smooth objects without defined edges, or for very rough ones with edges everywhere). Their biggest advantage in many applications is that they are easy to manipulate and edit, automatically or interactively in CAD software, because they make the salient contours and their connectivity explicit. Reconstructed wireframes can drive the creation of 3D CAD models for manufacturing, metrology and quality inspection, as well as for visualisation, animation, and rendering.

Why is it challenging?

Inferring the wireframe from a noisy point cloud is a challenging task. We can think of the process as a sequence of steps: find the corners, localise them accurately (their exact locations are generally not contained in the point cloud), and link them with the appropriate edges. However, these steps are intricately correlated. For example, corner detection should “know” about the subsequent edge detection: curvature estimates are corrupted by noise (as any user of an interest point detector can testify), so to qualify as a corner, a 3D location should also be the plausible meeting point of multiple, non-collinear edges.

An end-to-end trainable deep architecture for point cloud to wireframe conversion → PC2WF.

We introduce PC2WF, an end-to-end trainable deep architecture for point cloud to wireframe conversion. It is a conceptually simple yet effective algorithm to produce wireframes.

Illustration of our wireframe modeling architecture

PC2WF is composed of a sequence of feed-forward blocks:

  • Backbone. PC2WF first extracts a feature vector per 3D point that encodes its geometric context. We use Fully Convolutional Geometric Features [1], which are compact and capture broad spatial context. The backbone operates on sparse tensors [2] and efficiently computes 32-dimensional features in a single, 3D fully-convolutional pass.
  • Vertex Detector. This block performs binary classification of local patches (neighbourhoods in the point cloud) into those (few) that contain a corner and those that do not. Only patches detected to contain a corner are passed on to the subsequent block. Its loss function during training is binary cross-entropy.
  • Vertex Localiser. The vertex detector passes on a set of patches predicted to contain a corner. The points of such a patch along with their features form the input for the localisation block. It outputs the location of the vertex within a patch. The prediction is supervised by a regression loss.
  • Edge Detector. The edge detection block determines which pairs of (predicted) vertices are connected by an edge. During training, positive and negative samples are extremely unbalanced, which raises the question: which edge candidates should be fed into this block so that it can learn to reliably predict the existence of an edge? We carefully design a selection mechanism inspired by [3], described below. A schematic sketch of how the four blocks chain together follows this list.
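To fix ideas, here is a deliberately simplified sketch of how the four blocks chain together. Every module below is a stand-in (the real backbone is a sparse, fully-convolutional network [1, 2], not a linear layer), and all names and shapes are our own illustration:

```python
import torch
import torch.nn as nn

class PC2WFSketch(nn.Module):
    """Schematic stand-in for the PC2WF pipeline, not the actual implementation."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.backbone = nn.Linear(3, feat_dim)          # placeholder for the FCGF backbone
        self.vertex_detector = nn.Linear(feat_dim, 1)   # patch -> contains a corner?
        self.vertex_localiser = nn.Linear(feat_dim, 3)  # patch -> corner location
        self.edge_detector = nn.Linear(2 * (feat_dim + 3), 1)  # vertex pair -> edge?

    def forward(self, patches):
        # patches: (P, N, 3), i.e. P local neighbourhoods of N points each
        feats = self.backbone(patches)                   # (P, N, 32) per-point features
        patch_feat = feats.max(dim=1).values             # (P, 32) pooled patch descriptor
        corner_logit = self.vertex_detector(patch_feat)  # which patches contain a corner
        corner_xyz = self.vertex_localiser(patch_feat)   # where the corner lies
        # Edge detector: score all pairs of predicted vertices.
        v = torch.cat([patch_feat, corner_xyz], dim=-1)  # (P, 35)
        pairs = torch.cat([v.unsqueeze(1).expand(-1, v.size(0), -1),
                           v.unsqueeze(0).expand(v.size(0), -1, -1)], dim=-1)
        edge_logit = self.edge_detector(pairs)           # (P, P, 1) pairwise edge scores
        return corner_logit, corner_xyz, edge_logit
```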

We draw input edges for the edge detector from the following sets:

(a) Positive edges between ground truth vertices: This set comprises all true edges in the ground truth wireframe.

(b) Negative edges between ground truth vertices: Two situations are relevant: “spurious edges”, i.e., connections between two ground truth vertices that do not share an edge; and “inaccurate edges”, where one endpoint is not a ground truth vertex but lies close to one, to cover difficult cases near a correct edge.

(c) Positive edges between predicted vertices: Connections between two “correct” predicted vertices (both verified to coincide with a ground truth wireframe vertex up to a small threshold) whose ground truth counterparts share an edge.

(d) Negative edges between predicted vertices: These are (i) “wrong links” between “correctly” predicted vertices; and (ii) “hard negatives” where exactly one of the two vertices is close to a ground truth wireframe vertex, to cover “near misses”.
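As a rough illustration of how such training candidates could be assembled, here is a simplified sketch (the helper names and the threshold tau are our own; the “inaccurate edges” of set (b), which perturb an endpoint, are omitted for brevity):

```python
import numpy as np

def sample_edge_candidates(gt_vertices, gt_edges, pred_vertices, tau=0.05):
    """gt_vertices: (G, 3), pred_vertices: (P, 3), gt_edges: index pairs into gt_vertices."""
    gt_edges = {tuple(sorted(e)) for e in gt_edges}
    pos, neg = [], []

    # (a) Positive edges between ground truth vertices: all true edges.
    pos += [(gt_vertices[i], gt_vertices[j]) for i, j in gt_edges]

    # (b) "Spurious" negatives: ground truth vertex pairs that share no edge.
    G = len(gt_vertices)
    neg += [(gt_vertices[i], gt_vertices[j])
            for i in range(G) for j in range(i + 1, G)
            if (i, j) not in gt_edges]

    # Match each predicted vertex to its nearest ground truth vertex.
    d = np.linalg.norm(pred_vertices[:, None] - gt_vertices[None], axis=-1)
    nearest, dist = d.argmin(axis=1), d.min(axis=1)
    correct = dist < tau  # predictions that coincide with a ground truth vertex

    for a in range(len(pred_vertices)):
        for b in range(a + 1, len(pred_vertices)):
            if correct[a] and correct[b]:
                key = tuple(sorted((int(nearest[a]), int(nearest[b]))))
                # (c) positives / (d)(i) "wrong links" between correct predictions.
                (pos if key in gt_edges else neg).append((pred_vertices[a], pred_vertices[b]))
            elif correct[a] != correct[b]:
                # (d)(ii) "hard negatives": exactly one endpoint is a near miss.
                neg.append((pred_vertices[a], pred_vertices[b]))
    return pos, neg
```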

We use the balanced binary cross-entropy loss during training. At inference time, we traverse the fully connected graph of edge candidates connecting any two predicted vertices. Candidates whose average distance to the input points is too high (i.e., that do not lie on any object surface) are discarded. All others are fed into the edge detector for verification, as sketched below.
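The pre-filtering by point-to-edge distance could look roughly like this (a sketch; the sampling density and the distance threshold are illustrative placeholders):

```python
import numpy as np
from scipy.spatial import cKDTree

def prefilter_edge_candidates(points, vertices, max_avg_dist=0.02, n_samples=16):
    """Keep only vertex pairs whose connecting segment stays close to the input
    point cloud; the survivors go to the edge detector for verification."""
    tree = cKDTree(points)                                 # points: (N, 3) input cloud
    t = np.linspace(0.0, 1.0, n_samples)[:, None]          # sample positions along segment
    keep = []
    for a in range(len(vertices)):
        for b in range(a + 1, len(vertices)):
            seg = (1 - t) * vertices[a] + t * vertices[b]  # (n_samples, 3) points on segment
            dist, _ = tree.query(seg)                      # nearest input point per sample
            if dist.mean() <= max_avg_dist:                # discard off-surface candidates
                keep.append((a, b))
    return keep
```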

An end-to-end trainable approach. All stages of the architecture that have trainable parameters support back-propagation, so the model can be learned end-to-end. We minimise a joint loss function that combines the losses of the three trainable blocks with balancing weights:
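The loss equation appeared as an image in the original post; schematically, it combines the losses of the three trainable blocks (the symbols below are our own notation, and the exact formulation in the paper may differ):

```latex
\mathcal{L} = \lambda_{1}\,\mathcal{L}_{\mathrm{vertex}} + \lambda_{2}\,\mathcal{L}_{\mathrm{loc}} + \lambda_{3}\,\mathcal{L}_{\mathrm{edge}}
```

Here L_vertex is the binary cross-entropy of the vertex detector, L_loc the regression loss of the localiser, L_edge the balanced binary cross-entropy of the edge detector, and the lambdas are the balancing weights.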

Example results.

Let’s also have a look at some of our qualitative results.

Left: raw point clouds, right: extracted wireframes.

A visual comparison between the ground truth, our PC2WF, EC-Net [4], and PolyFit [5] is shown below. One can clearly see the advantage of predicting vector edges rather than “edge points”, as they lead to a much sharper result.

In conclusion.

In this post, we have presented PC2WF, an end-to-end trainable deep architecture to extract a vectorised wireframe from a raw 3D point cloud. The method achieves very promising results for man-made, polyhedral objects, going one step further than low-level corner or edge detectors. We see our method as one further step from robust but redundant low-level vision towards compact, editable 3D geometry.

For the interested reader, you can find all the details in our publication:

Liu, Y., D’Aronco, S., Schindler, K., Wegner, J. D.: PC2WF: 3D wireframe reconstruction from raw point clouds, International Conference on Learning Representations (ICLR), 2021

We have also made our source-code publicly available: https://github.com/YujiaLiu76/PC2WF

References

[1] Choy, Christopher, Jaesik Park, and Vladlen Koltun. “Fully convolutional geometric features.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

[2] Choy, Christopher, JunYoung Gwak, and Silvio Savarese. “4D spatio-temporal ConvNets: Minkowski convolutional neural networks.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.

[3] Zhou, Yichao, Haozhi Qi, and Yi Ma. “End-to-end wireframe parsing.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.

[4] Yu, Lequan, et al. “EC-Net: an edge-aware point set consolidation network.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.

[5] Nan, Liangliang, and Peter Wonka. “PolyFit: polygonal surface reconstruction from point clouds.” Proceedings of the IEEE International Conference on Computer Vision. 2017.
