PointCNN: replacing 50,000 man hours with AI
Update Dec. 2020: We published a PointCNN model pretrained on the data used in this experiment. This time, the PointCNN implementation comes from ArcGIS API for Python: download
Today we are going to talk about the experiment we did at Esri together with our Australian business partner, AAM Group, which specializes in the collection, analysis, presentation and delivery of geospatial information. In particular, AAM collects airborne LiDAR point clouds for electric utility companies to detect power lines and any vegetation growth or other easement encroachment into the power lines’ safety corridor.
Vegetation and encroachment monitoring is a critical task which needs to be performed on a regular basis to ensure the safety of both transmission and distribution networks. Utility companies operate tens of thousands of miles of power lines, yet missing a single tree canopy growing too close can lead to a massive wildfire or a power outage affecting thousands of consumers.
To minimize the risk of such events, typically, an annual survey is performed on the entire grid by flying low altitude manned airplanes or drones equipped with LiDAR sensors. Once the point cloud is collected, the power line points are manually labeled inside a GIS / CAD system, then any intrusions into the safety zone are automatically detected and placed into a work-order system for field teams to address.
Manual labeling of the wires in raw point clouds is an extremely labor-intensive process: just for one of its customers, AAM Group invests about 50,000 man hours a year to label the points which belong to overhead conductors.
Can a Deep Neural Network help?
Previously, we have experimented with a deep neural network called PointCNN which allows for efficient semantic segmentation (automatic assignment of classes like Ground, Water, Building, Vegetation etc. to each point) of raw point clouds. Back then, we trained a PointCNN model to label building points and received some quite impressive results which, in most cases, outperformed traditional deterministic algorithms.
But buildings are much simpler to detect than overhead wires: while the former, especially in urban areas, are contained in relatively balanced sets, the latter are represented by a vanishingly small number of points, e.g. only 12,500 out of 3.6M points in the LAS file below.
Another challenge: a building, due to its size, is easier to discern from the surrounding noise, e.g. touching or overhanging tree canopies, adjacent bushes, street furniture, etc. Overhead conductors, on the other hand, are represented by non-planar, zero-area point neighborhoods and are much harder to discriminate from nearby buildings, trees, and utility poles.
Spoiler alert: given the above, we were conservative in our expectations about PointCNN’s ability to learn the general rules needed to detect and label overhead power lines. The good news: we were wrong.
The experiment
We took a fairly small subset of a manually labeled point cloud, partially covering an Australian city, containing about 540M points total with average density of ~60 points per square meter. After some preliminary filtering and compression done in ArcGIS Pro, these are the classes we ended up training the model with: 0 — Other, 1 — Wires, 2 — Stay-Wires, 3 — Utility Poles.
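The class preparation above amounts to collapsing the raw LAS classification codes into a compact label set. A minimal sketch of that remapping in NumPy, assuming the standard ASPRS codes for conductors, guard/stay wires, and towers (the actual filtering was done in ArcGIS Pro, and the code-to-class mapping here is illustrative):

```python
import numpy as np

# Hypothetical mapping from ASPRS LAS classification codes to the four
# training classes: 0 - Other, 1 - Wires, 2 - Stay-Wires, 3 - Utility Poles.
CLASS_MAP = {14: 1,   # Wire - Conductor   -> Wires
             13: 2,   # Wire - Guard       -> Stay-Wires
             15: 3}   # Transmission Tower -> Utility Poles

def remap_classes(classification: np.ndarray) -> np.ndarray:
    """Collapse raw LAS class codes into the compact training label set."""
    out = np.zeros_like(classification)  # everything defaults to 0 - Other
    for src, dst in CLASS_MAP.items():
        out[classification == src] = dst
    return out
```

Any code not listed in the map (ground, vegetation, buildings, noise) falls into the catch-all Other class, which is what keeps the label set small enough to train against.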
We used the TensorFlow-based implementation of the PointCNN architecture and a single NVIDIA Quadro GV100 card with 32GB of VRAM to train and test a model on the above LAS dataset.
Data Prep and Know-How
Data preparation is somewhat involved when working with point clouds in general and with PointCNN in particular. The framework splits the input points into two sets of 50% overlapping voxels, so the inner points get processed four times, probing them in various local neighborhoods.
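To see why an interior point gets processed four times, consider a 2D tiling where a second grid is offset by half a tile in each axis. The sketch below (my own simplified model, not the framework’s actual pipeline) assigns each point to every tile covering it when tiles of size `tile` are laid out with a stride of `tile / 2`:

```python
import numpy as np

def tile_ids(points_xy: np.ndarray, tile: float) -> dict:
    """Map each 2D point to every tile that covers it, with tiles laid out
    at a stride of tile/2 (i.e. 50% overlap). Returns {tile_key: [indices]}.
    A sketch of the overlapping-block idea, not PointCNN's exact code."""
    stride = tile / 2.0
    buckets: dict = {}
    for idx, (x, y) in enumerate(points_xy):
        i0, j0 = int(np.floor(x / stride)), int(np.floor(y / stride))
        for di in (0, -1):
            for dj in (0, -1):
                i, j = i0 + di, j0 + dj
                # tile (i, j) spans [i*stride, i*stride + tile) in x, same in y
                if (i * stride <= x < i * stride + tile and
                        j * stride <= y < j * stride + tile):
                    buckets.setdefault((i, j), []).append(idx)
    return buckets
```

A point in general position lands in four tiles, so its local neighborhood gets probed four times from different block origins.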
If your GPU does not have a good amount of VRAM (in our experiments we used a card with 32GB), you may hit Out-Of-Memory errors with the default settings. Multiple options exist to deal with this, from reducing the mini-batch size (which leads to slower convergence) to thinning the voxels. The latter may be a better alternative when working with larger objects: if a voxel’s point count exceeds the pre-configured limit, the framework thins it through sampling into multiple 100% overlapping voxels. This leads to a trade-off though: large voxels are needed to capture larger objects, while smaller objects require a higher point density per voxel.
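The thinning step can be sketched as follows: an over-full voxel is shuffled and split into several subsets that share the same spatial extent (hence “100% overlapping”), each under the point limit. This is my own approximation of the behavior, not the framework’s source:

```python
import numpy as np

def thin_voxel(indices: np.ndarray, max_pts: int, rng=None) -> list:
    """If a voxel holds more than max_pts points, shuffle and split it into
    several fully overlapping subsets, each at most max_pts long.
    A sketch of the framework's thinning behavior under stated assumptions."""
    rng = rng or np.random.default_rng(0)
    if len(indices) <= max_pts:
        return [indices]                      # small voxel: keep as-is
    shuffled = rng.permutation(indices)       # randomize before splitting
    n_parts = int(np.ceil(len(indices) / max_pts))
    return [shuffled[i::n_parts] for i in range(n_parts)]
```

Every original point survives in exactly one subset, so no data is lost; the cost is that each subset sees a sparser version of the same neighborhood, which is why small, thin objects like wires suffer from aggressive thinning.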
To get the best of both worlds, you really want to use a GPU with the largest VRAM you can get your hands on.
If you do not have a large VRAM, but still need to label objects of significantly different sizes, it may make sense to train different PointCNN models with different voxel sizes and point densities.
Since our primary focus was on the Wire class, we chose a 250 m² cross section for the voxels with a maximum of 24,576 points per voxel.
Not that surprising, but still worth mentioning: we achieved better results when training not just on pure XYZ values, but also adding the Intensity and the Number Of Returns attributes to the set of input features.
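Assembling those per-point inputs amounts to stacking the extra LAS attributes next to the coordinates. A minimal sketch, where the centering and the 16-bit intensity scaling are illustrative choices rather than the exact recipe we used:

```python
import numpy as np

def build_features(xyz: np.ndarray,
                   intensity: np.ndarray,
                   num_returns: np.ndarray) -> np.ndarray:
    """Assemble per-point input features: centered XYZ plus the Intensity
    and Number Of Returns LAS attributes, as an N x 5 float32 matrix.
    Normalization here is an assumption, not the experiment's exact recipe."""
    xyz = xyz - xyz.mean(axis=0)            # center coordinates per block
    inten = intensity / 65535.0             # scale 16-bit intensity to [0, 1]
    nret = num_returns.astype(np.float32)
    return np.column_stack([xyz, inten, nret]).astype(np.float32)
```

Intensity helps because metal conductors reflect differently from vegetation, and the number of returns helps because a thin wire rarely produces the multi-return signature of a tree canopy.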
Another important fact: TensorBoard shows signs of overfitting in the validation loss much earlier than the Recall stops growing, while the Precision dynamics match the validation-loss fluctuations quite closely. It may therefore make sense to train a bit longer, sacrificing some Precision in return for a higher Recall and overall F1-score.
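The trade-off is easy to see numerically: because F1 is the harmonic mean of Precision and Recall, a small drop in Precision can be outweighed by a larger gain in Recall. The checkpoint numbers below are made up for illustration, not taken from the experiment:

```python
def f1(precision: float, recall: float) -> float:
    """F1-score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Illustrative checkpoints (not the experiment's actual metrics):
early = f1(0.95, 0.80)  # best-precision checkpoint near the val-loss minimum
late = f1(0.90, 0.93)   # longer training: lower precision, higher recall
```

Here the later checkpoint wins on F1 despite the precision drop, which is exactly why training past the validation-loss minimum paid off.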
Results
We received the best Precision value around the minimum of the validation loss, at iteration 67,000. The best Recall, though, came after almost double that number of iterations:
* Update from August 2019 (more details below): after training on a larger training set, we achieved a Precision for Wires of 0.966, and 0.82 for Poles; Recall of 0.981 and 0.775 respectively.
Below are some side-by-side comparisons from the test set of the ground truth (left) and PointCNN predictions (right) from the 116,000th iteration.
Future work
- More high-quality training data is always good in the world of Deep Learning, and we are going to continue our experiments on a training set about 20 times larger to make the PointCNN model even more effective at detecting overhead conductors.
But even at this point, the results we have achieved are, according to Darko Radiceski, CTO of AAM Group, the state of the art in the industry.
- The location of the overhead wires is usually known to some accuracy, so there is no need to train or run inference on the entire survey area, including patches of land which are known to have no wires at all, especially in rural areas. Operating just within the power line corridors will make training and inference faster and more accurate.
- High-rises need to be cut. Having a very tall building in a voxel, after normalization, “squashes” nearby wires into the ground making them harder for the PointCNN to detect. It makes sense to drop any points from the input point cloud which are higher than a certain level above the ground.
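The high-rise cut amounts to a simple height-above-ground filter. A minimal sketch, assuming a per-point ground elevation is already available (e.g. from a classified ground surface); the 40 m cutoff is a made-up example value, to be chosen safely above the tallest conductors in the survey:

```python
import numpy as np

def keep_low_points(z: np.ndarray, ground_z: np.ndarray,
                    max_hag: float = 40.0) -> np.ndarray:
    """Boolean mask keeping only points within max_hag meters above the
    local ground elevation. The 40 m default is an illustrative cutoff,
    not a value from the experiment."""
    return (z - ground_z) <= max_hag
```

Dropping the tall outliers before normalization keeps the vertical range of each voxel tight, so wires are no longer “squashed” toward the ground.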
August 2019 Update: After we trained PointCNN on 10B+ points, the quality metrics, as expected, grew even higher. Best Recall at iteration 232,758 and best Precision at iteration 687,000:
Another great example of PointCNN’s ability to generalize: we applied the model trained on Australian wires to an airborne LiDAR point cloud from Utrecht, Netherlands and achieved quite decent results there as well:
Conclusion
Segmentation of point clouds is an important yet challenging, often manual, process. PointCNN is a great productivity tool here, one that will help make infrastructure inspections more cost-efficient, reliable, and frequent, leading to improved safety.