Automating Railway Asset Detection using LiDAR and Deep Learning

Amin Tayyebi
Published in GeoAI
11 min read · Sep 17, 2019


An overview of extracting railway assets from 3D point clouds derived from LiDAR, using ArcGIS, the ArcGIS API for Python, and a deep learning model.

Outline:

1. Asset Inventory Management in Railway
2. Deep Learning from 2D to 3D Domain
3. Study Area and Data Preparation
4. Train PointNet for Classification
5. Inference Workflow
6. Conclusion and Future Work
7. Acknowledgment and References

1. Asset Inventory Management in Railway

Asset inventory management for railways (e.g., track, mileposts, crossings, signals, switches) requires comprehensive surveying and modern asset management practices. The task becomes even harder when a new railway is built. GIS plays a critical role in making asset management more effective, both by providing a platform for storing and managing location data across time and by serving as a decision support tool for better decision-making. Besides managing the inventory of existing assets, large railway companies are also interested in 1) updating the existing inventory with new assets placed by the US Department of Transportation and 2) dropping from the inventory old assets that the US Department of Transportation has removed from the railway. This is a very difficult task, as large railway companies operate many long railways (over 10,000 miles in total) across the entire US.

During the last few years, most large railway companies started to capture LiDAR data along their railways. LiDAR provides fast, reliable, highly dense, and highly accurate data for asset mapping. LiDAR data are usually captured in discrete patches and later registered to get a complete 3D point cloud of the railway. The ability to collect RGB values from a 360° Ladybug camera along with intensity makes this technology particularly well suited for training deep learning models. Most large railway companies currently employ a large number of technicians who go through the 3D point clouds, detect the desired assets manually, and label them as a point feature class using GIS software such as ArcGIS Pro. However, this operation is very tedious and time-consuming at a large scale (e.g., the entire US). The GeoAI team at Esri was asked to automate feature extraction (e.g., signals, switches, crossings, mileposts) from 3D point clouds (Figure 1).

Figure 1. Examples of Assets (Signals, Crossings, Mileposts, and Switches) in 3D Point Clouds as well as Imagery

2. Deep Learning from 2D to 3D Domain

Many studies have applied deep learning in the 2D domain for a variety of applications such as classification, segmentation, change detection, localization, recognition, and scene understanding, with notable successes. Most of these networks use convolutional neural networks (CNNs), which enable them to progressively learn discriminative hierarchical features from large training datasets. While most work in deep learning focuses on regular input representations like sequences (in speech and language processing), images, and volumes (video or 3D data), applying deep learning in 3D space has not been as effective as in 2D, even with the latest advances in 3D sensing technologies and the increased availability of affordable 3D data. This can be attributed to a variety of reasons: the lack of large amounts of labeled 3D data, the lack of mature deep learning models for 3D space, the variation in the structure and geometric properties of 3D data, and the complexity of learning from 3D geometry.

Applications of deep learning models in the 3D domain fall into four groups:

  1. Multi-View CNNs render the 3D point cloud into 2D images and then apply 2D CNNs to classify them. Although such methods have achieved good performance on shape classification, it’s difficult to extend them to 3D tasks such as point classification or point segmentation.
  2. Volumetric CNNs were the pioneers in applying 3D CNNs to voxelized shapes. However, the volumetric representation is constrained in its resolution by data sparsity and the computation cost of 3D convolution.
  3. Spectral CNNs apply spectral convolutions on meshes. However, these methods are currently constrained to manifold meshes such as organic objects, and it’s not obvious how to extend them to non-isometric shapes such as furniture.
  4. Feature-based DNNs convert the 3D data into a vector by extracting traditional shape features and then use a fully connected network to classify the shape. However, they are constrained by the representational power of the extracted features.

Here, I used PointNet as a deep learning model to detect railway assets from 3D point clouds. PointNet could be considered a feature-based DNN, but the features are learned implicitly by the network through a series of affine transformations and feature-wise fully connected layers.

3. Study Area and Data Preparation

I had access to the centerline of the railway in three cities in the United States (the exact locations are confidential). I also received point clouds along the three railways and the locations of four asset types in point feature format: mileposts, crossings, signals, and switches.

As the point clouds were not labeled, I could not use PointNet for segmentation; instead, I had to find a way to apply PointNet for classification. Finding the best way to process and prepare training data for PointNet as a classification model was a huge obstacle. As each railway contains over a billion points, I had to filter out the extra points and keep only the critical 3D points for the training run.

I followed several steps to prepare data for training. First, I had to clip the point clouds both along and perpendicular to the railway (into 3D boxes). As the desired assets fall within a 15m buffer of the railway centerline, I had to generate a 15m buffer along and perpendicular to the railway (a 15m×15m box). To achieve this, I used the “Generate Points Along Lines” geoprocessing tool with a distance interval of 15m in ArcGIS Pro to create points along the railway [1]. I then used the “Split Line at Point” tool in ArcGIS Pro to split the railway at the points generated by the “Generate Points Along Lines” tool [2], producing line segments of 15m length along the railway. Finally, I applied the Buffer tool with a 15m distance to these segments to generate the 15m×15m boxes (Figure 2) along the railway [3].

Figure 2. 15m×15m Boxes Along the Railway
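The first of these steps, placing a point every 15m along the centerline, can be sketched in plain NumPy. This is only a conceptual stand-in for the “Generate Points Along Lines” tool, not its implementation; the function name is my own:

```python
import numpy as np

def points_along_line(vertices, interval):
    """Place points every `interval` units along a polyline.

    vertices: (N, 2) array of XY coordinates of the centerline.
    Returns an (M, 2) array of interpolated points, analogous to the
    output of the "Generate Points Along Lines" tool.
    """
    vertices = np.asarray(vertices, dtype=float)
    segs = np.diff(vertices, axis=0)             # vector of each segment
    seg_len = np.linalg.norm(segs, axis=1)       # segment lengths
    cum = np.concatenate([[0.0], np.cumsum(seg_len)])
    targets = np.arange(interval, cum[-1], interval)
    pts = []
    for d in targets:
        i = np.searchsorted(cum, d) - 1          # segment containing distance d
        t = (d - cum[i]) / seg_len[i]            # fraction along that segment
        pts.append(vertices[i] + t * segs[i])
    return np.array(pts)

# a straight 60m centerline yields points at 15m, 30m, and 45m
print(points_along_line([[0, 0], [60, 0]], 15.0))
```

The ArcGIS tool additionally handles multipart geometries, end points, and percentage-based placement, which this sketch ignores.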

Second, I prepared positive and negative samples for the training run. Positive samples are boxes that contain at least one of the desired assets, while negative samples are boxes that do not contain any of them. To generate positive and negative samples, I performed a spatial join between the target feature, the layer of 15m×15m boxes (Figure 2), and the join feature, the labeled assets layer in point format [4]. As the number of negative samples was much larger (over 100 times) than the number of positive samples, I sampled 10% of the total negative samples (Figure 3).

Figure 3. Positive and Negative Samples Along the Railway
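The labeling and 10% negative downsampling can be illustrated with a toy version that uses axis-aligned boxes. This is a stand-in for the ArcGIS Spatial Join tool, with my own function and parameter names:

```python
import numpy as np

def label_boxes(boxes, assets, neg_keep_frac=0.10, rng=None):
    """Mark each box positive if any asset point falls inside it, then
    keep only a fraction of the far more numerous negative boxes.

    boxes:  (B, 4) array of [xmin, ymin, xmax, ymax]
    assets: (A, 2) array of asset XY locations
    Returns (positive mask, sorted indices of boxes kept for training).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    boxes = np.asarray(boxes, float)
    assets = np.asarray(assets, float)
    # (B, A) containment test via broadcasting
    inside = (
        (assets[None, :, 0] >= boxes[:, None, 0])
        & (assets[None, :, 0] <= boxes[:, None, 2])
        & (assets[None, :, 1] >= boxes[:, None, 1])
        & (assets[None, :, 1] <= boxes[:, None, 3])
    )
    positive = inside.any(axis=1)
    neg_idx = np.flatnonzero(~positive)
    n_keep = min(len(neg_idx), max(1, int(len(neg_idx) * neg_keep_frac)))
    keep_neg = rng.choice(neg_idx, size=n_keep, replace=False)
    return positive, np.sort(np.concatenate([np.flatnonzero(positive), keep_neg]))
```

In the real workflow the class of the joined asset (signal, switch, crossing, milepost) is also carried into the box's attribute table, which this sketch omits.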

Third, I used the “Extract LAS” tool to keep the point clouds (Figure 4) within the generated 15m×15m boxes (Figure 3) with positive and negative samples [5]. As the attribute table of the 15m×15m boxes contains the class of each asset that falls in each box, I used it to keep track of positive and negative samples while exporting the point clouds to separate LAS files.

Figure 4. Examples of the Output of the “Extract LAS“ Function for Positive Samples within 15m×15m Boxes

Fourth, as the desired assets are off the ground, I used the “Classify LAS Ground” tool to classify the point clouds into ground and non-ground points, filtering out more unnecessary points within each box [6]. I then dropped the ground points within each 15m×15m box from further analysis (Figure 5).

Figure 5. Examples of the Ground and Non-Ground Point Clouds in each 15m×15m Box
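As a rough illustration of the idea only: the “Classify LAS Ground” tool uses sophisticated surface classification, but the effect of dropping ground returns can be crudely approximated by removing points within a tolerance of the lowest elevation in a box (the tolerance value here is an assumption):

```python
import numpy as np

def drop_ground(points, ground_tol=0.2):
    """Crude stand-in for ground filtering: treat points within
    `ground_tol` metres of the lowest point in the box as ground and
    drop them, keeping only the above-ground points where assets live.

    points: (N, 3) array of XYZ coordinates for one 15m x 15m box.
    """
    points = np.asarray(points, float)
    z_ground = points[:, 2].min()            # lowest return in the box
    return points[points[:, 2] > z_ground + ground_tol]
```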

Fifth, I used the “Classify LAS Noise” tool in ArcGIS Pro to drop anomalous points, or noise, from further analysis [7]. To remove even more noise, I also applied the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. DBSCAN finds clusters of high-density point features within surrounding noise based on their spatial distribution [8]. DBSCAN marks as outliers those points that lie alone in low-density regions, whose nearest neighbors are too far away (Figure 6).

Figure 6. Example of a 15m×15m Box Before and After Dropping Noise using the “Classify LAS Noise” Function and DBSCAN. The Image on the Left is Color-Coded based on Elevation Values. The Image on the Right is Color-Coded based on DBSCAN Classes
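For intuition, here is a minimal brute-force DBSCAN: points in dense regions receive a cluster id, and isolated points in low-density regions are labeled -1 (noise). This O(N²) sketch shows the algorithm's logic; the ArcGIS Density-based Clustering tool implements it far more efficiently:

```python
import numpy as np

def dbscan(points, eps=0.5, min_samples=5):
    """Label each point with a cluster id, or -1 for noise."""
    points = np.asarray(points, float)
    n = len(points)
    # brute-force pairwise distances and eps-neighbourhoods
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    neighbors = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # grow a new cluster outward from this unvisited core point
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if core[j]:               # only core points keep expanding
                    stack.extend(neighbors[j])
        cluster += 1
    return labels
```

Running it on two tight clusters plus a distant stray point labels the stray as -1 while each cluster gets its own id.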

After dropping the ground points and noise from further analysis, each 15m×15m box still contains between 50K and 100K points. It is recommended to use 4096 points per box for training. Finally, I developed a spatial random sampling method that goes over each box and keeps only 4096 points. This method first converts each box into 3D voxels of 1m length along the X, Y, and Z directions. It then randomly samples from each voxel in proportion to the density of the point cloud within that voxel. With only the necessary points kept in each box, it was time for the training run.
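This voxel-based sampling can be sketched as follows. The sketch is my reconstruction of the scheme described above; the per-voxel quota and trimming behavior are my assumptions:

```python
import numpy as np

def voxel_downsample(points, n_keep=4096, voxel=1.0, rng=None):
    """Spatially stratified random sampling: bin points into `voxel`-sized
    3D cells, then draw from each cell in proportion to its point count
    so that roughly `n_keep` points survive."""
    if rng is None:
        rng = np.random.default_rng(0)
    points = np.asarray(points, float)
    if len(points) <= n_keep:
        return points
    # integer voxel index for every point
    keys = np.floor(points[:, :3] / voxel).astype(int)
    _, voxel_id = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(voxel_id)
    # per-voxel quota proportional to density (at least one point each)
    quota = np.maximum(1, (counts / len(points) * n_keep).astype(int))
    picked = []
    for v in range(len(counts)):
        idx = np.flatnonzero(voxel_id == v)
        picked.append(rng.choice(idx, size=min(quota[v], len(idx)), replace=False))
    picked = np.concatenate(picked)
    # trim in case rounding left more than n_keep points
    if len(picked) > n_keep:
        picked = rng.choice(picked, size=n_keep, replace=False)
    return points[picked]
```

Sampling per voxel rather than uniformly over the whole box keeps sparse but informative regions (e.g., a thin signal mast) represented in the 4096 retained points.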

4. Train PointNet for Classification

I used PointNet as the deep learning model for classification. PointNet directly takes point clouds as input and outputs either class labels for the entire input or per-point segment/part labels for each point of the input (Figure 7). PointNet processes each point, represented by its three coordinates (X, Y, Z) and possibly more attributes such as intensity and RGB. First, PointNet is invariant to permutations of the point cloud because it aggregates point features with a single symmetric function, max pooling. Second, PointNet is robust to transformations of the point cloud because it uses affine transformations applied to each point independently. PointNet effectively selects informative points from the point cloud and encodes the reason for their selection. The fully connected layers of the network aggregate these learned optimal values into a global descriptor for classification, or concatenate the global descriptor with local descriptors to predict the class of each point.

Figure 7. The Architecture of PointNet for Classification and Segmentation [9]
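The permutation invariance that max pooling buys can be demonstrated with a toy stand-in for PointNet's shared per-point layer. The weights here are random, not a trained network; the point is only that the max over points makes the descriptor independent of point order:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared per-point affine layer with random weights -- a toy stand-in
# for PointNet's shared MLP, just to show the effect of max pooling.
W = rng.normal(size=(3, 64))
b = rng.normal(size=64)

def global_feature(points):
    """(N, 3) points -> (64,) global descriptor, independent of point order."""
    per_point = np.maximum(points @ W + b, 0.0)  # same map applied to every point
    return per_point.max(axis=0)                  # symmetric function: max pool

points = rng.normal(size=(4096, 3))
shuffled = points[rng.permutation(len(points))]
# identical descriptor regardless of how the points are ordered
assert np.allclose(global_feature(points), global_feature(shuffled))
```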

After data processing, I extracted the X, Y, Z coordinates and the intensity associated with the point cloud in each box for training. I then calculated the Z value of each point relative to its box and used that for training instead of the absolute Z. I also shuffled the points within each box, as well as the order of the boxes, before training the model. Table 1 summarizes the total number of boxes for the training and testing runs. I used two railways for the training run and left out the third railway for the testing run.

Table 1. Comparing the Number of 15m×15m Boxes for Each Class in the Training and Testing Run

Deep learning models, in general, require large datasets for training. As the number of samples was not enough for the positive classes (e.g., signal, milepost, crossing, switch), I used data augmentation to increase the number of training samples for these rare classes, using two methods common in the 3D domain: 1) rotation: as the LiDAR sensor is mounted on the front of the train, rotations about the X and Y axes are small, so each box was randomly rotated about the X and Y axes by less than 5 degrees. The Z-axis is different: since a train can travel the same railway back and forth, a similar object can be viewed from multiple directions, so each box was randomly rotated about the Z-axis by between -180 and 180 degrees; and 2) jittering: jittering can be described as damaging the point cloud, or adding noise to the point data. Since the point density of similar objects can vary from scene to scene depending on the distance of the object from the LiDAR sensor, jittering encourages the model to learn only the critical points during the training run.
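The two augmentations can be sketched as follows. The angle ranges follow the text; the jitter standard deviation is an assumed value:

```python
import numpy as np

def augment(points, rng=None):
    """Rotation + jitter augmentation for one box of points (N, 3).

    Small random tilt about X and Y (< 5 degrees), free rotation about
    Z (-180..180 degrees), then Gaussian jitter on every coordinate.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    points = np.asarray(points, float)
    ax, ay = np.radians(rng.uniform(-5, 5, size=2))   # small tilts
    az = np.radians(rng.uniform(-180, 180))           # free yaw
    rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az),  np.cos(az), 0],
                   [0, 0, 1]])
    rotated = points @ (rz @ ry @ rx).T
    # jitter: per-coordinate Gaussian noise (sigma is an assumed value)
    return rotated + rng.normal(0.0, 0.02, size=points.shape)
```

Each call produces a new perturbed copy of the box, which is what lets a handful of rare positive boxes stand in for a much larger training set.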

I then merged the newly generated boxes from data augmentation with the existing boxes. The training data came from two railways, while the testing data came from the third railway that was left out of the training run. I trained PointNet for over 200 epochs and saved the model that produced the best precision and recall, both overall and for each class separately. Overall, PointNet performed quite well, producing over 70% precision and recall for most classes except switches (Table 2). PointNet failed to detect switches, misclassifying them as the negative class and vice versa. This is because the width, length, and height of switches are small (less than 20cm in each dimension).

Table 2. Accuracy Metrics such as Precision and Recall of PointNet Calculated in Testing Run

Figure 8 shows some of the extracted assets inside the boxes along the railway in the testing run. I have not labeled negative classes.

Figure 8. Black Dots Show the Location of Assets in Point Clouds. Yellow Colors Show the Boundary of 15m×15m Boxes. The Image on the Left has Signal and Switch. The Image on the Right has Crossing. Both Images Color-Coded based on Intensity

5. Inference Workflow

After training, we need a workflow for inference (Figure 9). This workflow is responsible for running the trained model against any new dataset to detect assets and export them as a point feature class on ArcGIS Online [10]. The workflow that I am currently working on has four main steps: 1) a web user interface where the user can drop point clouds in LAS format as well as the railway centerline in shapefile format. As the data flows into an S3 bucket, AWS Lambda, an event-driven serverless computing platform, runs the Data Pipeline in response and automatically manages the computing resources required; 2) the Data Pipeline, which contains most of the data processing steps explained in this blog, such as clipping, random sampling, and noise reduction; 3) the Model Pipeline, which can be served as a Geoprocessing Tool [11] to run PointNet and detect the asset in each box; and 4) the Output component, which converts the model's output, a point at the centroid of each box and its associated class, into a feature class and pushes it to a Portal on ArcGIS Online for the user.

Figure 9. Inference Workflow
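A hedged sketch of the Lambda entry point for step 1, which receives the S3 ObjectCreated notification and would kick off the Data Pipeline. The bucket layout, accepted extensions, and the downstream `start_data_pipeline` call are all hypothetical:

```python
import json
import urllib.parse

def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated notification when the user drops
    a LAS file or centerline shapefile into the upload bucket. Extracts
    the bucket/key pairs and would hand them to the Data Pipeline;
    start_data_pipeline is a hypothetical downstream call, left stubbed.
    """
    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in the event payload
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith((".las", ".shp", ".zip")):
            # start_data_pipeline(bucket, key)  # clip, sample, denoise, ...
            started.append({"bucket": bucket, "key": key})
    return {"statusCode": 200, "body": json.dumps(started)}
```

Because Lambda is event-driven, no compute runs between uploads; each drop of data spins up only the processing it needs.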

6. Conclusion and Future Work

In this article, I walked you through the end-to-end workflow that I developed to extract features from 3D point clouds. The workflow uses only the trackline, the 3D point clouds along the railway, and the locations of assets in point feature format to train the deep learning model. GIS played a key role here in mapping the locations of assets and preparing the location data for training. Our future work will focus on a diverse range of topics, such as comparing PointNet with PointNet++ and PointCNN for both classification and segmentation in railway applications.

7. Acknowledgment and References

I want to thank Daniel Wilson, Omar Maher, and Khalid Duri, who reviewed this blog.

[1] https://pro.arcgis.com/en/pro-app/tool-reference/data-management/generate-points-along-lines.htm
[2] http://desktop.arcgis.com/en/arcmap/10.3/tools/data-management-toolbox/split-line-at-point.htm
[3] http://desktop.arcgis.com/en/arcmap/10.3/tools/analysis-toolbox/buffer.htm
[4] https://pro.arcgis.com/en/pro-app/tool-reference/analysis/spatial-join.htm
[5] https://pro.arcgis.com/en/pro-app/tool-reference/3d-analyst/extract-las.htm
[6] https://pro.arcgis.com/en/pro-app/tool-reference/3d-analyst/classify-las-ground.htm
[7] https://pro.arcgis.com/en/pro-app/tool-reference/3d-analyst/classify-las-noise.htm
[8] https://pro.arcgis.com/en/pro-app/tool-reference/spatial-statistics/densitybasedclustering.htm
[9] https://arxiv.org/abs/1612.00593
[10] https://www.arcgis.com/index.html
[11] https://pro.arcgis.com/en/pro-app/help/analysis/geoprocessing/basics/what-is-geoprocessing-.htm
