Analysing an Aerial LiDAR Point Cloud dataset

LiDAR data is an integral part of the work of any map-making company. Whether they capture it in-house or procure it from a vendor, map-making companies rely on LiDAR data to build high-definition maps. LiDAR data can be captured with a sensor mounted on a ground vehicle, usually referred to as Terrestrial LiDAR capture. It can also be captured from the air using drones (UAVs) and low-flying airplanes or helicopters, usually referred to as Aerial LiDAR capture. In this blog post we will look at one open-source Aerial LiDAR point cloud dataset and play around with it to see what details we can infer from it.

The dataset we will be looking at in this article comes from the Dutch Government’s PDOK (Publieke Dienstverlening Op de Kaart) platform. PDOK offers public and private geo-datasets for open use. These geo-datasets are supplied by various government departments, ministries and public administrations, and can therefore be expected to be up to date and reliable. The raw LiDAR data used in this article can be downloaded from here. It is a 2 GB file with X, Y, Z coordinates in the EPSG:7415 coordinate system. However, I have cropped this data using CloudCompare and subsampled it to retain 1 out of every 8 points, so that it can easily be loaded for a quick analysis. The raw data was also in LAZ format (LASzip, a lossless compressed version of LAS), which was converted to LAS after cropping and subsampling. LAZ to LAS conversion can be done using CloudCompare or LAStools. The cropped and subsampled LAS file (121 MB) can be downloaded from here. If you load this file in CloudCompare and select Scalar Fields = Number of Returns (refer to the screenshot below), it will look like this.

A LiDAR Point cloud coloured by Number of Returns in CloudCompare
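The cropping and subsampling above were done in CloudCompare. For readers who prefer a script, here is a minimal sketch of the same two steps on points already loaded into a Python list of (x, y, z) tuples. The bounding box and sample points are hypothetical, and taking every 8th point by position is just one simple way to keep 1 in 8, which may differ from how CloudCompare picks points.

```python
def crop_to_bbox(points, xmin, ymin, xmax, ymax):
    """Keep only points whose x and y fall inside the box (edges included)."""
    return [p for p in points if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]


def subsample_every_nth(points, n=8):
    """Keep one point out of every n by taking indices 0, n, 2n, ..."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return points[::n]


# Hypothetical points spaced 1 m apart along the x axis.
points = [(float(i), 0.0, 0.0) for i in range(20)]
cropped = crop_to_bbox(points, xmin=0, ymin=0, xmax=15, ymax=0)
kept = subsample_every_nth(cropped, n=8)
print(len(cropped), len(kept))  # 16 2
```

In practice a LAS reader such as laspy would first load the file into arrays; the filtering logic stays the same.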

One can observe that there are 9 scalar fields by which this point cloud can be coloured. These 9 fields are the attributes stored in the LAS file for every point, alongside its X, Y and Z coordinates. We can confirm this by saving the LAS file as a PLY file in CloudCompare and opening it in a text editor.

Field definition metadata as seen when the PLY file is opened in a text editor.
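Reading those field definitions programmatically only requires parsing the PLY header, which is plain text up to the `end_header` line even when the vertex data that follows is binary. The sketch below uses a made-up header; the property names are illustrative and not necessarily the exact names CloudCompare writes.

```python
HEADER = """ply
format binary_little_endian 1.0
element vertex 3
property double x
property double y
property double z
property float intensity
property float return_number
end_header
"""


def ply_properties(header_text):
    """Return (type, name) pairs for every scalar 'property' line in a PLY header."""
    props = []
    for line in header_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "end_header":
            break
        # Scalar properties look like "property <type> <name>";
        # list properties have more tokens and are skipped here.
        if parts[0] == "property" and len(parts) == 3:
            props.append((parts[1], parts[2]))
    return props


props = ply_properties(HEADER)
scalar_fields = [name for _, name in props if name not in ("x", "y", "z")]
print(scalar_fields)  # ['intensity', 'return_number']
```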

Now, let us try to understand what each of these fields means and what information it can tell us about the area that was captured.

The Coordinates: The x, y and z values are the coordinates of each point in the EPSG:7415 coordinate system, as described earlier. When a LiDAR sensor captures points, their coordinates are usually relative to the position of the sensor, and a post-processing step called geo-registration is required to convert them into a geospatial coordinate system such as EPSG:7415.
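To make geo-registration a little more concrete: at its core, each point measured in the sensor's frame is rotated by the sensor's orientation and shifted by the sensor's position, world = R·p + t. The sketch below is heavily simplified. It assumes the sensor position is known and uses only the heading (yaw); real workflows use the full roll, pitch and yaw from an IMU, GNSS positions along the trajectory and calibration offsets. The coordinates are hypothetical values in metres.

```python
import math


def rotation_z(yaw_rad):
    """3x3 rotation matrix about the vertical axis (heading only)."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def sensor_to_world(point, rotation, translation):
    """Apply world = R @ point + t to a single (x, y, z) point."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical sensor position (projected metres) and a 90 degree heading.
t = (121000.0, 487000.0, 300.0)
R = rotation_z(math.radians(90))
p_sensor = (10.0, 0.0, -300.0)  # 10 m along the sensor x axis, 300 m below it
p_world = sensor_to_world(p_sensor, R, t)
print(tuple(round(c, 6) for c in p_world))  # (121000.0, 487010.0, 0.0)
```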

Point Source ID: According to the ASPRS LAS 1.2 specification, the Point Source ID corresponds to the File Source ID of the source the point originated from, which in our case is a flight line number. It therefore identifies the flight line during which the point was captured. An area usually has to be covered by multiple flight lines, and this field links each captured point to the flight line that captured it.
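Since every point carries its Point Source ID, it is easy to see how much each flight line contributed and where flight lines overlap. A minimal sketch, assuming points are available as (x, y, point_source_id) tuples; the IDs, coordinates and the 1 m cell size are hypothetical.

```python
from collections import Counter, defaultdict


def points_per_flight_line(points):
    """Count the points contributed by each Point Source ID."""
    return Counter(psid for _, _, psid in points)


def overlap_cells(points, cell_size):
    """Grid cells (ix, iy) containing points from more than one flight line."""
    sources = defaultdict(set)
    for x, y, psid in points:
        cell = (int(x // cell_size), int(y // cell_size))
        sources[cell].add(psid)
    return {cell for cell, ids in sources.items() if len(ids) > 1}


points = [(0.5, 0.5, 12), (0.7, 0.2, 13), (1.5, 0.5, 12), (2.5, 0.5, 13)]
print(dict(points_per_flight_line(points)))  # {12: 2, 13: 2}
print(overlap_cells(points, cell_size=1.0))  # {(0, 0)}
```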

User Data: This is an additional field that can be populated with any data the end user wishes.

Scan Angle Rank: This is the angle at which the laser pulse was emitted from the sensor, including the roll of the aircraft. It ranges from -90 to +90 degrees, where 0 is nadir (straight down), and is rounded to the nearest integer.
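In the LAS 1.2 point record, Scan Angle Rank is stored as a single signed byte (a signed char). Reading libraries normally decode it for you, but the sketch below shows the decoding from a raw byte. It also adds a simple filter that keeps points captured close to nadir, for example to inspect them separately from the swath edges. The raw bytes and the 15 degree threshold are hypothetical.

```python
import struct


def decode_scan_angle_rank(raw_byte):
    """Interpret one raw byte (0-255) as the signed char used for Scan Angle Rank."""
    value = struct.unpack("<b", bytes([raw_byte]))[0]
    if not -90 <= value <= 90:
        raise ValueError(f"scan angle rank {value} outside -90..+90")
    return value


def near_nadir_indices(angles, max_abs_angle=15):
    """Indices of points within max_abs_angle degrees of nadir (0 degrees)."""
    return [i for i, a in enumerate(angles) if abs(a) <= max_abs_angle]


raw = [0, 241, 20, 236]  # hypothetical raw bytes from four point records
angles = [decode_scan_angle_rank(b) for b in raw]
print(angles)                      # [0, -15, 20, -20]
print(near_nadir_indices(angles))  # [0, 1]
```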

FlightLine Edge: As the aircraft flies, the LiDAR sensor sweeps its laser back and forth across the direction of flight, and each sweep is called a scan line. This flag is set only for the last point of a scan line, i.e. the point captured just before the scanner changes direction, so the flagged points mark the outer edges of the strip covered by that flight line.
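In the LAS 1.2 point record formats, the Edge of Flight Line flag is the highest bit of a single byte. The same byte also packs the Return Number (bits 0-2), Number of Returns (bits 3-5) and Scan Direction Flag (bit 6). A minimal sketch of unpacking that byte, useful if you ever read records without a LAS library; the example byte values are made up.

```python
def decode_return_byte(b):
    """Split the LAS 1.2 return/flag byte into its four bit fields."""
    return {
        "return_number": b & 0b111,             # bits 0-2
        "number_of_returns": (b >> 3) & 0b111,  # bits 3-5
        "scan_direction_flag": (b >> 6) & 1,    # bit 6
        "edge_of_flight_line": (b >> 7) & 1,    # bit 7
    }


def edge_point_indices(flag_bytes):
    """Indices of points flagged as the last point of a scan line."""
    return [i for i, b in enumerate(flag_bytes)
            if decode_return_byte(b)["edge_of_flight_line"] == 1]


# Hypothetical bytes: return 1 of 1 at a scan-line edge, then return 2 of 3.
raw = [0b10001001, 0b00011010]
print(decode_return_byte(raw[0]))
# {'return_number': 1, 'number_of_returns': 1, 'scan_direction_flag': 0, 'edge_of_flight_line': 1}
print(edge_point_indices(raw))  # [0]
```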

Number of Returns: The laser beam emitted from the LiDAR sensor has a fixed diameter (usually 10 cm), and it also tends to diverge, which widens the beam as it travels. As a result, it is highly likely that only part of the beam's footprint falls on a single object. A classic case is trees with large crowns, such as deciduous trees, where a single beam touches multiple levels of the tree structure as well as the ground (refer to the diagram below). This results in multiple returns being registered for a single laser pulse, and this field contains the number of returns registered for a given pulse. For example, in the case shown in the diagram, this field will have the value 5 for all the points registered by that pulse.

Laser Beam Divergence causing multiple returns at different levels of the Tree structure and the ground.

This is very valuable information because, looking at it the other way round, building rooftops, side walls and the ground surface will usually register only a single return, as they are solid, flat surfaces. Hence just by filtering this data by Number of Returns we can easily separate trees and vegetation from the rest of the features. So if we colour this point cloud by selecting “Scalar Field = Number of Returns” in CloudCompare, trees and vegetation stand out clearly from everything else.

LiDAR point cloud coloured by Number of Returns, highlighting Trees and Vegetation
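The filter itself is a one-liner. A minimal sketch, assuming the Number of Returns values have already been read into a list. Keep in mind that building edges and other sharp edges can also produce multiple returns, so this yields vegetation candidates rather than a clean vegetation class.

```python
def vegetation_candidates(number_of_returns):
    """Indices of points whose pulse registered more than one return."""
    return [i for i, n in enumerate(number_of_returns) if n > 1]


# Hypothetical values: roofs and ground (1) mixed with tree hits (2 to 5).
print(vegetation_candidates([1, 1, 3, 5, 2, 1]))  # [2, 3, 4]
```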

If we fit a plane to all the points using RANSAC (Random Sample Consensus), the points lying within a small distance of the fitted plane are the points belonging to the ground surface, and the points outside it make up everything else. This works because, in relatively flat terrain, the ground is the plane supported by the most points. Once the ground points are removed, it is possible to further separate low vegetation from large trees with big crowns by filtering the remaining points by Number of Returns. For example, large deciduous trees will register more than 3 returns, low vegetation will usually register 2 returns, and medium-sized trees and plants fall in between. Check out the video below to see how it looks once these steps are done.

LiDAR points filtered by Number of Returns>2 after removing Ground points using RANSAC
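A minimal pure-Python sketch of both steps follows. It fits a single plane with RANSAC (repeatedly fit a plane through 3 random points and keep the one with the most points within a distance threshold). It then buckets the remaining points with the Number of Returns rule of thumb above. The 0.2 m threshold, 200 iterations and synthetic points are hypothetical choices. A single plane is only a reasonable ground model over fairly flat terrain, and libraries such as Open3D provide optimised RANSAC plane segmentation.

```python
import random
from collections import Counter


def plane_from_points(p1, p2, p3):
    """Plane (a, b, c, d) with unit normal through three points, or None if collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # The cross product u x v is normal to the plane.
    a, b, c = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (a * a + b * b + c * c) ** 0.5
    if norm == 0:
        return None
    a, b, c = a / norm, b / norm, c / norm
    return a, b, c, -(a * p1[0] + b * p1[1] + c * p1[2])


def ransac_ground(points, threshold=0.2, iterations=200, seed=0):
    """Fit one plane with RANSAC; return (plane, indices of inlier points)."""
    rng = random.Random(seed)
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = [i for i, (x, y, z) in enumerate(points)
                   if abs(a * x + b * y + c * z + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers


def vegetation_class(number_of_returns):
    """Rough size bucket from the return count (rule of thumb, not a standard)."""
    if number_of_returns > 3:
        return "large tree"
    if number_of_returns == 3:
        return "medium vegetation"
    if number_of_returns == 2:
        return "low vegetation"
    return "other"


# Synthetic data: a flat 10 x 10 m ground grid plus 10 elevated points.
ground = [(float(x), float(y), 0.0) for x in range(10) for y in range(10)]
elevated = [(x + 0.5, 0.5, 5.0 + x) for x in range(10)]
points = ground + elevated
returns = [1] * len(ground) + [2, 3, 4, 5, 2, 3, 4, 5, 1, 1]  # hypothetical

plane, inliers = ransac_ground(points)
inlier_set = set(inliers)
non_ground = [i for i in range(len(points)) if i not in inlier_set]
buckets = Counter(vegetation_class(returns[i]) for i in non_ground)
print(len(inliers), dict(buckets))
```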

Return Number: This field indicates which of the returns registered for a LiDAR pulse a given point came from. For example, if 5 returns were registered for a beam that hit a tree, the topmost part of the tree crown will have return number 1 (the first of the 5 returns) and the ground will have return number 5 (the last of the 5 returns). This field is also very important: if we remove the ground points using RANSAC and then keep only the points with “return number = 1”, we get all building rooftops, side walls, tops of tree crowns, elevated roads and bridges.

LiDAR points filtered by First Return after removing Ground points using RANSAC
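Given a ground mask from the RANSAC step, the first-return filter can be sketched as follows; the lists are hypothetical and assume one entry per point.

```python
def first_returns(return_numbers, is_ground):
    """Indices of non-ground points that were the first return of their pulse."""
    return [i for i, (rn, ground) in enumerate(zip(return_numbers, is_ground))
            if rn == 1 and not ground]


return_numbers = [1, 1, 2, 1, 3]
is_ground = [True, False, False, False, False]
print(first_returns(return_numbers, is_ground))  # [1, 3]
```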

If we use this field in combination with the Number of Returns field, we can separate out more features, such as buildings, from the rest. Building rooftops and side walls will have Number of Returns = 1 and Return Number = 1, and will lie some height above the mean ground level. Combining these three conditions makes it possible to identify the points that are part of buildings.

Buildings identified based on a combination of Return Number, Number of Returns and height filter on Z value
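The three conditions above combine into a single filter. A minimal sketch, assuming each point is a (z, return_number, number_of_returns) tuple and that the ground heights come from the RANSAC inliers. The 2.5 m minimum height is a hypothetical threshold, and a single mean ground value is only appropriate over flat terrain; with this threshold, side-wall points lower than 2.5 m above ground are dropped.

```python
from statistics import mean


def building_candidates(points, ground_z, min_height=2.5):
    """Indices of single-return points lying more than min_height above mean ground.

    points: list of (z, return_number, number_of_returns) tuples.
    ground_z: heights of the points classified as ground.
    """
    ground_level = mean(ground_z)
    return [i for i, (z, rn, nr) in enumerate(points)
            if rn == 1 and nr == 1 and z > ground_level + min_height]


ground_z = [0.0, 0.2, -0.2]  # hypothetical ground heights
points = [(8.0, 1, 1), (8.0, 1, 3), (1.0, 1, 1), (12.0, 2, 2), (6.0, 1, 1)]
print(building_candidates(points, ground_z))  # [0, 4]
```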

Time: This is the GPS time at which the point was acquired.
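In LAS 1.2 the meaning of this value depends on a bit in the header's global encoding field. It is either GPS week time (seconds since the start of the GPS week) or Adjusted Standard GPS Time, which is standard GPS time (seconds since 6 January 1980) minus 1,000,000,000. For the latter case, a minimal conversion to a UTC datetime could look like this. The leap-second offset is an input because GPS time does not include leap seconds (the GPS-UTC offset has been 18 s since 1 January 2017), and the sample value is hypothetical.

```python
from datetime import datetime, timedelta, timezone

GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)


def adjusted_gps_to_utc(adjusted_gps_time, leap_seconds=18):
    """Convert Adjusted Standard GPS Time to a timezone-aware UTC datetime.

    leap_seconds is the GPS-UTC offset valid at capture time (18 s since
    2017-01-01; older captures need a smaller value).
    """
    gps_seconds = adjusted_gps_time + 1e9  # undo the 1e9 offset
    return GPS_EPOCH + timedelta(seconds=gps_seconds - leap_seconds)


# Adjusted time 0 is GPS second 1e9; the GPS-UTC offset was 15 s in 2011.
print(adjusted_gps_to_utc(0.0, leap_seconds=15))  # 2011-09-14 01:46:25+00:00
```

For files that store GPS week time instead, the GPS week number is needed from elsewhere before an absolute date can be computed.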

Intensity: This is the magnitude of the pulse return registered by the sensor. A highly reflective surface will register high intensity, while a less reflective one will register lower intensity. Shallow water, where the ground beneath it is still visible, reflects the beam, but slightly deeper water tends to absorb the laser pulse instead of reflecting it back, causing LiDAR dropouts. As a result, water bodies are places where no points are registered. This information can also help classify bridges, for example by looking for structures with no points registered on either side of them.

LiDAR point cloud: the black areas are water bodies, the red structure is the bridge, and the green speckles are shallow water with lower registered intensity.
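Because water shows up as an absence of points, one simple way to look for it is to rasterise the point cloud onto a grid and list the cells that received no points. A minimal sketch, assuming (x, y) points and a hypothetical 1 m cell size. Empty cells are only candidates, since occlusion, dark surfaces and the edges of the capture also leave gaps.

```python
import math


def empty_cells(points, xmin, ymin, xmax, ymax, cell_size):
    """Grid cells (ix, iy) inside the box that received no points."""
    nx = math.ceil((xmax - xmin) / cell_size)
    ny = math.ceil((ymax - ymin) / cell_size)
    occupied = set()
    for x, y in points:
        ix = int((x - xmin) // cell_size)
        iy = int((y - ymin) // cell_size)
        if 0 <= ix < nx and 0 <= iy < ny:
            occupied.add((ix, iy))
    return [(ix, iy) for ix in range(nx) for iy in range(ny)
            if (ix, iy) not in occupied]


# Hypothetical strip of three 1 m cells; the middle one has no returns.
print(empty_cells([(0.5, 0.5), (2.5, 0.5)], 0, 0, 3, 1, cell_size=1.0))  # [(1, 0)]
```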

Classification: This field stores the class assigned to each point, if the data has already been classified. Classification can be done as a post-processing step or through manual annotation; manual annotation is typically done when the dataset is intended as training data for point cloud segmentation models. If no classification is available, a default value of 0 is used, which indicates that the point was never classified. According to the LAS 1.2 spec, this field can hold values from 0 to 31, and some of the prominent values correspond to the classes we have just discussed how to identify. For example, according to the spec, a value of 2 corresponds to Ground; 3, 4 and 5 correspond to Low, Medium and High Vegetation respectively; 6 corresponds to Buildings; and 9 corresponds to Water.
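Summarising the Classification field is a quick way to check whether a file has been classified at all. A minimal sketch using the ASPRS standard class codes from the LAS 1.2 specification (codes 10, 11 and 13 to 31 are reserved there); the sample codes are hypothetical.

```python
from collections import Counter

# ASPRS standard point classes as listed in the LAS 1.2 specification.
LAS12_CLASSES = {
    0: "Created, never classified",
    1: "Unclassified",
    2: "Ground",
    3: "Low Vegetation",
    4: "Medium Vegetation",
    5: "High Vegetation",
    6: "Building",
    7: "Low Point (noise)",
    8: "Model Key-point (mass point)",
    9: "Water",
    12: "Overlap Points",
}


def class_histogram(codes):
    """Count points per classification code, labelled with the LAS 1.2 class name."""
    histogram = {}
    for code, n in sorted(Counter(codes).items()):
        if not 0 <= code <= 31:
            raise ValueError(f"classification {code} outside 0..31")
        histogram[LAS12_CLASSES.get(code, f"Reserved ({code})")] = n
    return histogram


print(class_histogram([2, 2, 6, 9, 0, 5]))
```

A histogram dominated by code 0 means the file was never classified, and the filters described above are the way to go.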

Conclusion

To conclude, aerial LiDAR data tells us a lot about what is on the ground, and even simple filtering can reveal a great deal of detail. We can identify most features: ground points (via RANSAC), vegetation (via Number of Returns), buildings (via a combination of Return Number, Number of Returns and height), and water bodies and bridges (via intensity).

Abhilshit Soni
Machine Learning & AI in Automated Map Making

Principal Data Scientist at HERE. Mostly working on detecting and extracting map features from Aerial, Satellite and Terrestrial Imagery and LiDAR captures.