Estimating Building Heights Using LiDAR Data

Abhineet Gupta, Director of Resilience Research at One Concern

One Concern
One Concern
12 min readSep 11, 2020

--

3D render of building elevations and heights in San Francisco, obtained from LiDAR

LiDAR data can be incredibly powerful to extract elevations of the ground and objects at city scales. At One Concern, we are using LiDAR data to extract ground and building elevations to improve the exposure information that goes into our natural disaster models, to eventually estimate impacts from floods and earthquakes.

With the 3DEP project expected to collect US-wide LiDAR data by 2023, availability of LiDAR data should become easier and a lot more widespread. However, since LiDAR is pointcloud data, it can contain billions of points per city, and it is not straightforward to read and process the data. Although other articles have already been written about how to process LiDAR data to extract building heights, some of the libraries have since been discontinued or superseded. With this article, I intend to provide a more up-to-date set of tools and processes to use LiDAR data to extract building elevations, using San Francisco as an example.

Tools

I’ll be using the following tools —

  • pdal — for reading LiDAR files
  • PostgreSQL with pointcloud — database and extension to store LiDAR pointcloud data
  • docker — to set up tools in a virtual environment (optional)
  • QGIS 3.x — to visualize data (optional)

Data

For this article, I will be using two datasets — LiDAR and OSM.

LiDAR data

LiDAR data for San Francisco region is available from the USGS database (ftp://rockyftp.cr.usgs.gov/vdelivery/Datasets/Staged/NED/LPC/projects/), where it is referred to as the Golden Gate LiDAR project. This data was collected for the San Francisco State University in 2010 and covered 835 square miles in and near San Francisco.

Region of LiDAR data acquisition

You can access some of the other publicly available global LiDAR datasets from links compiled on the LASzip website.

OpenStreetMaps Data

In order to assign LiDAR data to building footprints, I will be using building footprint shapefile (.shp.zip) data for Northern California from OpenStreetMaps (OSM).

Setting up the environment

I used docker to set up my environment in virtual containers. I have provided the setup steps for PDAL and PostgreSQL with pointcloud using docker below. Although I was using docker on MacOS, it is also available for Linux, and Windows, and the setup should remain the same irrespective of your OS. You can also set up the PDAL and PostgreSQL withpointclouddirectly on your OS using the instructions on their respective websites.

If you are using docker, it will be easiest to first create a bridge network for the containers to communicate with each other. The following command creates a bridge network named pdal-net, however you can replace it with your desired name.

docker network create pdal-net

I will use PDAL to read my LiDAR files. PDAL is a GDAL-like library designed to be used for pointcloud data. It is a new replacement for previous libraries like libLAS and LASlib, used for the same purpose. In your terminal, from the current directory that contains the LiDAR data, use the following command to start the pdal docker container in the background with name pdal and attach it to the network pdal-net. The flag -v $(pwd):/scripts mounts your current directory on host system in the /scripts directory of the pdal container.

docker run --network pdal-net -v $(pwd):/scripts --name pdal pdal/pdal:latest

I will use PostgreSQL with pointcloud extension to store and process the LiDAR data. Since the data contains millions of points, storing it in a database makes it easier to process, and the pointcloud extension provides a convenient way to store, access, and work with the data using PostgreSQL queries. From the same directory as above, use the following command to set up and start a database in a docker container called pgpointcloud with the necessary postgis, pointcloud and pointcloud_postgis extensions.

docker run --network pdal-net --name pgpointcloud -e POSTGRES_DB=pointclouds -e POSTGRES_PASSWORD=mysecretpassword -v $(pwd):/mnt -d pgpointcloud/pointcloud:latest

Reading LiDAR data

LiDAR data is generally available as .las or .laz files, where the latter is the compressed version that takes up less disk space. The San Francisco data is available as both .las and .laz files. The steps for reading both types of files in PDAL are the same, as long as the correct LiDAR compression library dependencies (like LASzip) are installed, which they are if you are using the above docker images. LiDAR data is generally collected in uniform grids, and data from each grid is available as a separate .las/.laz file. The San Francisco data is divided into almost 1200 grids, and the compressed files take up almost 38GB disk space.

LiDAR collection grids over San Francisco, with associated file number suffixes.

In PDAL, the processing steps like reading, filtering, and writing a pointcloud file are described in a JSON script called a pipeline. The following pipeline reads in one of the LiDAR files and outputs them as a .csv (comma separated text) file, that you can import into QGIS to visualize.

{
"pipeline":[
{
"type":"readers.las",
"filename":"laz/ARRA-CA_GoldenGate_2010_001019.laz",
"spatialreference":"EPSG:26910"
},
{
"type":"writers.text",
"format":"csv",
"keep_unspecified":"true",
"filename":"laz/ARRA-CA_GoldenGate_2010_001019.laz.txt"
}
]
}

The above code uses a las reader to read in the file ARRA-CA_GoldenGate_2010_001019.laz in laz folder, and the text writer to output a corresponding .csv file. From the metadata provided with this dataset, I determined that the data is provided in spatial reference EPSG:26910, which is then specified to the reader.

To run the above pipeline code, save the above text as pipeline_single.json in the directory that contains the laz folder. Then we will run the pipeline from within the pdal container. Bash into the pdal container using
docker start pdal && docker exec -it pdal bash . Next, go to the correct directory within the container that contains the pipeline_single.json file ( cd /scripts), and finally run the following command, which will create the desired .csv file in the laz folder.

pdal pipeline -i pipeline_single.json

The pointcloud data in text format in the .csv file can be imported into QGIS and is visualized below.

(left) LiDAR pointcloud colored by point elevations. (right) satellite view for the same area.

While text files make it easier to visualize LiDAR data, they are not a great medium to store and process the large amounts of data for an entire city. We will do that in PostgreSQL. PostgreSQL provides database capabilities, postgis extension make it easy to work with geomteric and geographic data, and pointcloud extension provide a convenient way to store and extract pointcloud data. PDAL has a postgres writer that works with pointcloud data, so we can use it to read in the .laz files and store them directly into our database.

If you started the docker container for pgpointcloud as described above, then the PostgreSQL database should already be running with the relevant extensions. You can use the following PDAL pipeline code to read all the LiDAR data files for the dataset and store them in your database. Save the following PDAL pipeline code in a file called pipeline_batch.json.

{
"pipeline":[
{
"type":"readers.las",
"spatialreference":"EPSG:26910"
},
{
"type":"filters.chipper",
"capacity":600
},
{
"type":"writers.pgpointcloud",
"connection":"host='pgpointcloud' dbname='pointclouds' user='postgres' password='mysecretpassword' port='5432'",
"compression":"dimensional",
"srid":"26910"
}
]
}

Note that in the above code, the filename for LiDAR data is not specified in the pipeline since we want the pipeline to be able to read in different files from the directory. The filename will be passed to the pipeline through a bash script when the pipeline is being called. Additionally, there is a chipper filter with capacity 600. This combines the pointcloud data into patches of 600 nearby points and the database stores these patches instead of individual points. I further discuss the concept of patches below. 400 to 600 points are recommended in a single patch to keep each patch under the PostgreSQL page size limit of 8kb. Finally, the writer in this pipeline connects to a PostgreSQL database, instead of writing to a .csv file, as in the previous pipeline.

The following bash script finds all the .laz files within the specified directory, and passes each file as an argument to the PDAL pipeline, which then reads and stores the file in the PostgreSQL table called lidar. In my implementation, I first selected only the subset of files associated with collection grids for San Francisco and stored them in the sub-directory san francisco to reduce the number of files to read and store.

#!/bin/bashdirname="/scripts/LIDAR/ARRA-CA_GoldenGate_2010"
subdirname="san francisco"
tablename="lidar"
for filename in "$dirname"/"$subdirname"/*.laz; do
echo "Processing $filename ..."
pdal pipeline -i "$dirname"/pipeline_batch.json \
--readers.las.filename="$filename" \
--writers.pgpointcloud.table="$tablename"
done

Save the above file as batchLAZtoPostgres.sh, make it executable, and run it from within the pdal container, using the commands below. Note that /scripts/LIDAR/ARRA-CA_GoldenGate_2010 is the directory structure for the LiDAR files within the pdal container.

# make script executable
chmod +x batchLAZtoPostgres.sh
# run the script
./batchLAZtoPostgres.sh

Since both pdal and pgpointcloud containers are on the same bridge network, PDAL would be able to connect to the database and store the pointcloud data as patches. Depending on the number and size of files, reading and storing all the points in the database could take a few hours to a couple of days.

Working with LiDAR data in PostgreSQL

Patches are an efficient way to store pointcloud data in PostgreSQL tables, where each patch consists of a number of nearby points. This helps to greatly reduce the number of rows for the data, and makes operations on the table faster. The schema for patches and points is described in the table pointcloud_formats in the database. In addition, pointcloud extension offers functionality to work with both points and patches, including extracting individual attributes like elevations, intensity or coordinates for each point or bounding box extents of patches. If you would like to read more, this presentation by Paul Ramsey at OpenGEO has really good explanations with examples for points and patches.

All the subsequent steps are implemented within the pgpointcloud container, that you can bash into using the following commands -

docker start pgpointcloud
docker exec -it pgpointcloud bash

We will be joining the LiDAR data with OSM building footprints to extract building elevations and heights. So, let’s first load in the OSM shapefile data described above, into a table called osm using the following command from within the pgpointcloud container-

shp2pgsql -s 4326 gis_osm_buildings_a_free_1.shp public.osm | psql -U postgres -d pointclouds

We will use PostgreSQL queries for most of the remaining steps. You can access the database using the following command and run all queries from within the database.

psql -U postgres -d pointclouds

Now, we are ready to extract building elevations and heights and the detailed process is described below. The overview process to get building elevations is to first identify all the LiDAR points within the building footprints from OSM, and then get the building elevation from this collection of points. The overview to get building heights is to first estimate the ground elevation along the outline of each building footprint, and subtract it from the building elevation.

  1. Pre-process lidar table to create a spatial index.
    a. View the attributes of each point in a patch. The formats of patches and points are described in XML format in a separate table called pointcloud_formats.
    SELECT * FROM pointcloud_formats;
    b. View the list of columns in the table. You will notice that the patches are stored in the column pa.
    \d lidar
    c. View the number of patches and total points in the table —
    SELECT Count(*), Sum(PC_NumPoints(pa)) FROM lidar;
    d. Create a spatial index on the lidar table for faster queries (observed speedup of 40 times) —
    CREATE INDEX lidar_envelope_pkey ON lidar USING GIST(PC_EnvelopeGeometry(pa));
  2. Pre-process osm table to create a spatial index.
    a. Points outside a bounding box of the city may be removed using the query below. Replace (lng_min, lat_min, lng_max, lat_max) with the extents of the bounding box of interest, , where lng refers to longitude and lat refers to latitude.
    DELETE FROM osm WHERE osm.gid IN (SELECT a.gid FROM osm a WHERE NOT ST_Intersects(a.geom, ST_MakeEnvelope (lng_min, lat_min, lng_max, lat_max, 4326)) );
    b. I recommend converting the osm geomtery column geomto the same SRID (spatial reference system) of the lidar table. UTM zones are great for dealing with a small area and measurements, so we convert geom to EPSG:26910, which is the same refernce for the LiDAR data.
    ALTER TABLE osm ALTER COLUMN geom TYPE geometry(MultiPolygon, 26910) USING ST_Transform(geom, 26910);
    CREATE INDEX osm_26910_pkey ON osm USING GIST (geom);
  3. Create a new column in osm to contain patches of points that are within each of the building footprints.
    a. In order to add a new column to osm to store patches, first check the type of patches in lidar using \d lidar. Typically it would be pcpatch(1). Add a new column to osm to store patches
    ALTER TABLE osm ADD COLUMN pa pcpatch(1);
    b. Store the pointcloud patch that overlays each building footprint polygon. The query below first identifies all patches that intersect each building footprint, then calculates their union, and finally finds the intersection of the union patch with the footprint. Expect this query to take a few hours.
    UPDATE osm SET pa = sq.pa FROM (WITH patches AS (SELECT o.gid AS gid, o.geom AS geom, l.pa AS pa FROM lidar AS l JOIN osm AS o ON PC_INTERSECTS(l.pa, o.geom)) SELECT gid, PC_INTERSECTION(PC_UNION(pa), geom) AS pa FROM patches GROUP BY gid, geom) AS sq WHERE osm.gid = sq.gid;
  4. We will describe the building elevation as a statistic for the elevations of all points within the building footprint. The statistic may be chosen based on your use case. Here, I calculate the maximum, average, and median elevations from all the points. Note than Z-value in the LiDAR data is the elevation given a certain datum specified in its metadata (NAD83 in this case). Instead of taking the maxima directly, it is estimated as the 99.9th percentile value to reduce the possibility of an outlier resulting in overestimation of highest elevation.
    a. Add relevant columns to osm
    ALTER TABLE osm ADD COLUMN z_avg DOUBLE PRECISION NULL, ADD COLUMN z_median DOUBLE PRECISION NULL, ADD COLUMN z_max DOUBLE PRECISION NULL;
    b. Calculate and store elevation statistics
    UPDATE osm SET z_avg = sq.z_avg, z_median = sq.z_median, z_max = sq.z_max FROM (WITH patches AS (SELECT o.gid AS gid, PC_GET(PC_EXPLODE(o.pa), ‘Z’) AS pt_z FROM osm AS o) SELECT gid, AVG(pt_z) AS z_avg, PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY pt_z) AS z_median, PERCENTILE_CONT(0.999) WITHIN GROUP(ORDER BY pt_z) AS z_max FROM patches GROUP BY gid) AS sq WHERE osm.gid = sq.gid;
  5. Obtain ground elevation by identifying building outline, and joining it with the LiDAR data.
    a. Create a new table called osm_outline as a copy of osm table to process the building outline generated by taking the difference between a 2m and 1m buffer around the building polygons.
    CREATE TABLE osm_outline AS SELECT gid, osm_id, geom FROM osm;
    UPDATE osm_outline SET geom = buffer FROM (SELECT o.gid, ST_MULTI(ST_DIFFERENCE(ST_MULTI(ST_Buffer(o.geom, 2)), ST_MULTI(ST_BUFFER(o.geom, 1)))) AS buffer FROM osm_outline AS o) AS sq WHERE osm_outline.gid = sq.gid;
    CREATE INDEX osm_outline_pkey ON osm_outline USING GIST (geom);

    b. Store the pointcloud patches that intersect with the outline. The query below first identifies all patches that intersect each outline, then calculates their union, and finally finds the intersection of the union patch with the outline. Expect this query to take a few hours.
    ALTER TABLE osm_outline ADD COLUMN pa pcpatch(1);
    UPDATE osm_outline SET pa = sq.pa FROM (WITH patches AS (SELECT o.gid AS gid, o.geom AS geom, l.pa AS pa FROM lidar AS l JOIN osm_outline AS o ON PC_INTERSECTS(l.pa, o.geom)) SELECT gid, PC_INTERSECTION(PC_UNION(pa), geom) AS pa FROM patches GROUP BY gid, geom) AS sq WHERE osm_outline.gid = sq.gid;

    c. Store the minimum Z-value from the pointcloud that intersects the building outlines. Instead of taking the minima directly, it is estimated as the 1st percentile value to reduce the possibility of an outlier resulting in underestimation of ground elevation.
    ALTER TABLE osm_outline ADD COLUMN z_ground DOUBLE PRECISION NULL;
    UPDATE osm_outline SET z_ground = sq.z_min FROM (WITH patches AS (SELECT o.gid AS gid, PC_GET(PC_EXPLODE(o.pa), ‘Z’) AS pt_z FROM osm_outline AS o) SELECT gid, PERCENTILE_CONT(0.01) WITHIN GROUP(ORDER BY pt_z) AS z_min FROM patches GROUP BY gid) AS sq WHERE osm_outline.gid = sq.gid;

    d. Add ground elevation to the original osm table and calculate heights as the difference between footprint elevations and ground elevations.
    ALTER TABLE osm ADD COLUMN z_ground DOUBLE PRECISION NULL, ADD COLUMN height_avg DOUBLE PRECISION NULL, ADD COLUMN height_median DOUBLE PRECISION NULL, ADD COLUMN height_max DOUBLE PRECISION NULL;
    UPDATE osm AS o SET z_ground = oo.z_ground FROM osm_outline oo WHERE o.gid = oo.gid;
    UPDATE osm SET height_avg = (z_avg — z_ground), height_median = (z_median — z_ground), height_max = (z_max — z_ground);
(left) OSM footprints overlain on satellite image. (right) Footprint outlines generated in PostgreSQL

That’s it! You have successfully calculated the building elevations, ground elevations, and building heights for each building and all this data is stored in the osm table along with the building footprint geometry for each building.

If you would like to visualize the elevations and heights, you can export your data to a Shapefile. First exit the database using \q, and then run the following command to export the osm table to the file named sanfrancisco_buildings.shp

pgsql2shp -f sanfrancisco_buildings.shp -g geom -u postgres -d pointclouds “SELECT gid, osm_id, code, fclass, name, type, geom, z_avg, z_median, z_max, z_ground, height_avg, height_median, height_max FROM osm WHERE height_median IS NOT NULL”

I imported the Shaepefile into QGIS, and used the Qgis2threejs plugin to create an interactive output as shown at the top of this article. The plugin allows showing both ground elevations and buildings in 3D, and the visualization can be accessed easily from any browser.

Interested in learning more about our technology? Head over to our website or our Medium page to read about some of our recent projects!

--

--

One Concern
One Concern

We’re advancing science and technology to build global resilience, and make disasters less disastrous