How to query geographic raster data in BigQuery efficiently

Run-length encode the raster image when loading it into BigQuery

Represent the data as rectangles

Run-length encoding in GIS translates to storing the data as rectangles instead of pixels
POLYGON((-11.3000 -72.3000, -11.3000 -72.4000, -11.2000 -72.4000, -11.2000 -72.3000, -11.3000 -72.3000))
POLYGON((-11.3000 -72.3000, -11.3000 -72.4000, 169.9000 -72.4000, 169.9000 -72.3000, -11.3000 -72.3000))
The weird shape that results if we use WKT’s spherical representation for large polygons
SELECT ST_GeogFromGeoJson("{ \"type\":\"Polygon\", \"coordinates\": [ [ [-11.3000, -72.3000], [-11.3000, -72.4000], [169.9000, -72.4000], [169.9000, -72.3000], [-11.3000, -72.3000]]] }") AS geom
The GeoJson representation gives us what we intended: huge swaths of Antartica have the same population density, and instead of representing as pixels, we can represent it as a single rectangle.

Loading the table with rectangles

bq load --replace \
--source_format NEWLINE_DELIMITED_JSON \
--range_partitioning=year,1900,2100,5 \
--clustering_fields tile \
advdata.${TABLE} $GCSFILE schema.json
 {
"description": "polygon representing boundary of rectangle",
"mode": "REQUIRED",
"name": "bounds",
"type": "GEOGRAPHY"
},
    Pixels:      457 MB, 2139k rows
Rectangles: 96 MB, 319k rows

Querying the table

WITH urban_populations AS (
SELECT
lsad_name
, SUM(ST_AREA(ST_INTERSECTION(bounds, urban_area_geom))/1000000) AS area_sqkm
, COUNT(1) AS num_rectangles
, AVG(population_density) AS pop_density
FROM advdata.popdensity_nasa, `bigquery-public-data.geo_us_boundaries.urban_areas`
WHERE ST_INTERSECTS(bounds, urban_area_geom)
AND STRPOS(lsad_name, ', WA') > 0
GROUP BY lsad_name
)
SELECT
*, (area_sqkm * pop_density / 1000000) AS population_millions
FROM urban_populations
ORDER BY area_sqkm DESC
LIMIT 10
ST_AREA(bounds)
ST_AREA(ST_INTERSECTION(bounds, urban_area_geom))

ELT from pixels to rectangles

CREATE OR REPLACE TABLE advdata.sedac_rectangles AS
PARTITION BY RANGE_BUCKET(year, GENERATE_ARRAY(2000, 2100, 5))
CLUSTER BY tile
OPTIONS(require_partition_filter=True)
with dissolved AS (
SELECT
year, tile, rowno, population_density,
ST_UNION( ARRAY_AGG(bounds) ) AS bounds,
FROM advdata.sedac_pixels
GROUP BY year, tile, rowno, population_density
)
SELECT *, ST_CENTROID(bounds) AS location
FROM dissolved
Pixels:         57.03 GB       223m rows
Rectangles: 17.79 GB 28m rows
When doing a ST_Union, you will get MultiPolygon and this is not as efficient for spatial querying as the Polygons that result if we were to do the run-length encoding in the ETL program
WITH urban_populations AS (
SELECT
lsad_name
, SUM(ST_AREA(ST_INTERSECTION(bounds, urban_area_geom))/1000000) AS area_sqkm
, COUNT(1) AS num_rectangles
, AVG(population_density) AS pop_density
FROM advdata.popdensity_sedac_rectangles, `bigquery-public-data.geo_us_boundaries.urban_areas`
WHERE year = 2020
AND tile = 'gpw_v4_population_density_rev11_2020_30_sec_1.asc'
AND ST_INTERSECTS(bounds, urban_area_geom)
AND STRPOS(lsad_name, ', WA') > 0
GROUP BY lsad_name
)
SELECT
*, (area_sqkm * pop_density / 1000000) AS population_millions
FROM urban_populations
ORDER BY area_sqkm DESC
LIMIT 10

Next steps:

  1. Read the previous article on raster data in BigQuery for more context
  2. Browse the code on GitHub
  3. Try out the code by following the steps in the README.md file

Key points:

  1. When loading raster data into BigQuery, represent the image as a set of rectangles. This is akin to run-length encoding the data and will cut your storage costs and speed up your queries.
  2. Prefer the GeoJson representation over the WKT representation since the planar representation of GeoJson is much more intuitive
  3. You can quickly troubleshoot shapes using BigQuery GeoViz
  4. You can do ELT of raster data as pixels and then transform them into rectangles in BigQuery with ST_Union. However, it is not as effective as load the data as rectangles into BigQuery by doing run-length encoding in your ETL program.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Lak Lakshmanan

Operating Executive at a technology investment firm; articles are personal observations and not investment advice.