Efficient spatial matching in BigQuery

Use a spatial join — a SQL join with a ST_* condition

Lak Lakshmanan
Google Cloud - Community
2 min readJul 23, 2019

--

This was the question (suitably anonymized):

“I have a huge dataset of points and another huge dataset of polygons. I have to loop through the polygons one-by-one and find all the points inside each one. Can BigQuery do this?”

The answer? Yes, BigQuery can do it, but you don’t have to loop through the polygons. If your experience consists primarily of functional/object-oriented programming, SQL can seem very weird. This is the sort of thing that is quite efficient to perform with a spatial join.

Joins in BigQuery are an efficient way to associate two entities based on their spatial extent. Photo by kgrkz on Unsplash

The spatial join works for any two geometries and a wide variety of spatial functions. For example, let’s find all the zipcodes that cover an urban-area:

WITH city AS (
SELECT
name,
urban_area_geom AS geom
FROM `bigquery-public-data`.geo_us_boundaries.urban_areas
),
zip AS (
SELECT
zip_code,
zip_code_geom AS geom
FROM `bigquery-public-data`.geo_us_boundaries.zip_codes
)
SELECT
city.name, TO_JSON_STRING(ARRAY_AGG(zip_code)) AS zipcodes
FROM
city
JOIN
zip
ON
ST_INTERSECTS(city.geom, zip.geom)

GROUP BY city.name

The key bit is that we do a join of the city table to the zipcode table on the condition that the city geometry intersects the zipcode geometry.

There are other functions you can use, of course. You can check that the distance of a point from a polygon is < 100 km. Or that one geography fully contains the other geography.

Try it out in BigQuery!

Many of my blog posts come from customer questions even if I don’t always mention the conversation that prompted the post. So, if you’ve got a question on doing data analytics or ML on GCP, tweet it to me at @lak_gcp.

--

--

Lak Lakshmanan
Google Cloud - Community

articles are personal observations and not investment advice.