Use a spatial join — a SQL join with a ST_* condition

Lak Lakshmanan
Jul 23 · 2 min read

This was the question (suitably anonymized):

“I have a huge dataset of points and another huge dataset of polygons and I want to find all the points within each of the polygons. In other words, I have to loop through the polygons one-by-one and find all the points inside each one. Can BigQuery do this?”

The answer? Yes, BigQuery can do it, but you don’t have to loop through the polygons. If your experience consists primarily of functional/object-oriented programming, SQL can seem very weird (I can empathize since two years ago, I had done nothing beyond a SELECT * in SQL). This is the sort of thing that is quite efficient to perform with a spatial join.

Joins in BigQuery are an efficient way to associate two entities based on their spatial extent. Photo by kgrkz on Unsplash

The spatial join works for any two geometries and a wide variety of spatial functions. For example, let’s find all the zipcodes that cover an urban-area:

WITH city AS (
SELECT
name,
urban_area_geom AS geom
FROM `bigquery-public-data`.geo_us_boundaries.urban_areas
),
zip AS (
SELECT
zip_code,
zcta_geom AS geom
FROM `bigquery-public-data`.geo_us_boundaries.zip_codes
)
SELECT
city.name, TO_JSON_STRING(ARRAY_AGG(zip_code)) AS zipcodes
FROM
city
JOIN
zip
ON
ST_INTERSECTS(city.geom, zip.geom)

GROUP BY city.name

The key bit is that we do a join of the city table to the zipcode table on the condition that the city geometry intersects the zipcode geometry.

There are other functions you can use, of course. You can check that the distance of a point from a polygon is < 100 km. Or that one geography fully contains the other geography.

Try it out in BigQuery!

Many of my blog posts come from customer questions even if I don’t always mention the conversation that prompted the post. So, if you’ve got a question on doing data analytics or ML on GCP, tweet it to me at @lak_gcp.

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Lak Lakshmanan

Written by

Professional Services @ Google

Google Cloud Platform - Community

A collection of technical articles published or curated by Google Cloud Platform Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade