How to load GeoJSON files into BigQuery GIS

An ETL pipeline for GeoJSON to BigQuery

Lak Lakshmanan
Jul 17, 2019 · 2 min read

While this tutorial is specifically for GeoJSON files for European administrative areas, I hope you will be able to use it as a guide for other sources of data. Do this in CloudShell or any machine where you have the Google Cloud SDK and Python3 installed.

To query polygon boundaries, load European administrative areas’ GeoJSON into BigQuery

First download the file you are interested in from the Eurostat website:

wget https://ec.europa.eu/eurostat/cache/GISCO/distribution/v2/nuts/download/ref-nuts-2016-01m.geojson.zip

Then, unzip it to get the GeoJSON file of interest.

mkdir tmp
cd tmp
unzip ../ref-nuts-2016-01m.geojson.zip

The GeoJSON file has a list of geometry features and some properties for each geometry. BigQuery requires new-line delimited JSON files where the geometry column is single string.

Here’s a Python program that will do the necessary conversion (note that the filename corresponds to the EPSG::4326 version — these are the ones with coordinates in longitude/latitude):

Here’s what a single line of the JSON file now looks like:

{"geometry": "{\"type\": \"LineString\", \"coordinates\": [[25.4907, 35.29833], [25.49187, 35.28897], [25.50206, 35.28633], [25.50528, 35.273], [25.49648, 35.2621], [25.49437, 35.24823], [25.49879, 35.24146], [25.51013, 35.23245], [25.51199, 35.22759], [25.49429, 35.22849], [25.47941, 35.22515], [25.47122, 35.21485], [25.46806, 35.21088], [25.4512, 35.20152], [25.43949, 35.20199], [25.42395, 35.21019], [25.41463, 35.20585], [25.41267, 35.20494], [25.41331, 35.19424], [25.41463, 35.19189], [25.41792, 35.18603], [25.41463, 35.18273], [25.39944, 35.16751], [25.40761, 35.15651], [25.41199, 35.15531], [25.40992, 35.13521], [25.41463, 35.13308], [25.42063, 35.13035], [25.42306, 35.12754], [25.42105, 35.11549], [25.4345, 35.11028], [25.44723, 35.11027], [25.46348, 35.09302], [25.47211, 35.08832], [25.50064, 35.08005], [25.5182, 35.0656], [25.50834, 35.05567], [25.49872, 35.05048], [25.49786, 35.0427], [25.50702, 35.03823], [25.52577, 35.03663], [25.52902, 35.02909], [25.53016, 35.02644], [25.54641, 35.01885], [25.54925, 35.00995], [25.54713, 34.99973], [25.55057, 34.99076]]}", "EU_FLAG": "T", "CC_FLAG": "F", "OTHR_CNTR_FLAG": "F", "LEVL_CODE": 3, "FID": 102, "EFTA_FLAG": "F", "COAS_FLAG": "F", "NUTS_BN_ID": 102}

The resulting schema printed by the program is:

geometry:GEOGRAPHY,EU_FLAG,CC_FLAG,OTHR_CNTR_FLAG,LEVL_CODE:int64,FID:int64,EFTA_FLAG,COAS_FLAG,NUTS_BN_ID:int64

Now, load the new-line separated JSON into BigQuery (change the dataset and table name appropriately):

bq load --source_format NEWLINE_DELIMITED_JSON advdata.eurostat to_load.json geometry:GEOGRAPHY,EU_FLAG,CC_FLAG,OTHR_CNTR_FLAG,LEVL_CODE:int64,FID:int64,EFTA_FLAG,COAS_FLAG,NUTS_BN_ID:int64

That’s it! Try out a query to order the polygons in terms of their distance from Frankfurt, Germany:

SELECT 
* EXCEPT(geometry)
, ST_Centroid(geometry) AS center
, ST_Distance(geometry, ST_GeogPoint(8.68, 50.11))/1000 AS dist
FROM advdata.eurostat
ORDER by dist ASC
LIMIT 5

This returns:

Enjoy!

Google Cloud - Community

Google Cloud community articles and blogs

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.

Lak Lakshmanan

Written by

Data Analytics & AI @ Google Cloud

Google Cloud - Community

A collection of technical articles and blogs published or curated by Google Cloud Developer Advocates. The views expressed are those of the authors and don't necessarily reflect those of Google.