Loading large spatial features to BigQuery geography
If you have GIS data in common GIS formats, like shapefiles or GeoJSON files and want to load it to BigQuery, Lak Lakshmanan has a very good walkthrough:
Now that BigQuery has support for GIS queries, let’s look at how we could load in geographic data into BigQuery. I’ll…medium.com
But there is a problem with this path, when your data includes large polygons or lines. Here by “large” I mean spatially large, with long edges.
The issue here is the difference between planar edges and geodesic edges. GeoJSON files and shapefiles contain planar data with edges that are straight on the flat map. What happens with this data when we ingest it to BigQuery?
- org2ogr converts them to WKT format, it changes the syntax, but does not change the data points,
- BigQuery treats WKT as spherical geography, assuming geodesic edges.
Thus due to ogr2ogr and BigQuery using different interpretation of WKT format, planar edges are reinterpreted as geodesic ones. The difference does not matter for small features, but could be quite significant for large features.
The correct way to ingest planar geometries to BigQuery is by using GeoJSON geometries. GeoJSON uses planar edges, and when ingesting it, BigQuery approximates planar edges with geodesic ones, adding intermediate vertices as needed, providing ~10 meters approximation precision.
How do we make org2ogr produce CSV with GeoJSON geometries? It has GEOMETRY=AS_WKT option, but does not have GEOMETRY=AS_GEOJSON option. The solution is to use sqlite query within ogr2ogr:
ogr2ogr -f csv -dialect sqlite -sql "select AsGeoJSON(geometry) AS geom, * from test" test.csv test.shp
This produces CSV file with GeoJSON-encoded geometry column. Only this change in org2ogr command is needed. When importing to BigQuery, BigQuery can automatically distinguish between WKT and GeoJSON formats, but when reading GeoJSON it knows the edges are planar and approximates them by geodesic edges.