Importing GIS Data into BigQuery

Have you been wondering how to import data that is geolocated into BigQuery? Well, wonder no more.

Miguel Dias
Google Cloud - Community
5 min readAug 5, 2022

--

Photo by NASA on Unsplash

Introduction

Isn’t Big Data great? It sure is... It opened so many doors in the analytical world, managing to find patterns where once we thought no patterns existed. So much has been done on this topic, but unfortunately, mostly was on tabular data. What about geographical data? Yes, we have many ways of processing Geodata, but are any of those really suited for BigData Processing? The answer is a sounding NO.

Fortunately, we can now reap the benefits of Cloud-based BigData processing by using BigQuery. BigQuery has integrated several GIS functions that allow us to manipulate significantly large geographic tables in a serverless way, optimizing the query on the background and getting rid of the huge computational overhead created by GeoFunctions.

This article will focus on the ingestion of GeoData into the BigQuery environment.

1. Importing with CSV

Although it is possible to import easily a csv with a WKT encoded geometry into BigQuery and then transform it directly into a Geography data type, why should we take those extra steps?

The first thing we need to make sure is that we have a table with a geometry column as WKT (Well-Known-Text). WKT is the generally accepted way of encoding a geometry:

Image provided by: www.codetd.com

Usually, most geospatial packages in Python, such as, geopandas or shapely have functions to transform geometries directly into WKT format. Be advised, at the moment of the writing of this article, only WGS84 coordinates are allowed in BigQuery, so, if need be, you might have to reproject your data into WGS84, but that is out of the scope of this article.

The trick here is to load the data with a pre-defined schema. Using the autodetect function will not work as it will mistakenly assume the geometry is a string instead of a geography.

Let’s take my sample file:

Image by Author

Then we go to our trusted Google Cloud SDK and run the command in your terminal:

  • bq load: The command to load data into BigQuery
  • — source_format=CSV: The format of the table
  • — skip_leading_rows=1: Skipping header (first line)
  • — schema: The schema and datatypes of the table, with the WKT column as GEOGRAPHY
  • MDIAS.SAMPLE_GIS: Your {database}.{table_name} in BigQuery
  • ~/Documents/sample_gis.csv: Where my original file is stored.

In my opinion this is the easiest way to import geodata into BigQuery and having it ready with Geography datatype from the get go. If you have tables that should be refreshed regularly and the schema doesn’t change, then this is the perfect way.

And there you have it. Let’s look at is in geoviz:

Google GeoViz.: Image by Author

2. Importing with GeoJSON lines

Another way of importing the data is with GeoJSON Sequential (GeoJsonSeq), also known as Newline Delimiter GeoJSON or even GeoJSON lines (geojsonl).

GeoJSON is a very common format for exchanging geographic data. However, very large data sets are common in GIS, and the structure of a GeoJSON file generally requires the entire file to be read into memory and decoded all at once. Furthermore, GeoJSONL can only be encoded with WGS84 projection.

You could transform your own shapefile or geodatabase into a geojsonl file with the help of a geographic manipulation tool like GDAL. Once you have installed GDAL, you can run:

Please note that {geojson, shp, etc} means that you can have several different formats, and GDAL will recognize and convert it appropriately. From here its only sunshine and rainbows. You can do a simple bq load job as we seen previously, for example:

  • bq load: The command to load data into BigQuery
  • — source_format=NEWLINE_DELIMITED_JSON: The format of the table
  • — json_extention=GEOJSON: Since we are using GEOJSON
  • — autodetect: In this case we can just use autodetect as schema
  • MDIAS.SAMPLE_GIS: Your {database}.{table_name} in BigQuery
  • ~/Documents/sample_gis.geojsonl: Where my original file is stored.

This way works fine but there is a lot of pre-processing that has to be done. Plus, in my experience, geojson lines, even though it is an optimized geojson, it still makes a far larger file than just a csv with WKT encoded, which will increase the load time. Nonetheless, this is just as good of an import as anything.

3. Data is already loaded but it is not converted

Let’s say you already have data in BigQuery that has either latitude and longitude, or even geometry as a WKT string or GeoJSON string. No worries. Google made it quite simple for you to quickly transform it into a geography data type.

Latitude and Longitude

If you have latitude and longitude in your table, all you have to do is use ST_GEOGPOINT(longitude, latitude) and that will create your geometry. For example:

And here it is vizualized:

Google GeoViz: Image by Author

Other String

If you have the geometry defined under other external formats, all you have to do is to use the correct ST function:

  • ST_GEOGFROM: Will try to figure out what which format it is and then parse it.
  • ST_GEOGFROMTEXT: Will parse data from the normal WKT String. Example: ‘POINT(52.5244, -110.3823)’
  • ST_GEOGFROMGEOJSON: Will parse geometry from a geojson geometry string. Example: “{ “type”: “Point”, “coordinates”: [6.78204, 51.24976] }

There are other functions you can use for example for geohashes or well known binaries. You can always visit the documentation for the BigQuery GeoFunctions: https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions

Conclusion:

  • BigQuery is one of the leading tools to do Geoprocessing at scale.
  • It might be necessary to do a slight preprocessing in order to import your geotable.
  • At the moment it’s possible to import geodata with csv and geojson. Ingestion with Shapefile is not yet supported (but its on Google’s roadmap).
  • If you already have data in bigquery, you can convert it into geodata, as long as you have the necessary attributes (latitude, longitude, wkt, etc).

--

--

Miguel Dias
Google Cloud - Community

Senior Data Scientist. Expert in Geoanalysis and Geoprocessing at scale. LinkedIn: https://www.linkedin.com/in/m-dias/