Enable business data with BigQuery for geospatial analysis and map visualization

Dong Sun
Google Cloud - Community
5 min readAug 24, 2020

There are several dimensions of information when it comes to business decision making. Spatial data offers a powerful extra dimension of information for decision making. To visualize the spatial data on a map, we use a dataset containing the spatial data, usually a geographic column (point, line/polyline or polygon information) in a standard geospatial format like GeoJSON, WKT, etc. Many business attribute data doesn’t have a geography column. However, it’s easy to join business attributes with other dataset to relate geo spatial info to gain the capability to drill down data with spatial analysis. For example, state, county, zip, road network, point of interest, etc.

Google BigQuery offers a set of data that contains geospatial information. In this article, we use Zip Code boundary in US census data as spatial data, and Iowa state liquor sales data as an example of business attribute data, to show how to join attribute data with spatial information and visualize it on a map.

BigQuery dataset used:

  1. Public dataset — Census Bureau US Boundaries

This dataset includes polygon boundaries of geographic and statistical areas, some linear features including roads and hydrography, and point features too. You can find more information about this dataset here.

In this dataset, the geospatial column is called zip_code_geom. It is a list of comma separated latitude and longitude pairs defining the polygon boundaries.

Sample value look like this:

POLYGON((-93.648086 41.619944, -93.64766 41.6186, -93.647425 41.61818, -93.647408 41.618149, ….)

2. Public dataset — Iowa State Liquor Retail Sales

This dataset offers a complete view of retail liquor sales in the entire State of Iowa. The dataset contains orders of liquor for grocery stores, liquor stores, convenience stores, etc., with details about the store and location, the exact liquor brand and size, and the number of bottles ordered.

The goal of the query is to identify sales greater than $100K per zip code for the year of 2019

Query used:

SELECT 
boundary.zip_code AS zip_code,
total_sales,
zip_code_geom
FROM
(SELECT zip_code, county, zip_code_geom
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE STATE_CODE = 'IA' ) AS boundary
LEFT JOIN
(SELECT zip_code, sum(sale_dollars) AS total_sales
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE EXTRACT(YEAR FROM date) = 2019
GROUP BY zip_code ) AS sales
ON sales.zip_code = boundary.zip_code
WHERE total_sales > 100000

You can run the above query directly, or you can save the query to a view, and then query the view. It’s easy to visualize results in BigQuery Geo Viz. The following image shows total sales in dollar amount ranges based map. The darker the color, the more sales it had.

Geo Viz is good for quick geo spatial data display, but it’s not easy to integrate the map in Geo Viz with your existing applications. Google Maps API is more flexible when presenting spatial data. You can submit the query to the BigQuery inside the javascript code, and add results to the map on the fly. Since Google Maps API also provides functions like Geocoding (translate address to latitude and longitude), reverse geocoding (translate latitude and longitude to address), route calculating, you can quickly display your data in your application with geospatial functions.

To submit a query to BigQuery, we can use BigQuery Javascript library. For a step by step guide to set it up including 2 step OAuth authorization, please check out this tutorial.

Code snippet for setting up authentication and authorization.

// Check if the user is authorized.
function authorize(event) {
gapi.auth.authorize({client_id: clientId, scope: scopes, immediate: false}, handleAuthResult);
return false;
}
// If authorized, load BigQuery API
function handleAuthResult(authResult) {
if (authResult && !authResult.error) {
loadApi();
return;
}
console.error('Not authorized.')
}

To draw geographic column on the map, we convert data in the geography column to GeoJSON using ST_ASGEOJSON() function in the query.

SELECT 
boundary.zip_code AS zip_code,
total_sales,
zip_code_geom
FROM (
SELECT zip_code, county,
ST_ASGEOJSON(zip_code_geom) AS zip_code_geom
FROM `bigquery-public-data.geo_us_boundaries.zip_codes`
WHERE STATE_CODE = 'IA' ) AS boundary
LEFT JOIN
(SELECT zip_code, sum(sale_dollars) AS total_sales
FROM `bigquery-public-data.iowa_liquor_sales.sales`
WHERE EXTRACT(YEAR FROM date) = 2019
GROUP BY zip_code ) AS sales
ON sales.zip_code = boundary.zip_code
WHERE total_sales > 100000

We then add the result to Google Map using Datalayer which is used to display GeoJSON data on a Google map.

function doMap(rows){
geojson = {
"type": "FeatureCollection",
"crs": {
"type": "name",
"properties": {
"name": "urn:ogc:def:crs:OGC:1.3:CRS84"
}
},
"features":[]
};
for (let i = 0; i < rows.length; i++) {
geojson.features.push(
{ "type": "Feature",
"properties": {"total_sales": rows[i].f[1].v,
"zip_code": rows[i].f[0].v},
"geometry": JSON.parse(rows[i].f[2].v)
})};
map.data.addGeoJson(geojson);map.data.setStyle(function(feature) {
var total_sales =
parseFloat(feature.getProperty("total_sales"));
var zip_code = feature.getProperty("zip_code");
var color;
if (total_sales > 15000000)
color = '#033b23'
else if (total_sales > 5000000 && total_sales <= 15000000)
color = '#006d2c'
else if (total_sales > 1000000 && total_sales <= 5000000)
color = '#zca25f'
else if (total_sales > 500000 && total_sales <= 1000000)
color = '#66c2a4'
else if (total_sales >= 100000 && total_sales <= 500000)
color = '#99d8c9'
else
color = 'white'
return {
fillColor: color,
strokeWeight: 0
};
}) ;
}

The resulting map looks like the following. You can add more controls to the map feature, like click to get zip code info or census data, etc.

Next step

The above example uses boundary polygons (zip code). The same theory can be applied for point geospatial data, for example, point of interests.

Now you can enable your business attribute data with another dimension of intelligence — the geo location dimension. The data pipeline could look like the following.

If you have stream data, or your application has low latency requirements, you can consider moving the spatial data processing to Dataflow.

--

--