What if we could store and access all the geologic maps in a single, integrated database?
That is the question we began with when designing Burwell, a multi-scale, homogenized, spatially-indexed database for geologic maps. While geologic maps of various scales have existed for a long time with impressive coverage globally, they vary greatly in terms of thoroughness of attributes, nomenclature, formats, and accessibility, making it exceptionally difficult to use them for macro analyses. Geology drives, and is a proxy for, many of Earth’s processes, and having an accessible and homogenized database would make it possible to integrate this existing data with myriad other datasets across numerous domains.
We began by creating a very simplistic schema to hold this data, and declaring four scales that any data could fit in to — tiny (global), small (continental), medium (regional), and large (local) — and after loading a few example datasets, we ran into the obvious issue of how to easily visualize this much spatial data. With only a dozen sources, the database is already over 4GB gzipped, making it impractical to view all at once, even in a desktop GIS like QGIS. Ideally, we would be able to view and query it anywhere.
As humans have done for the last decade, the obvious solution was map tiles (of the raster variety…early prototypes with vector tiles did not go well). However, we had a few requirements:
- Use existing infrastructure (both hardware and software) in our lab. Our stack runs on Postgres and Node.js on consumer grade Mac Pros. This rules out using subscription services.
- Allow access to zoom levels 0 to 12. Any larger and we’d be misrepresenting the data.
- Fast access on any device.
- Fast, scriptable tile generation. As datasets are added, it is essential to be be able to quickly update the map.
- Keep dynamic tile generation to a minimum. Because of how detailed geologic maps can be, a prototype showed that generating a tile for the medium scale at zoom level 5 could take anywhere from 10–50 seconds.
By my back-of-the-envelope calculation, you would have 22,369,621 tiles for the world at zoom level 12 (source — wiki.openstreetmap.org). No matter your setup, doing something 22 million times is a lot of work. We can presumably do better than that.
71% of Earth’s surface is water, so it is safe to assume that a fairly significant number of those tiles will be empty, except if we do indeed have geologic data for that area. Unfortunately many tile generation tools require an input bounding box, which in our case will always be the entire world.
What if we could figure out exactly which tiles we need at every zoom level? Fortunately we can using tile-cover, a tool from Mapbox that takes an input geometry and zoom level and outputs a minimum list of tiles needed to cover that geometry.
To begin to find which tiles we need to generate, we start with the 1:50m land geometry from Natural Earth Data and run it through tile-cover for zoom levels 0–10. However, there are situations in which our geologic map data exists in water, like the Geological Map of the Arctic.
Large parts of the arctic dataset overlap land that we have already accounted for, so we are going to figure out what additional geometry we need to run through tile-cover.
We’ll run the following query in PostGIS:
, 4) geometry
SELECT ST_Simplify(((ST_Dump(geom)).geom), 1) AS geometry FROM (
ST_GeomFromText(‘POLYGON ((-179 -85, -179 85, 179 85, 179 -85, -179 -85))’, 4326),
WHERE sources.source_id = 2
) main WHERE geometry IS NOT NULL
What’s going on here? Let’s work from the inside out (we could have used WITH clauses instead and gone top down, but alas):
First we select the ST_Difference between the Arctic geometry (source_id = 2) and the Natural Earth land dataset. Instead of using the full geometry of the Arctic dataset, we pre-generated a field named ref_geom which is simply the ST_Envelope of the geometry, which, while substantially more coarse than the original geometry, makes the query tolerably fast. In this case, close enough is good enough.
Next, because of a limitation with tile-cover with pole-wrapping polygons, we clip the geometry produced by the difference of the Arctic dataset and the Natural Earth land by a bounding box that stretches from 85N to 85S. Our goal to create tiles that will be displayed in Web Mercator, so this doesn’t actually impact any data visible to the user.
After that, we greatly simplify the resultant geometry, make sure it is valid, and select it to a GeoJSON geometry that can be fed directly into tile-cover. The result looks like this:
Once we run this process for each dataset and process these polygons through tile-cover, we will have a list of all the tiles we need to generate.
To generate the tiles we use tilestrata, a “pluggable Node.js map tile server” inspired by TileStache. After prototyping with TileStache, we settled on tilestrata because it plugs into our existing API with a dozen lines of code.
The styles used to generate the tiles are built by a script that queries all layers to find unique colors that are computed by the age of each polygon, and generates a CartoCSS stylesheet that is then converted to a Mapnik XML file with kosmtik.
With our list of tiles that need to be generated and a stylesheet on hand, we generate tiles for zoom levels 0 through 10. Originally we had been generating tiles all the way down to zoom level 12 and serving them directly with nginx, but after some experiments we found that tiles at zoom levels 11 and 12 can be generated on the fly very quickly and then cached, so while 0 to 10 are always served directly off of disk, 11 and 12 are generated on the fly with tilestrata if need be. This also cuts the tile generation time down to around 15-20 minutes.
I acknowledge that this method is a bit convoluted, but given our constraints and needs it is very efficient. If you have any suggestions for improving this process, please let me know!
Now that WebGL is maturing, I am curious to experiment with using vector tiles, and fortunately because of our chosen stack and workflow, this will be as simple as altering a few lines of code.
You can see the tiles in action at https://macrostrat.org/burwell.