COG in the Machine: Towards Cloud-Native Geospatial Deep Learning

Rapidly mapping the extent of Rohingya refugee camps in Bangladesh with drones, Cloud Optimized GeoTIFFs, & deep learning


You open Google Maps and enter “coffee” to find shops nearby. The app proceeds to download a map of your entire city at the highest level of detail. You wait minutes as hundreds of MBs download to your phone before the 4 or 5 pins closest to you drop.

If the app did this every time you’re in a new area or search for something different, you would probably stop using it. This scenario is extreme, even absurd, yet we often do something similar with geospatial data in deep learning.

We download full-sized satellite or aerial imagery (at 100s of MBs to GBs per image or per band), crop, resize, & tile it to the areas, sizes, & formats we need, and run our model training or inference on the end product, leaving a relatively large portion of the source data unused.

If we need the highest resolution & full coverage of all images to train our models, this works. But what if we want to evaluate models on small select subareas of new images or analyze very large areas at faster speed & less detail than what the original file provides?

Could we access just the relevant areas at only the resolutions we need? Until recently, we had little choice but to download the entirety of every image. Now, thanks to Cloud Optimized GeoTIFFs (COG), we have a better way to run deep learning models on geospatial data more efficiently at any size & scale.

A Brief Intro to COG

In the geospatial world, many exciting developments are moving us towards more interoperable, cloud-native architectures for data processing & analysis.

The whys, whats, and hows of these related initiatives have been well explained by Chris Holmes in his 3-part “Cloud-Native Geoprocessing” series so I won’t get into that here. I will touch on these as they pertain to geospatial deep learning in subsequent posts.

Today, though, is all about the COG. As @echeipesh from GeoTrellis summed it up in an excellent presentation on efficient cloud workflows, we want this:

“Hey, let’s not download the whole file every time.”

COGs deliver: accessing huge geospatial files becomes a speedy & selective data-streaming experience, using the same web tech that enables videos to start playing before the whole file is downloaded. Here are some recent advances & implementations showing what’s newly possible:

COG Map viewer [via Chris Holmes]

COG for Deep Learning

The functionality we’ll focus on for deep learning is how COGs use tiling & overviews:

  • Tiling creates a number of internal ‘tiles’ inside the actual image, instead of using simple ‘stripes’ of data. With striped data, the whole file needs to be read to get to the key piece; with tiles, just the portion of the file covering a certain area is read, making access much quicker.
  • Overviews create downsampled versions of the same image: ‘zoomed out’ from the original with much less detail (1 pixel where the original might have 100 or 1,000 pixels), but also much smaller. A single GeoTIFF will often have many overviews to match different zoom levels. These add size to the overall file but can be served much faster, since the renderer just returns the values in the overview instead of figuring out how to represent 1,000 different pixels as one.

The organization of these tiles & overviews delivered by a COG tile server (like the one from Radiant.Earth & Seth Fitzsimmons) generally follows the slippy map tile naming convention:

  • Tiles are 256 × 256 px or 512 × 512 px PNG or JPG files
  • Filename (URL) format is /{zoom}/{x}/{y}.png
  • Each zoom level is a directory {zoom}, each column is a subdirectory {x}, and each tile in that column is a file {y}
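To make the convention concrete, here is a minimal sketch of the standard Web Mercator formula that maps a longitude/latitude pair to slippy map tile indices at a given zoom level (the coordinates below are illustrative, roughly near Cox’s Bazar):

```python
import math

def deg2tile(lon: float, lat: float, zoom: int) -> tuple:
    """Convert lon/lat in degrees to slippy map tile indices (x, y)."""
    n = 2 ** zoom  # number of tiles across one axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

# Tile indices near Cox's Bazar, Bangladesh (~92.0E, 21.2N) at zoom 17
x, y = deg2tile(92.0, 21.2, 17)
print(f"/17/{x}/{y}.png")
```

Each extra zoom level doubles `n`, so one tile at zoom 17 corresponds to a 2 × 2 block of tiles at zoom 18.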

Hmm, 512 px & 256 px square images. Starting to sound familiar?

COG tiles & overviews hand us geospatial data on a platter: consistently georeferenced & internally organized, optimized for fast access & visibility at every zoom level, and formatted in a familiar way for deep learning models.

To test drive the potential, I created an Input COG → Model Inference → Output COG workflow that:

  1. gets overview tiles from any COG at any zoom level (or multiple levels)
  2. runs inference on each tile and reassembles the results
  3. saves the output as a properly georeferenced and validated COG file

An Example, Step-by-Step

Here’s that workflow in action. We’ll use this 7-cm resolution drone image of the Rohingya refugee camps near Cox’s Bazar, Bangladesh, taken by the UN’s International Organization for Migration and hosted as a COG on OpenAerialMap:

1. Get tiles at zoom levels 17, 18, & 19 from the COG tile server

Single example tiles (from top-left of original image) at 3 zoom levels
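Fetching those tiles amounts to filling in the slippy map URL template and downloading small PNGs. A minimal sketch (the server URL and tile indices below are placeholders, not the actual endpoint used):

```python
from urllib.request import urlopen

# Placeholder URL template -- substitute the COG tile endpoint you are
# using (e.g. one that proxies the OpenAerialMap COG).
TILE_URL = "https://example-tile-server/{z}/{x}/{y}.png"

def tile_url(z: int, x: int, y: int, template: str = TILE_URL) -> str:
    """Build the slippy-map URL for one overview tile."""
    return template.format(z=z, x=x, y=y)

def fetch_tile(z: int, x: int, y: int) -> bytes:
    """Download one tile as PNG bytes (defined but not called here,
    to keep the sketch offline)."""
    with urlopen(tile_url(z, x, y)) as resp:
        return resp.read()

# URLs for a hypothetical 2 x 2 block of tiles at zoom 17
urls = [tile_url(17, 100 + dx, 200 + dy) for dx in range(2) for dy in range(2)]
```

Because each tile is a small standalone file, tiles at each zoom level can be fetched concurrently rather than one at a time.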

2. Run inference per tile with model trained offline to find built-up areas* (binary semantic segmentation)

Model inference results on single example tiles at 3 zoom levels

* This model is used to demo the workflow; it was not fitted to this data, so results may appear suboptimal. Model training & inference will be covered in a later post.

3. Reassemble tiled results into full output map at each zoom level

Tiles reassembled to full model outputs at each of 3 zoom levels
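Reassembly is just placing each tile’s mask at its column/row offset in one large array. A minimal sketch, assuming 256 px tiles and masks keyed by their slippy map (x, y) indices:

```python
import numpy as np

TILE = 256  # tile edge length in pixels

def reassemble(masks: dict) -> np.ndarray:
    """Stitch per-tile masks {(x, y): 2-D array} into one mosaic,
    placing each tile at its slippy-map column/row offset."""
    xs = [x for x, _ in masks]
    ys = [y for _, y in masks]
    x0, y0 = min(xs), min(ys)
    width = (max(xs) - x0 + 1) * TILE
    height = (max(ys) - y0 + 1) * TILE
    mosaic = np.zeros((height, width), dtype=np.float32)
    for (x, y), mask in masks.items():
        r, c = (y - y0) * TILE, (x - x0) * TILE
        mosaic[r:r + TILE, c:c + TILE] = mask
    return mosaic

# Two side-by-side tiles -> one 256 x 512 mosaic
masks = {(5, 7): np.ones((TILE, TILE)), (6, 7): np.zeros((TILE, TILE))}
mosaic = reassemble(masks)
```

Because tile columns (`x`) and rows (`y`) come straight from the URL scheme, no extra bookkeeping is needed to know where each result belongs.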

4. Ensemble into final output map (with new color range)

Original image next to final output map (average of 3 zoom levels) showing built-up areas in red
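Ensembling the three zoom levels means upsampling each full output map to the finest level’s resolution and averaging. A minimal sketch, assuming the levels cover the same extent and differ in size by powers of two (a real implementation would resample and align more carefully):

```python
import numpy as np

def ensemble(maps_by_zoom: dict) -> np.ndarray:
    """Average full output maps from several zoom levels, after
    nearest-neighbour upsampling each to the finest resolution."""
    target = max(m.shape[0] for m in maps_by_zoom.values())
    resized = []
    for m in maps_by_zoom.values():
        factor = target // m.shape[0]
        # np.kron repeats each pixel in a factor x factor block
        resized.append(np.kron(m, np.ones((factor, factor))))
    return np.mean(resized, axis=0)

# Toy-sized maps standing in for zooms 17, 18, 19
maps_by_zoom = {
    17: np.full((2, 2), 0.2),  # coarsest
    18: np.full((4, 4), 0.4),
    19: np.full((8, 8), 0.6),  # finest
}
final = ensemble(maps_by_zoom)
# final has shape (8, 8); every pixel averages 0.2, 0.4, 0.6 -> 0.4
```

Averaging across zoom levels smooths out tile-edge artifacts and scale-specific noise from any single level.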

5. Calculate geo bounds & save as georeferenced COG

Output COG properly georeferenced & displayed on basemap [preview COG]
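The geographic bounds come straight from the tile indices: inverting the slippy map formula gives each tile’s (and hence the mosaic’s) west/south/east/north extent, which can then feed a bounds-to-affine-transform helper (e.g. rasterio’s `transform.from_bounds`) when writing the georeferenced GeoTIFF. A sketch of the bounds math:

```python
import math

def tile_bounds(z: int, x: int, y: int) -> tuple:
    """Geographic bounds (west, south, east, north) in degrees for a
    slippy-map tile -- the inverse of the lon/lat -> tile formula."""
    n = 2 ** z

    def lon(col):
        return col / n * 360.0 - 180.0

    def lat(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)

west, south, east, north = tile_bounds(1, 0, 0)
# zoom-1 tile (0, 0) spans the north-west quadrant of the Web Mercator world
```

The bounds of the full mosaic are then just the union of its corner tiles’ bounds, one set per zoom level.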

How Fast Is It?

Running the example workflow on a remote GPU instance (Paperspace’s P5000 machine) took ~30 seconds:

starting inference for zoom level: 17
100%|██████████| 15/15 [00:01<00:00,  9.36it/s]
starting inference for zoom level: 18
100%|██████████| 50/50 [00:10<00:00,  4.78it/s]
starting inference for zoom level: 19
100%|██████████| 152/152 [00:13<00:00, 11.04it/s]
CPU times: user 14.2 s, sys: 968 ms, total: 15.1 s
Wall time: 25.9 s

I was also impressed by the reasonable speed using CPU only: ~12 minutes on my run-of-the-mill 2015 MacBook Pro, even though the code is not optimized for performance:

CPU times: user 14min 51s, sys: 1min 26s, total: 16min 17s
Wall time: 11min 43s

It’s faster because using COG overview tiles at specific resolutions (zoom levels) gets us directly to the information we need, while avoiding the unnecessary & heavy data management typical of preparing geospatial data for deep learning.

In the 1st example, the original COG file is only 30MB, so the difference wouldn’t be very perceptible. Using a much larger example: this 900MB source image of a bigger area was processed in 90 seconds on GPU. Working traditionally, I would still be downloading the file (at typical broadband speeds):

Mapping a larger area of Rohingya refugee camps in Bangladesh from drone imagery [preview COG]

A zoomed-in & overlaid view of the same output shows that actionable details (where the built-up areas are) are preserved up close:

Zoomed-in, overlaid model view of Rohingya refugee camp built-up areas (red) vs not (blue)

Oh, the Possibilities

COGs enable us to run our deep learning models more rapidly, lightly, & simply on geospatial data at any size or scale.

This data includes satellite & aerial imagery from Landsat on AWS, OpenAerialMap, & Planet, with more coming soon as providers increasingly adopt COG:

COG-based inference on Planet SkySat satellite image of Freeport, Texas after Hurricane Harvey [preview COG]

The advance of COG for deep learning means that we could:

  • work selectively with any-sized subareas and as many zoom levels of source imagery as we need to get useful results.
  • select single spectral bands or mix-and-match any band combination with one change to the tile server parameter (e.g. rgb=1,1,1).
  • create new models by serving COG tiles directly into our training pipelines with labels generated on the fly, perhaps via geospatial machine learning data prep tools such as Robosat or Label Maker.
  • test many models on imagery of one area, or one model on many areas and a wide gamut of visual conditions to evaluate their generalizability to real world data.
  • deploy models for any COG data provider with less cost & infrastructure (cloud-based, CPU only), making it more feasible to AI-enhance many localized humanitarian, environmental, & community-based geospatial projects like those being carried out by WeRobotics Flying Labs.

In following posts, we’ll cover this workflow in technical detail (with code examples), experiment with these possibilities, and encourage more new ideas for cloud-native geospatial deep learning.

I look forward to seeing & sharing what you come up with!

Like what you’re reading? Want to protect our health & prepare our communities for climate change? If you’re looking to do your best work in geospatial analysis & deep learning to tackle our hardest systems challenges in environmental health & justice, Anthropocene Labs is looking for you! Get in touch with dave(at)anthropo dot co