More and better satellite imagery through dynamic tiling

Keith Ma · IndigoAg.Digital · Oct 19, 2020

Post by @keithfma and @ceholden

Here at Indigo, we on the remote-sensing engineering squad make geospatial data available to all parts of our business. These data fuel a wide range of applications and take a variety of different forms — from time-series of crop health for our data scientists modeling crop yield or farmer practices, to the satellite imagery we show to farmers and agronomists within our products.

This post discusses an engineering advance we made to tackle the seemingly simple task of displaying agriculture-focused satellite imagery over the web: a dynamic image tile server we call nautilus. Before we created nautilus, we followed the usual approach of pre-generating all of the imagery we host. As you’ll see below, the new dynamic tiling system unlocks some capabilities we need to provide better imagery for agriculture, and saves us a pile of money to boot.

For the hasty, here are the two big gains from switching from static (pre-generated) to dynamic ("just-in-time") tiling:

  • Greatly reduced data storage and compute costs allow us to provide more imagery without blowing up our budget
  • On-the-fly image styling (e.g., colormaps, stretches, masking, etc), so that we no longer have to pick a one-size-fits-all style

Example: US crop health at 10-meter resolution

To focus our discussion, let’s take one of the satellite image layers we use at Indigo as an example.

The visible spectrum is not ideal for assessing plant health. Most satellites collect data from a wider spectrum than just the red, green, and blue bands we are used to. Measurements of reflectance in these wavelength bands can be used to calculate various "indexes" whose values better represent the state of vegetation in our farmers' fields. Here let's use the classic "Normalized Difference Vegetation Index", or NDVI, which is calculated per-pixel as (near_infrared - red) / (near_infrared + red). To visualize the resulting array, we typically colorize (i.e., map data values to RGBA colors) to draw out important details.
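For the curious, here is a minimal sketch of that per-pixel calculation and colorization in NumPy and matplotlib. The array names and the choice of colormap are placeholders, not our production code:

```python
import numpy as np
import matplotlib.pyplot as plt

def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """Per-pixel NDVI in [-1, 1] from near-infrared and red reflectance arrays."""
    den = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den == 0, 0.0, (nir - red) / den)

def colorize(index: np.ndarray, vmin: float = 0.0, vmax: float = 1.0) -> np.ndarray:
    """Map index values to RGBA uint8 colors using a matplotlib colormap."""
    scaled = np.clip((index - vmin) / (vmax - vmin), 0.0, 1.0)
    rgba = plt.get_cmap("RdYlGn")(scaled)  # float RGBA in [0, 1]
    return (rgba * 255).astype(np.uint8)
```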

We can compute NDVI from any image that has both a near-infrared and a red band. Here let's use the Sentinel 2 constellation, which is available at 10 m resolution — fine enough to see details within a typical US farm field. Sentinel 2 imagery is delivered as a "grid" of partially-overlapping images in UTM coordinates. There are 987 images in this grid that intersect the continental US, and they come in 10 distinct UTM coordinate systems. This grid makes sense for analyses of small areas, for which UTM is a very friendly coordinate system, but the overlaps and differing coordinate systems make it tricky to work with large areas efficiently.

Figure 1: Footprints for the 987 images in the Sentinel 2 “grid” for the continental US. Notice the unsavory overlaps, especially where the grid transitions to a different coordinate system (i.e., from one UTM zone to another).

After we compute NDVI from the raw imagery, we have 987 images, each 10980 x 10980 pixels and weighing in at ~100 MB. The good news is that the data are stunning and quite useful for tracking the progress of the growing season. The bad news is there are an awful lot of pixels — about 100 GB for those keeping track — and the images are not aligned where the Sentinel 2 grid had overlaps and different coordinate systems. To display these data in a typical web browser, we need to pre-process them to reduce data volume and get the imagery into a single well-known coordinate system.

Figure 2: Normalized Difference Vegetation Index (NDVI) mosaic for the continental US from June 2019 to June 2020 — one of the imagery layers we make at Indigo. Each date shown here represents ~100 GB of Sentinel 2 imagery.

About XYZ tiles

Serving these images directly is obviously out of the question: there is too much data and too much geospatial manipulation needed for clients to display it correctly. The de facto standard approach is to serve the imagery in chunks — small image tiles defined on a simple hierarchical grid of “xyz-tiles.”

XYZ-tiles work by organizing the world into a regular hierarchical grid composed of nested image tiles (Figure 3). The number of tiles increases exponentially as we zoom in — each tile at a given “zoom level” is divided into 4 tiles at the next zoom level. Since all the images have the same dimensions (typically 256 x 256 pixels), each zoom level has four times as many pixels as the previous. The point of this approach is that a viewer only fetches tiles it needs to display, and only at a sane resolution for the current view. We can’t display “all the pixels” on a computer monitor when zoomed out to view the whole continental US, so we don’t need to download them.

Figure 3: Illustration of the xyz-tile hierarchical grid at a few zoom levels. The top row shows tile boundaries for the world, and the bottom row shows a single image tile taken from that grid (the shaded tile). Thanks to Robert Simmons for creating this lovely figure in his blog post (CC BY-SA 3.0).
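The tile indices themselves come from a simple formula on the Web Mercator grid. A quick sketch (the example point and printed indices are just illustrative):

```python
import math

def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Index of the xyz-tile (Web Mercator 'slippy map' grid) containing a lon/lat point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y

# The same point in Kansas falls in a coarse tile at zoom 8 and a much smaller one at zoom 15
print(lonlat_to_tile(-98.0, 38.5, 8))   # (58, 98)
print(lonlat_to_tile(-98.0, 38.5, 15))  # (7463, 12581)
```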

We have to perform various data transformations to make xyz-tiles from our source data. Satellite imagery data are optimized for scientific research, not for visualization, and so tend to be delivered in unusual formats and coordinate systems. The xyz-tile grid is most definitely not suitable for scientific analysis, so the data are never provided as xyz-tiles or anything like them. We need to transform the data from the original sensor-specific image “grid” (usually with gaps, overlaps, and mismatched coordinate systems, e.g., Figure 1) to the neat-and-tidy xyz-tile grid, and apply any image styling we need to make it easy to interpret.

More specifically, the pipeline to generate a given xyz-tile from a set of (georeferenced) source images has (at least) the following steps (a rough code sketch follows the list):

Source Imagery to XYZ-tile Pipeline

1. Identify the source images that overlap the xyz-tile footprint.

2. Resample from all the source images that intersect the tile, i.e., estimate the value of the source images at the location of each pixel in the xyz-tile by transforming the image coordinates and interpolating.

3. Combine all resampled data to populate the tile image as completely as possible.

4. Rescale (stretch) the data for visualization and cast to uint8. Many stretches are possible, for example, linear scaling between predetermined minimum and maximum values.

5. Colorize: For single-band imagery like vegetation indexes or meteorological data, apply a colormap transforming the scaled data to (R, G, B, A) tuples. For multi-band imagery, stack the separate bands to create a 3-band RGB image.

6. Apply "masks" to remove unwanted features, e.g., clouds or water.

7. Encode the resulting RGBA array as a PNG.
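To make the steps concrete, here is a rough sketch of that pipeline using mercantile, rasterio, matplotlib, and Pillow. It is a simplification (our production code handles nodata, band math, and masking more carefully, and open-source projects like rio-tiler implement the same idea more completely):

```python
import io
import numpy as np
import mercantile
import rasterio
from rasterio.enums import Resampling
from rasterio.transform import from_bounds
from rasterio.vrt import WarpedVRT
from matplotlib import pyplot as plt
from PIL import Image

TILE_SIZE = 256

def render_tile(source_paths, x, y, z, vmin=0.0, vmax=1.0, cmap="RdYlGn"):
    """Render one 256x256 RGBA xyz-tile from a list of georeferenced source images."""
    # 1. The caller passes only the sources whose footprints intersect the tile
    bounds = mercantile.xy_bounds(x, y, z)  # tile bounds in Web Mercator meters
    dst_transform = from_bounds(bounds.left, bounds.bottom,
                                bounds.right, bounds.top, TILE_SIZE, TILE_SIZE)

    data = np.full((TILE_SIZE, TILE_SIZE), np.nan, dtype="float32")
    for path in source_paths:
        with rasterio.open(path) as src:
            # 2. Resample: warp the source onto the tile's grid (reproject + interpolate)
            with WarpedVRT(src, crs="EPSG:3857", transform=dst_transform,
                           width=TILE_SIZE, height=TILE_SIZE,
                           resampling=Resampling.bilinear) as vrt:
                chunk = vrt.read(1, masked=True).astype("float32")
        # 3. Combine: fill pixels that are still empty
        empty = np.isnan(data)
        data[empty] = chunk.filled(np.nan)[empty]

    # 4. Stretch to [0, 1], then 5. colorize to RGBA uint8
    scaled = np.clip((data - vmin) / (vmax - vmin), 0.0, 1.0)
    rgba = (plt.get_cmap(cmap)(scaled) * 255).astype("uint8")

    # 6. Mask: make pixels with no data fully transparent
    rgba[..., 3] = np.where(np.isnan(data), 0, 255)

    # 7. Encode as PNG
    buf = io.BytesIO()
    Image.fromarray(rgba, mode="RGBA").save(buf, format="PNG")
    return buf.getvalue()
```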

The key difference between our new and old systems is not what processing we do, but rather when we do it: the old system pre-generated all of the xyz-tiles before serving them to users, whereas the new system renders tiles from the original sources on demand.

Life before nautilus

Making a basic “source-imagery-to-XYZ-tile-pipeline” is relatively straightforward thanks to widely available open source geospatial libraries. For smaller tasks, a script tying together standard command-line tools (e.g., gdalbuildvrt, gdalwarp, and gdal2tiles) can do the job.
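As a flavor of what such a script looks like, a minimal version might just shell out to the GDAL tools (the filenames here are hypothetical, and the inputs are assumed to already share one coordinate system):

```python
import subprocess

sources = ["ndvi_scene_1.tif", "ndvi_scene_2.tif"]  # hypothetical inputs in a common CRS

# Mosaic the sources into a virtual raster, warp to Web Mercator, then cut static xyz-tiles
subprocess.run(["gdalbuildvrt", "mosaic.vrt", *sources], check=True)
subprocess.run(["gdalwarp", "-t_srs", "EPSG:3857", "mosaic.vrt", "mosaic_3857.tif"], check=True)
subprocess.run(["gdal2tiles.py", "--zoom=0-8", "mosaic_3857.tif", "tiles/"], check=True)
```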

We started out with such an approach in our early years, and had good success in processing coarse-resolution (~500x500 m pixels) imagery. For each data product and date, we simply pre-generated all of the xyz-tiles and uploaded them to S3 with a nice descriptive prefix. We then set up our UI to access the tiles directly from S3 (with a little extra work for authentication and authorization).

Moving from coarse-resolution imagery to so-called "medium-resolution" imagery (e.g., ~10x10 m pixels, as in our Sentinel 2 example) breaks everything — not surprising given the change in data volume. For coarse-resolution imagery, we need only 2610 xyz-tiles to cover the continental US including all zoom levels up to the original data resolution (i.e., zooms 0–8). For medium-resolution imagery, we need ~111 million xyz-tiles to do the same at the higher image resolution (i.e., zooms 0–15). This change amounts to a more than 10,000x increase in the data we need to generate and store (Figure 4, blue line). Exponential scaling takes no prisoners.

Figure 4: Number of tiles as a function of zoom level. The blue line shows the tile count for the continental US. This is the number of tiles needed if all of the xyz-tiles are pre-generated. The exponential increase in tile count with zoom level means the high zoom levels account for the vast majority of tiles. In contrast, the red line shows the average number of tiles accessed by a user/viewer zooming in on a single location. In this case, the number of tiles needed is largely insensitive to the zoom level. Dotted lines indicate a few useful values: the number of tiles in the continental US at zoom 8, and at zoom 15, and the number of tiles visible in a user's zoomed-in viewer.
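You can reproduce the flavor of the blue line with a quick count over a bounding box. The box below is a rough approximation of the continental US, so the totals differ somewhat from the footprint-based numbers above, but the roughly 4x growth per zoom level is the same:

```python
import mercantile

CONUS = (-125.0, 24.5, -66.9, 49.4)  # west, south, east, north (approximate)

coarse = sum(1 for _ in mercantile.tiles(*CONUS, zooms=range(0, 9)))   # zooms 0-8
medium = sum(1 for _ in mercantile.tiles(*CONUS, zooms=range(0, 16)))  # zooms 0-15
print(coarse, medium)  # the zoom 0-15 count is tens of thousands of times larger
```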

This huge increase in data volume presents challenges for both compute and storage. It is still possible to pre-generate static tiles, but it is a lot less attractive.

To “stay in the game”, we reworked our original pipeline to run in parallel across many nodes on AWS EC2. Scaling out allowed us to keep the time needed to generate tiles sane. In fact, we ended up bumping up against S3 upload rate limits — we were generating tiles faster than we could stuff them into S3.

Increased costs, however, were unavoidable. To be brief: more tiles, more compute cycles, more storage. With this many tiles, even the seemingly negligible cost of uploading objects to S3 becomes significant. Considering just storage, a single imagery layer cost roughly $225 up front for uploads and $95/month for storage. This cost adds up fast if you want to have many, many layers, which we most certainly do.

So while we were able to serve medium resolution imagery, static tile generation now imposed some unacceptable constraints on us:

  • Limited to relatively few imagery layers
  • Unable to adjust image styling at all, let alone dynamically. The processing decisions made when generating static tiles are permanent. The colormap, stretch, and data masks (e.g., water or cloud) are “baked-in” to the static image. Changing any of these requires re-generating all of the image tiles for every date to keep the imagery self-consistent.

Whereas static tiling originally made things easier and more efficient, this was clearly no longer the case once we switched to medium resolution imagery.

Dynamic tile rendering: nautilus

Our solution is a dynamic tiling server we call nautilus. The key feature is that nautilus generates xyz-tile images on-the-fly, directly from the source imagery, as they are requested by viewers. It may not be immediately apparent why this is a good idea. After all, we now have a lot of work to do to respond to a requested tile (everything in the "source-imagery-to-XYZ-tile-pipeline" above), whereas before we just sent some pre-generated bits. Provided we can make it run fast enough (see Figure 6), there are two big benefits to a dynamic tiling system.

First, the difference between the (huge) total number of tiles and the (tiny) number of tiles accessed by a given user means we don't actually need most of the tiles in a full static tile set. The beauty of the hierarchical xyz-tile grid is that a viewer displays the same number of pixels (and roughly the same number of image tiles) no matter how far zoomed in or out. On my laptop, a viewer zooming in on a single area of interest typically requests ~400 tiles total (red line in Figure 4). A complete set of xyz-tiles contains hundreds of thousands of times more images than a typical user requests! Most customers will see the low-zoom tiles at the scale of countries and continents, but very few people are likely to look at more than a handful of the millions of high-resolution tiles. By generating only the tiles we need, we reduce the initial cost of hosting an imagery layer dramatically, while still meeting our customers' needs.
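A back-of-the-envelope check (the screen size and zoom behavior here are assumptions, and browsers prefetch a margin of extra tiles, so real counts run a bit higher):

```python
import math

view_w, view_h, tile_px = 1440, 900, 256  # a typical laptop viewport
tiles_per_zoom = math.ceil(view_w / tile_px) * math.ceil(view_h / tile_px)  # ~24 tiles on screen
zoom_levels = 16                          # zooms 0 through 15
print(tiles_per_zoom * zoom_levels)       # ~384, i.e., a few hundred tiles
```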

Second, if we generate a tile image upon request, we can delay some of the image processing decisions that used to be “baked-in”, and give the viewer more control. When requesting a tile from nautilus, the viewer can modify the colormap, stretch, and data masking simply by setting optional URL query parameters. This may seem like a bit of a gimmick, but adjusting visualization can highlight interesting features within agricultural fields that would otherwise go unrecognized, and help people with vision issues (e.g., red-green colorblindness) to better interpret the data.

Figure 5: Adjusting image styling reveals details in different parts of a highly-variable region. In this example, we see NDVI for a few large fields somewhere in Kansas in July 2020. The irrigated center-pivot field has a much higher NDVI than the surrounding area. Changing the "stretch" (i.e., the mapping between the data and the colormap) allows us to view details either in the field or its surroundings. The left and right panes show two different colormaps.
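From the viewer's side, that control is just a matter of query parameters on the tile URL. A sketch of what such a request might look like (the host and parameter names here are illustrative, not nautilus's actual API):

```python
import requests

url = "https://tiles.example.com/ndvi/2020-07-01/15/7463/12581.png"  # hypothetical endpoint
params = {
    "colormap": "viridis",   # swap the default red-green ramp, e.g. for colorblind users
    "stretch": "0.2,0.9",    # re-map the data range highlighted by the colormap
    "mask": "clouds,water",  # drop unwanted pixels at render time
}
resp = requests.get(url, params=params, timeout=10)
resp.raise_for_status()
with open("tile.png", "wb") as f:
    f.write(resp.content)
```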

Dynamic tiling makes use of the (relatively) new Cloud Optimized GeoTIFF (COG) image format. COGs are basically just ordinary GeoTIFFs structured to enable efficient partial reads. To accomplish this, COGs store imagery data tiled into contiguous blocks and include a nicely formatted header (the "Image File Directory") so we can look up what data to read very quickly. Reading a subset of a traditional GeoTIFF remotely might require many HTTPS GET requests to find and read the data. COGs ensure that there is a minimum amount of network communication, making requests fast and efficient. While not a required part of the specification, COGs typically also store pre-computed "overviews" of the original-resolution data that can be read on-the-fly if we don't need full-resolution detail. Perhaps their best feature: COGs are easy to adopt because they are GeoTIFFs, so they "just work" with existing tools and workflows that already handle GeoTIFFs. For more in-depth introductions, check out COG Talk Part 1, A Handy Introduction to Cloud Optimized GeoTIFFs, and Do You Really Want People Using Your Data? Life is better with COGs!
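In practice, reading a small chunk of a remote COG looks just like reading a local GeoTIFF. A sketch with rasterio (the URL and coordinates are made up; the bounds are in the source image's UTM coordinates):

```python
import rasterio
from rasterio.windows import from_bounds

url = "https://example-bucket.s3.amazonaws.com/ndvi/sentinel2_scene.tif"  # hypothetical COG
with rasterio.open(url) as src:
    # Fetch only the internal blocks that intersect this small window
    window = from_bounds(500000, 4400000, 502560, 4402560, transform=src.transform)
    full_res = src.read(1, window=window)                      # native 10 m pixels
    overview = src.read(1, window=window, out_shape=(64, 64))  # decimated read hits the overviews
```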

With source imagery in COG format we can read from it quickly enough to support dynamic tiling, but this is not the only operation that we need to do (remember the “source-imagery-to-XYZ-tile-pipeline”?). The big question during development was whether or not we would need to do some of this processing ahead of time to keep the UI snappy. Dynamic tiling is not an all-or-nothing bet, but our ideal design was to read the original source data directly, without altering any of the characteristics that make it suitable for scientific analysis. In other words, we wanted to avoid the need to duplicate data and use the same imagery to support both visualization and analytical use cases.

After experimenting with pre-computing certain steps (e.g., reprojection, cloud or land masking, etc.), we were pleasantly surprised to find that our dynamic tiling server performed fast enough to do "all the things" (Figure 6). Accessed from a home internet connection (as our users would experience it), nautilus renders and returns tiles in ~100–500 ms. This is of course slower than static tiles, which are returned in ~50–200 ms, but it is still pretty snappy. A simple caching layer speeds things up, so that performance matches the static tiles for repeatedly viewed areas.
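Our cache is a separate layer in front of the service, but the core idea fits in a few lines: key on every parameter that changes the rendered bytes and return hits directly. A toy in-process illustration (the render function is a stand-in for the pipeline sketched earlier, not our actual code):

```python
from functools import lru_cache

def render_tile_png(x: int, y: int, z: int, date: str, colormap: str, stretch: str) -> bytes:
    ...  # the full source-imagery-to-xyz-tile pipeline sketched above

@lru_cache(maxsize=10_000)
def cached_tile(x: int, y: int, z: int, date: str, colormap: str, stretch: str) -> bytes:
    # Styling parameters are part of the cache key, so differently-styled tiles stay distinct
    return render_tile_png(x, y, z, date, colormap, stretch)
```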

We found one exception: pre-processing was needed for low zoom levels. In this case, reading from the original images is a little silly. For our NDVI example, we would be reading from 987 COGs to generate a zoomed-out view of the US, and getting just a few pixels from each source. In testing, we found that request times became unacceptably slow as we zoomed out, and so we treat low zooms separately to maintain the performance we need.

Figure 6: Distribution of tile request response time (i.e., how long it takes to receive an image after sending the HTTPS GET request). Here we show three types of requests: requests for static tiles (orange line, the old pre-generated tiling system), requests for brand-new dynamic tiles (blue line, new tiles generated by the nautilus dynamic tiling server), and requests for previously-viewed dynamic tiles (green line, these tiles are served from our cache). New dynamic tiles are acceptably fast on first view, and effectively the same as the old static tiling system when viewed again.
Figure 7: Tile request response time as a function of zoom level for static, dynamic, and dynamic-cached requests. Dynamic tile generation is slower as we zoom out, since more source images must be processed. We address this problem by doing some pre-processing for lower zoom levels, and by caching repeatedly-viewed tiles.

One reason we can be so speedy (in Python, no less) is GDAL/rasterio and the scientific Python stack. We owe a lot of gratitude especially to the people behind libraries like NumPy and rasterio, and to inspiring work on dynamic tiling in other projects like Marblecutter, rio-tiler, cogeo-mosaic, and terracotta.

Life after nautilus

Switching to dynamic tile generation with nautilus has cleared the way for us to add (many) more imagery layers, and to give a fresh look to many of the layers we were already showing. We are working now on creating the new layers, and on exposing the new dynamic styling features in our user-facing applications.

Looking toward the future, nautilus has gotten us up to speed with one of the most exciting developments in the geospatial community since COGs themselves: data consolidation. Following the example of the opening of the Landsat archive, Earth observation data is best utilized when it is made free and openly available. Space agencies that have made free and open data policies a priority, like the Sentinel program, have seen wide-scale adoption and use of their data. Free and open data does not resolve the problems of distribution and storage, but we are very encouraged by the trend of making these data available on cloud platforms as COGs at only the cost of accessing them (e.g., "requester pays"). For example, it is much easier for us to use data from the National Agriculture Imagery Program (NAIP) if we access it on AWS instead of downloading it from the USDA. Many of us share the goal of a more sustainable planet, and sharing these consolidated Earth observation archives will help us get better insights from them, faster and with a lower footprint.
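For example, reading from a requester-pays bucket takes only a GDAL configuration option. A sketch (the object key below is a placeholder; see the Registry of Open Data on AWS for the actual layout of the NAIP buckets):

```python
import rasterio

with rasterio.Env(AWS_REQUEST_PAYER="requester"):
    # Placeholder key -- NAIP on AWS is organized by state, year, and resolution
    with rasterio.open("s3://naip-analytic/ca/2020/60cm/example.tif") as src:
        print(src.profile)
```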
