COG Talk — Part 2: Mosaics

This blog is the second in a series called COG Talk, which looks at ways to use Cloud Optimized GeoTIFF, and why we use them.

Vincent Sarago
Development Seed

--

The first post is a refresh on the COG format and announces the release of version 1.0.0 of rio-tiler and rio-cogeo. Here, we’ll see how we can use them to build mosaics for web maps.

Multiple high resolution Cloud Optimized GeoTIFF hosted on OpenAerialMap (link)

COG vs Map Tiles

Cloud Optimized GeoTIFF files, as the name implies, are specifically designed for easily accessing remote raster data. Because of the internal tiling and internal overviews, people often ask: can COGs replace map tiles? The usual response is: yes, but

Cloud Optimized GeoTIFF can replace .mbtiles or statically generated map tiles by using a proxy to render tiles dynamically (e.g lambda tiler). But when it comes to large datasets, the GeoTIFF files become too big and lose the advantage of fast remote reads.

There is no size limit for GeoTIFF and when working with a country or worldwide dataset, COGs can get quite large. Even using compression to create a reasonably sized file, it’s likely that the resulting GeoTIFF header — used to look up internal data tile locations — will be very large and will slow down the dynamic tiling processes.

If we instead decide to read a large collection of files and create a mosaic, we have a new set of issues: how can we decide which pixel to display given overlapping tiles? How can we update this pixel choice decision if we have daily updated data?

To help solve those problems, we are releasing rio-tiler-mosaic, a rio-tiler plugin allowing Mercator tile creation from multiple observations, and the associated mosaicJSON specification.

rio-tiler-mosaic

Creating a mosaic, in its simplest form, involves choosing pixels from multiple images to create a single image. To provide a dynamic map tile endpoint, we’ll need to repeat this process for each individual Web-Mercator tile. rio-tiler-mosaic provides two important methods to make this simple: pixel selection for merging the tile arrays together and smart multi-threading for quickly managing a large number of images.

Pixel selection

Creating map tiles using COGs means we are dynamically generating the image array in response to a tile request. When working with mosaics, it also means we need to choose which pixel we display when we have an overlapping dataset. For a given tile, we iterate over all intersecting input images to decide which pixel to choose from. By default, rio-tiler-mosaic provides four different pixel selections rules:

  • First: take the pixel in the first matching image and return when the tile is full
  • Brightest: loop through all images and return the highest pixel value
  • Darkest: loop through all images and return the lowest pixel value
  • Last: take the pixel in the last matching image and return when the tile is full
Pixel selection methods applied on Landsat-8 NDVI values for all 2018 observations over Montreal area.

Smart multi-threading

For each pixel selection method, we need to either process the whole stack of images or just the first few until the array is full. We are using multi-threading to fetch and read data in parallel to speed up the process. Because the First and Last pixel-selection methods should return as soon as the tile is totally filled, we implemented a partial multi-threading approach by processing chunks of assets in parallel instead of the full list (see code). This is particularly handy when the list of assets is long.

Usage

The mosaic tile handler is designed to roughly match rio-tiler’s tile handlers and returns tile and mask arrays. Here is an example of generating a mosaic tile.

rio-tiler-mosaic takes a list of image files (assets), a tiler handler(customor from rio-tiler) and Web Mercator XYZ indices as input.

We have published a first beta version of rio-tiler-mosaic on Pypi and the source code is available on Github.

mosaicJSON

In addition to rio-tiler-mosaic, today we are releasing a specification for representing Web Mercator mosaics constructed from multiple observations. The mosaicJSON specification can be seen as the GDAL VirtualRaster ( VRT), but for indexing files by Web Mercator quadkeys. Quadkeys (a contraction of quadtree and key) are "one-dimensional strings" representing unique Z-X-Y (level-row-column) Web Mercator map tiles.

COG footprint and Quadkey index

The goal of the metadata specification is to provide a simple spatial index linking the COGs to the XYZ Web Mercator map tile to render. Note that in the rio-tiler-mosaic examples above, we assume that the list of input image assets for a given tile is already "known": the mosaicJSON file can provide that input.

The most important requirement is that each quadkey has a zoom level equal to the mosaicJSON minzoom. We use this to calculate the parent tile for a given input tile between the mosaic’s minzoom and maxzoom.

While this is still a Work in Progress the main features are:

  • quadkey based file index
  • simple JSON format (enabling high ratio compression)

Specification

https://github.com/developmentseed/mosaicjson-spec/blob/master/0.0.1/Readme.md

A complete example of a mosaic definition based on mosaicJSON specification can be found here.

Example of implementation

Here is a simple implementation of mosaic tiling using the specification. On each tiler's call, our handler ( my_mosaic_handler) function fetches the mosaic file and looks for the assets indexed by the Web Mercator parent tile (at minzoom level) for the input XYZ tile. We then use the rio-tiler-mosaic to combine the different image assets into one tile.

mosaic handler: get asset list for a specific ZXY tile request.

Creating a mosaicJSON definition

Now that we know how to use it, let’s create a new mosaicJSON definition. Let’s say we have 28 files (e.g. Facebook population density) covering almost the whole continent of Africa.

High-resolution population density maps COGs from Facebook AI (link)

Quadkey base zoom (or mosaic min-zoom):

To construct the spatial index we need to define a minimal set of quadkeys. By definition, all COGs intersect with the tile 0-0-0 (zoom 0) but we can do a little math to minimize the set further. Inspecting the input TIFs, we see that the native resolution is around 0.00027 degrees with 8 levels of overviews. The native resolution corresponds to Web Mercator zoom level 12 so we can guess that we should start tiling at minzoom = 4 (12 - 8 = 4). We'll use this as the base zoom for quadkey keys.

# Create Footprint$ parallel -j4 rio bounds ::: $(aws s3 ls opendata.remotepixel.ca/facebook/ --recursive | grep ".tif" | awk '{print "s3://opendata.remotepixel.ca/"$NF}') | jq -c '.features[0]' | fio collect > facebook.geojson# Find ourZoom 4 quadkeys(there are 13) $ cat facebook.geojson | supermercado burn 4 | mercantile quadkey | paste -s -d"," -0330,0331,1220,1221,0333,1222,1223,3000,3001,3010,3002,3003,3012
Zoom 4 Mercator tiles intersecting with the facebook population dataset.

The mosaic definition is then built by finding the COGs intersecting with each of the 13 zoom 4 tiles:

Python code sample showing how to create a mosaic definition file.

cogeo-mosaic: a CLI and a Serverless stack to create and use mosaicJSON

https://github.com/developmentseed/cogeo-mosaic

Wrapping up this new specification and the new rio-tiler-mosaic plugin we are also releasing cogeo-mosaic, a Serverless stack (based on AWS Lambda) to create and use the mosaicJSON specification. You can also install cogeo-mosaic locally and use the built-in CLI to create mosaicJSON from a set of files.

$ pip install https://github.com/developmentseed/cogeo-mosaic$ cogeo-mosaic create my_list_of_files.txt -o mosaic.json

Is mosaicJSON appropriate for all mosaic map tile problems?

Obviously not, but we have been testing this solution on different projects and we find it simple and fast in most cases. That said, here are the pro/cons:

Pro

  • Less filesto manage: In the case of the Facebook dataset, we ended up having 29 files stored in the cloud instead of ~600 000 if we created all the Mercator tiles image files.
# Get number of tiles from zoom 4 to zoom 12 using https://github.com/mapbox/supermercado/pull/26$ cat facebook.geojson | supermercado burn 4..12 | sort | uniq | wc -l  613 456
  • Flexibility: When we create the tile, we have the ability to decide whichpixel should have priority and be rendered on top.
  • mosaicJSON files can be relatively small (especiallyif using compression) and can be cached to increase performance.

Cons

  • Because we need a dynamic tiler, creating a tile on the fly will always be slightly slower than having the tile already ready to be served to the client.
  • Tile creation can take several seconds when using darkest/brightest methods (becauseit has to read all the COGs).
  • Only supports theWeb Mercator projection (as for rio-tiler).
  • Generating mosaicJSON files can be slowfor large areas with many images.

Community

Both the mosaicJSON specification and rio-tiler-mosaic are published on Github and we welcome any feedback and/or contributors. Please feel free to ping me on Twitter @_VincentS_ if you have questions or want to hear more about the work we are doing to make open data more open and easier to use.

Demo

We built a simple demo page where you can explore some mosaicJSON examples and even share your own: https://bl.ocks.org/vincentsarago/raw/815884188c243b636ab8d927d8942a4d/

Hundreds of mercator tiles created dynamically based on mosaicJSON definition.

--

--

Vincent Sarago
Development Seed

Making COG at @DevelopmentSeed & Creator of @RemotePixel