A Handy Introduction to Cloud Optimized GeoTIFFs

Ever since I started working at Planet, I’ve kept hearing about Cloud Optimized GeoTIFFs (COGs). My colleague Chris Holmes even wrote a post explaining what they are and how they’re useful. I decided to write a Go library for working with COGs. Much of Planet’s satellite imagery is in the GeoTIFF format, including this nice image of Los Angeles from our February 2018 Global Basemap:

An image of Los Angeles from Planet’s February 2018 Basemap. ©2018 Planet Labs Inc, CC BY-SA 4.0.

As a first step to writing a library, I needed to learn about 1) Cloud Optimized GeoTIFFs, 2) GeoTIFFs in general, and 3) even more general than that: TIFFs. I’ve always found binary file formats intimidating — they just look like gibberish when you dump them into the console! But there’s logic in that gibberish. Here’s what you can gain from my exploration of this 30-year-old file format.

What Even Is a TIFF?

My first experience with TIFF was from the scanner I had as a kid. I scanned my bad manga-style drawings and it produced TIFFs, which the scanner software in Windows ‘95 told me meant “Tagged Image File Format.” What exactly does that mean? Let’s take a look at the Tag Image File Format FAQ to find out.

The TIFF header — the first few bytes of the file — gives us a lot of useful information:

             +-------+-------------+-----------------+
Byte Offset: | 0     | 2           | 4               |
             +-------+-------------+-----------------+
Size:        | Word  | Word        | Long            |
             +-------+-------------+-----------------+
Content:     | Byte  | Version (42)| Offset to first |
             | Order |             | IFD             |
             +-------+-------------+-----------------+

Let’s ignore the IFD for now and start with something that’s easier to understand. We can see that a Word is two bytes and a Long is four bytes. The Byte Order field tells us if the file is big-endian (MM) or little-endian (II). The TIFF Version is 42, which the TIFF specification says is “an arbitrary but carefully chosen number.”

How can we view a TIFF to confirm our understanding of the file header so far? I recommend the very nice xxd tool, which prints out binary files in a readable way. I like xxd because the terse name makes me feel like a hacker. Let’s download the TIFF of Los Angeles that I showed above, and dump the first few bytes of the TIFF:

$ curl -O https://oldpatricka.com/planet/la.tif
$ xxd la.tif | head -n 1
00000000: 4949 2a00 0800 0000 1200 0001 0300 0100 II*.............

The first two bytes of this file are 4949, and xxd helpfully tells us this is II in ASCII. Remember that II is little-endian, which means the bytes are stored in least-to-most-significant order. (This is also the way that sensible Canadians like me prefer to write dates.) Next, we can confirm that it’s actually a TIFF. The next two bytes are 2a00. We can read this as the hex number 0x002A, which is 42 in base ten! Cheers — we just read the file type and byte order of a TIFF without any special tools!
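If you'd rather do the same check in code, here is a minimal Go sketch of reading those 8 header bytes. The `Header` type and `parseHeader` function are names I made up for illustration; they don't come from any existing library.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Header holds the three fields of the 8-byte TIFF header.
// (Hypothetical type, for illustration only.)
type Header struct {
	Order          binary.ByteOrder // little-endian for "II", big-endian for "MM"
	Version        uint16           // 42 for the TIFFs described in this post
	FirstIFDOffset uint32           // byte offset of the first IFD
}

// parseHeader decodes the first 8 bytes of a TIFF file.
func parseHeader(b []byte) (Header, error) {
	if len(b) < 8 {
		return Header{}, errors.New("need at least 8 bytes for a TIFF header")
	}
	var order binary.ByteOrder
	switch string(b[0:2]) {
	case "II":
		order = binary.LittleEndian
	case "MM":
		order = binary.BigEndian
	default:
		return Header{}, fmt.Errorf("unknown byte order mark %q", b[0:2])
	}
	h := Header{
		Order:          order,
		Version:        order.Uint16(b[2:4]),
		FirstIFDOffset: order.Uint32(b[4:8]),
	}
	if h.Version != 42 {
		return Header{}, fmt.Errorf("unexpected TIFF version %d", h.Version)
	}
	return h, nil
}

func main() {
	// The first 8 bytes of la.tif, copied from the xxd output above.
	raw := []byte{0x49, 0x49, 0x2a, 0x00, 0x08, 0x00, 0x00, 0x00}
	h, err := parseHeader(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("version:", h.Version, "first IFD offset:", h.FirstIFDOffset)
}
```

The byte order has to be decided first, because every later multi-byte field in the file is read with it.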

Next let’s learn about IFDs.

IFD: the Image File Directory

TIFFs are divided up into “pages,” which are individual images within a TIFF. A TIFF isn’t actually a single image, but rather a container for a number of images! The Image File Directory (IFD) describes one of these pages, and includes a set of tags to accomplish this. The tags include metadata about the file as well as where to find the image bits themselves in the file. There is a good reference for tags on the Aware Systems website, and the TIFF Specification is a very good reference and easy to read too!

So what does an IFD look like?

             +---------+-------------------+-----------------+
Byte Offset: | 0       | 2                 | 2 + n Tags * 12 |
             +---------+-------------------+-----------------+
Size:        | Word    | 12 Bytes * n Tags | Unsigned Long   |
             +---------+-------------------+-----------------+
Content:     | Number  | Tag Data          | Offset to Next  |
             | of Tags |                   | IFD (or 0)      |
             +---------+-------------------+-----------------+

The overall structure includes the number of tags, the tag data itself, then a pointer (byte offset) to the next IFD or 0 if there are no more IFDs.
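To make the "follow the pointer until you hit 0" idea concrete, here is a rough Go sketch that walks a chain of IFDs held in memory. Everything in it is an assumption for illustration: the `walkIFDs` and `exampleChain` names, the `ifdInfo` type, and the made-up 44-byte buffer (it is not a real TIFF, just two IFDs with one zeroed tag each).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const tagEntrySize = 12 // each tag entry in an IFD is 12 bytes

// le is the byte order used by the made-up example below.
var le = binary.LittleEndian

// ifdInfo records where an IFD starts and how many tags it holds.
type ifdInfo struct {
	Offset  uint32
	NumTags uint16
}

// walkIFDs follows the chain of IFDs in data, starting at first, until it
// reads a next-IFD offset of 0.
func walkIFDs(data []byte, order binary.ByteOrder, first uint32) ([]ifdInfo, error) {
	var ifds []ifdInfo
	seen := map[uint32]bool{} // guards against a chain that loops forever
	for off := first; off != 0; {
		if seen[off] {
			return nil, fmt.Errorf("IFD chain loops back to offset %d", off)
		}
		seen[off] = true
		if uint64(off)+2 > uint64(len(data)) {
			return nil, fmt.Errorf("IFD at offset %d is out of range", off)
		}
		n := order.Uint16(data[off : off+2])
		// The next-IFD pointer sits right after the tag entries.
		next := uint64(off) + 2 + uint64(n)*tagEntrySize
		if next+4 > uint64(len(data)) {
			return nil, fmt.Errorf("IFD at offset %d is truncated", off)
		}
		ifds = append(ifds, ifdInfo{Offset: off, NumTags: n})
		off = order.Uint32(data[next : next+4])
	}
	return ifds, nil
}

// exampleChain builds a made-up buffer: 8 bytes of header space, then two
// IFDs with one zeroed tag each.
func exampleChain() []byte {
	data := make([]byte, 44)
	le.PutUint16(data[8:], 1)   // IFD 0 at offset 8 has 1 tag
	le.PutUint32(data[22:], 26) // 8 + 2 + 1*12 = 22: pointer to IFD 1
	le.PutUint16(data[26:], 1)  // IFD 1 at offset 26 has 1 tag
	le.PutUint32(data[40:], 0)  // 26 + 2 + 1*12 = 40: 0 ends the chain
	return data
}

func main() {
	ifds, err := walkIFDs(exampleChain(), le, 8)
	if err != nil {
		panic(err)
	}
	for i, ifd := range ifds {
		fmt.Printf("IFD %d at offset %d with %d tag(s)\n", i, ifd.Offset, ifd.NumTags)
	}
}
```

The `seen` map is there because nothing stops a broken file from pointing an IFD back at an earlier one.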

So what does an individual tag look like? Here’s a picture:

        +--------+--------------+----------------+-----------------+
Byte    | 0      | 2            | 4              | 8               |
Offset: |        |              |                |                 |
        +--------+--------------+----------------+-----------------+
Size:   | Word   | Word         | Unsigned Long  | 4 Bytes         |
        +--------+--------------+----------------+-----------------+
Content:| Tag ID | Tag Datatype | Number of      | Tag Data or     |
        |        |              | Values         | Pointer to Data |
        +--------+--------------+----------------+-----------------+

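So we can see an individual tag has the ID of the tag, the data type of the tag data, the number of values, and then either the tag data itself or, when the data doesn’t fit in those last four bytes, a pointer to where it lives in the file. Now let’s try to read a tag.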

Back to the TIFF header:

$ xxd la.tif | head -n1
00000000: 4949 2a00 0800 0000 1200 0001 0300 0100 II*.............

We have the byte order, the TIFF version, and the byte offset to the IFD. Let’s seek straight to the IFD offset in the TIFF header and only show that offset so it’s not overwhelming:

$ xxd -s 4 -l 4 la.tif
00000004: 0800 0000

These bytes (0800 0000) are 0x8 in hex, which is 8 in decimal too. This means that the first IFD is at offset 8, which is right after the header. So, let’s seek to that offset and read the number of tags in the IFD by showing the first two bytes:

$ xxd -s 8 -l 2 la.tif
00000008: 1200

These bytes (1200) mean 0x12 in hex, or 18 tags in the IFD. Great, now let’s read one of these tags! To do that, we can add 2 to the offset and try to read the first tag. Remember that a tag is 12 bytes, so let’s read 12 bytes:

$ xxd -s 10 -l 12 la.tif
0000000a: 0001 0300 0100 0000 0010 0000

So the tag ID is the first two bytes (0001) or 0x100 in hex, which is 256 in decimal. The tag reference says ID 256 is the ImageWidth. Now we need to know the data type to be able to read the value. This is stored in the next two bytes (0300) or 0x0003 in hex, which is 3 in decimal. The TIFF spec says 3 is a short, which is a two byte integer. Now we can check how many of these values there are with the next 4 bytes (0100 0000), which is 0x1 in hex or 1 in decimal. So, one ImageWidth value. That makes sense. Now, let’s read that value itself. It is a short, so we read the next two bytes (0010), which is 0x1000 in hex or 4096 in decimal. Now we’ve learned that the first image in our TIFF has a width of 4096 pixels. Cool! Again, we aren’t even using any software that knows about TIFFs.
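The same walk through the 12 bytes can be sketched in Go. The `Tag` type, `parseTag`, and `shortValue` are names I invented for this example, and it only handles the simple case we just saw: a single short stored inline.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// la.tif starts with "II", so its fields are little-endian.
var le = binary.LittleEndian

const typeShort = 3 // TIFF data type 3: a two-byte unsigned integer

// Tag is one 12-byte IFD entry. (Hypothetical type, for illustration only.)
type Tag struct {
	ID       uint16
	DataType uint16
	Count    uint32
	Raw      [4]byte // the value itself, or a pointer to it if it doesn't fit
}

// parseTag splits a 12-byte entry into its four fields.
func parseTag(b []byte, order binary.ByteOrder) (Tag, error) {
	if len(b) < 12 {
		return Tag{}, errors.New("a tag entry is 12 bytes")
	}
	tag := Tag{
		ID:       order.Uint16(b[0:2]),
		DataType: order.Uint16(b[2:4]),
		Count:    order.Uint32(b[4:8]),
	}
	copy(tag.Raw[:], b[8:12])
	return tag, nil
}

// shortValue returns the value of a tag that holds exactly one short.
func (tag Tag) shortValue(order binary.ByteOrder) (uint16, error) {
	if tag.DataType != typeShort || tag.Count != 1 {
		return 0, fmt.Errorf("tag %d is not a single short", tag.ID)
	}
	// A lone short is stored in the first two of the four value bytes.
	return order.Uint16(tag.Raw[0:2]), nil
}

func main() {
	// The 12 bytes at offset 10 of la.tif, copied from the xxd output above.
	raw := []byte{0x00, 0x01, 0x03, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00}
	tag, err := parseTag(raw, le)
	if err != nil {
		panic(err)
	}
	width, err := tag.shortValue(le)
	if err != nil {
		panic(err)
	}
	fmt.Println(tag.ID, tag.DataType, tag.Count, width) // prints: 256 3 1 4096
}
```

A fuller reader would also need to handle the other data types, and follow the pointer when the values don't fit in four bytes.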

Now, I’m not going to read the rest of the tags or the other IFDs this way (Ian Hansen and I wrote a script in Go that dumps the IFDs), but we now have a good understanding of how a TIFF is structured and how to read one, all just using the command line and TIFF references.

What About the Image Data?

So all this metadata is interesting, but aren’t TIFFs images? Where is the image data itself? Well, a few special tags describe where to find the data in the file. In the case of COGs, you can use TileWidth, TileLength, TileOffsets, and TileByteCounts to read the image data since it happens to be tiled. You might have something like:

TileWidth: 256, TileLength: 256, TileOffsets: [100, 10000, 20000, 30000], TileByteCounts: [9900, 10000, 10000, 9000]. This tells you all you need to know to read out the bytes of a tiled portion of an image.
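As a rough sketch of what "reading out the bytes" could look like in Go, the code below pulls one tile's raw bytes out of anything that supports random access. The names `readTile`, `tileIndex`, and `fakeFile` are hypothetical, and the "file" is just 39,000 bytes of filler sized to match the example numbers above. I'm also assuming the usual ordering of tiles, left to right and then top to bottom.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// readTile returns the raw (possibly compressed) bytes of tile i, using the
// values of the TileOffsets and TileByteCounts tags.
func readTile(r io.ReaderAt, offsets, byteCounts []uint32, i int) ([]byte, error) {
	if len(offsets) != len(byteCounts) {
		return nil, fmt.Errorf("have %d offsets but %d byte counts", len(offsets), len(byteCounts))
	}
	if i < 0 || i >= len(offsets) {
		return nil, fmt.Errorf("tile index %d is out of range", i)
	}
	buf := make([]byte, byteCounts[i])
	if _, err := r.ReadAt(buf, int64(offsets[i])); err != nil {
		return nil, err
	}
	return buf, nil
}

// tileIndex maps a tile's column and row to its position in TileOffsets,
// assuming tiles are listed left to right, then top to bottom.
func tileIndex(col, row, imageWidth, tileWidth int) int {
	tilesAcross := (imageWidth + tileWidth - 1) / tileWidth // round up
	return row*tilesAcross + col
}

// fakeFile stands in for a TIFF on disk: 39,000 bytes of filler, just big
// enough to hold the last tile in the example (30000 + 9000).
func fakeFile() *bytes.Reader {
	data := make([]byte, 39000)
	for i := range data {
		data[i] = byte(i % 251)
	}
	return bytes.NewReader(data)
}

func main() {
	offsets := []uint32{100, 10000, 20000, 30000}
	byteCounts := []uint32{9900, 10000, 10000, 9000}

	// For a made-up 512-pixel-wide image with 256-pixel tiles, the tile in
	// column 1, row 1 is the fourth entry (index 3).
	i := tileIndex(1, 1, 512, 256)
	tile, err := readTile(fakeFile(), offsets, byteCounts, i)
	if err != nil {
		panic(err)
	}
	fmt.Printf("tile %d is %d bytes\n", i, len(tile))
}
```

Because `readTile` only needs an `io.ReaderAt`, the same function works with an `*os.File`.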

What is a GeoTIFF, Then?

A GeoTIFF is just a TIFF that has special tags to allow you to georeference the image. These tell you which part of the Earth the image represents. There are a lot of tags, but they let you know about the bounds of the image with respect to Earth, the projection it uses, and many other things.

This is easy to understand now that we have a good understanding of TIFFs. GeoTIFFs are just like what we were looking at above, but with a specific tag set.
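To give a flavor of that tag set, here is a small Go sketch with a few of the tag IDs from the GeoTIFF specification, plus the common north-up calculation that turns a pixel position into a map coordinate using the tiepoint and pixel scale tags. The function names and the numbers in `main` are made up for illustration, and the calculation assumes a simple unrotated image with a single tiepoint.

```go
package main

import "fmt"

// A few tag IDs defined by the GeoTIFF specification.
const (
	ModelPixelScaleTag = 33550 // size of a pixel in map units (x, y, z)
	ModelTiepointTag   = 33922 // ties a pixel (i, j, k) to a map location (x, y, z)
	GeoKeyDirectoryTag = 34735 // index of the georeferencing keys
	GeoDoubleParamsTag = 34736
	GeoAsciiParamsTag  = 34737
)

// isGeoTIFF reports whether the tag IDs found in an IFD include the
// GeoKeyDirectoryTag.
func isGeoTIFF(tagIDs []uint16) bool {
	for _, id := range tagIDs {
		if id == GeoKeyDirectoryTag {
			return true
		}
	}
	return false
}

// pixelToWorld converts a pixel position to map coordinates for a north-up
// image. tiepoint is (i, j, k, x, y, z) and scale is (scaleX, scaleY, scaleZ).
func pixelToWorld(col, row float64, tiepoint [6]float64, scale [3]float64) (x, y float64) {
	x = tiepoint[3] + (col-tiepoint[0])*scale[0]
	y = tiepoint[4] - (row-tiepoint[1])*scale[1] // rows count downward, y counts upward
	return x, y
}

func main() {
	fmt.Println(isGeoTIFF([]uint16{256, 257, GeoKeyDirectoryTag}))

	// Made-up values: the top-left pixel sits at (1000, 2000) in map units
	// and each pixel covers 0.5 units.
	tiepoint := [6]float64{0, 0, 0, 1000, 2000, 0}
	scale := [3]float64{0.5, 0.5, 0}
	x, y := pixelToWorld(4, 8, tiepoint, scale)
	fmt.Println(x, y)
}
```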

So What Makes a Cloud Optimized GeoTIFF (COG) Interesting?

Now that we know what TIFFs and GeoTIFFs are, what is a COG? If we check out the specification, it’s pretty easy to follow, now that we understand TIFFs.

COGs are just the following stored inside the file in exactly this order:

1. A TIFF with one or more IFDs (describing the original image and zoomed out versions).
2. The data for the images described.

Isn’t that exactly the same as a regular TIFF? No, actually! We didn’t cover this above, but aside from the header, the different parts of a TIFF can be in any order in the file, and TIFF readers just need to follow the offsets to read through a file. A TIFF could be structured like either of these:

Nice COG:             Annoying TIFF:
+-------------+       +-------------+
| TIFF Header |       | TIFF Header |
+-------------+       +-------------+
| IFD 0       |       | Image Data  |
+-------------+       | for IFD 1   |
| IFD 1       |       |             |
+-------------+       |             |
| IFD 2       |       +-------------+
+-------------+       | IFD 2       |
| Image Data  |       +-------------+
| for IFD 2   |       | IFD 0       |
|             |       +-------------+
|             |       | Image Data  |
+-------------+       | for IFD 0   |
| Image Data  |       |             |
| for IFD 1   |       |             |
|             |       +-------------+
|             |       | IFD 1       |
+-------------+       +-------------+
| Image Data  |       | Image Data  |
| for IFD 0   |       | for IFD 2   |
|             |       |             |
|             |       |             |
+-------------+       +-------------+

If it’s structured like the one on the left, the benefit is that we can download smaller chunks of a TIFF at a time, and don’t always need to download the whole file if we only want a little chunk. We could do that by using an HTTP Range Request to ask for the specific bits of the TIFF we’re interested in. For example, we could download just the header and the IFDs with something like:

$ curl http://example.com/example.tiff -i -H "Range: bytes=0-1023"

We’re just guessing that the IFDs fit into 1K, but now that we’ve retrieved this, we can use the IFDs to fetch individual tiles without fetching the whole TIFF. Say we find that the TileOffsets are: [2048, 52048, 102048, 152048] and the TileByteCounts: [50000, 50000, 50000, 50000], we can fetch just the first tile with:

$ curl http://example.com/example.tiff -i -H "Range: bytes=2048-52047"

Neat, huh?
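If you wanted to build that tile request from Go instead of curl, a minimal sketch might look like this. `tileRangeRequest` is a hypothetical helper, and it only builds the request; it never sends it.

```go
package main

import (
	"fmt"
	"net/http"
)

// tileRangeRequest builds (but does not send) a GET request for the bytes of
// one tile, given its entries from TileOffsets and TileByteCounts.
func tileRangeRequest(url string, offset, byteCount uint64) (*http.Request, error) {
	if byteCount == 0 {
		return nil, fmt.Errorf("tile at offset %d has no bytes to fetch", offset)
	}
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	// HTTP byte ranges include both ends, hence the -1.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+byteCount-1))
	return req, nil
}

func main() {
	req, err := tileRangeRequest("http://example.com/example.tiff", 2048, 50000)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Range")) // prints: bytes=2048-52047
}
```

A server that honors the range should answer with 206 Partial Content. If it answers with 200 instead, it ignored the range and sent the whole file.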

So why would you want to do this? Say you are running a tile server and want to render tiles on the fly for display on the web. You can split up your workload across multiple hosts and only pull the chunks of the TIFF each one needs to render. Or say you want to run some analysis on your imagery that can be split up into small pieces. You can now grab just those pieces of the file.

COGs Aren’t so Complicated!

We now understand how TIFFs, GeoTIFFs, and COGs work! There’s a lot here, but if you go slowly and build on your knowledge as you go, it’s not too hard to understand. Of course, I didn’t go over some details, like how the image data can be stored in strips or tiled, compressed into different formats, or the pixels themselves interleaved in different ways!

If you’d like to learn more about COGs, there is a great reference at cogeo.org, and you can view COGs in your browser with the cog map project. If you want more information about Planet APIs to get images like the one above, check out the Developer Center.

To close, I’d like to thank some of my colleagues at Planet: Ian Hansen for working with me to understand COGs; Joe Kington for filling in a lot of blanks about how the mosaics team uses COGs; Chris Holmes for encouraging me to share this outside of Planet; Dana Bauer for a ton of help reviewing this post; and Frank Warmerdam for being available as a resource to learn about GeoTIFFs and COGs specifically.

If I made any mistakes here, please let me know on Twitter or here on Medium! I’d love to learn more about TIFFs!