Intensely-colored wildflowers covered the Carrizo Plain in the spring of 2017. Photograph ©2017 Steve Corey. cc by-nc-nd 2.0.

A Gentle Introduction to GDAL Part 4: Working with Satellite Data

Maps are great for plotting a route, finding yourself when you’re lost, exploring a distant land, or discovering relationships between areas, but they’re more-or-less by definition a static, idealized depiction of place. What happens if you want to see the surface of the Earth in near-real time, watch the seasons change, or study a long-term trend?

Launch a satellite! Or, failing that, download some data from someone who has.

A very wet winter brought an incredible blanket of flowers to California in the spring of 2017. These Planet RapidEye images show how the flowers transformed Carrizo Plain National Monument. (Lindsay Hoshaw of KQED found these data in the Planet archive & wrote about the super bloom — thanks!) ©2017 Planet Labs Inc., cc-by-sa 4.0.

The images above, from Planet’s RapidEye constellation, demonstrate one of satellite data’s greatest strengths: observing change. Orbiting robots can monitor the Earth 24/7 (although this depends on the orbit, there are many options), 365.25 days a year. They also have super-powers, like being able to eat sunlight, infra-vision, perfect memory, and virtual time-travel (some weather satellite datasets go back to the early ’60s.) This allows unparalleled study of seasonal and long-term change.

On the flip side, they can also be inscrutable, like an Alan Moore superhero. The data are rarely packaged neatly, and even when they are it can take some work to make a usable image (or set things up for processing by a computer). Which is where GDAL comes in. Again.

A typical workflow to create an image from raw satellite data would be:

  1. Download data.
  2. Re-order or assemble bands into the desired order (red, green, blue; or near-infrared, red, green; etc.)
  3. Increase the resolution with pan-sharpening, if desired.
  4. Contrast-stretch and color-correct the imagery, either algorithmically or by hand.
  5. Restore georeferencing information, if necessary.
  6. Crop, re-project, and re-size image to merge with other data.

Here’s how to do it.

Using gdal_translate to Re-order Bands

Let’s start with the data that went into the March 31 image of the Carrizo Plain. Download a single scene here, or sign up for a Planet Explorer account. (It’s free, and you’ll be able to access PlanetScope and RapidEye data from California. Navigate to Carrizo Plain National Monument (about -119.725, 35.120) and download the RapidEye ortho tile with this ID: 20170331_190720_1155205_RapidEye-3 and its neighbors.) Now open the file 1155205_2017–03–31_RE3_3A.tif with your TIFF viewer of choice.

Blue, green, red RapidEye image of the Carrizo Plain. ©2017 Planet Labs Inc., cc-by-sa 4.0.

Oh. That doesn’t look right.

Fire up gdalinfo to figure out why.

gdalinfo 1155205_2017–03–31_RE3_3A.tif

Which gives a block ’o text, ending with:

Band 1 Block=5000x1 Type=UInt16, ColorInterp=Red
NoData Value=0
Band 2 Block=5000x1 Type=UInt16, ColorInterp=Green
NoData Value=0
Band 3 Block=5000x1 Type=UInt16, ColorInterp=Blue
NoData Value=0
Band 4 Block=5000x1 Type=UInt16, ColorInterp=Undefined
NoData Value=0
Band 5 Block=5000x1 Type=UInt16, ColorInterp=Undefined
NoData Value=0

There are 5 bands! RapidEye detects light in 5 separate wavelengths: blue, green, red, red-edge, and infrared (full image specs are in the Planet Imagery Product Specification (PDF)). Each is assigned as a band in the image. We can see the first 3 (the additive color primaries!), kinda see red-edge, and decidedly can’t see near infrared. Read Why is that Forest Red and that Cloud Blue? (which I helped create in my previous gig at NASA) to learn more about what these extra wavelengths (and more) are for.

You may have noticed I listed the bands as blue, green, red… That’s because they’re ordered from short (blue) to long (infrared) wavelengths. It’s just a convension in the remote sensing community, although there are exceptions (*cough* Landsat 4, 5, & 7 *cough*). They really shouldn’t be displayed as an RGB image at all, but 5 separate grayscale channels. To reorder the bands and view the image correctly, run gdal_translate with a few new options:

gdal_translate 1155205_2017-03-31_RE3_3A.tif 1155205_2017-03-31_RE3_3A_rgb.tif -b 3 -b 2 -b 1 -co COMPRESS=DEFLATE -co PHOTOMETRIC=RGB

Adding the option -b 3 -b 2 -b 1 assigns band 3 to blue, band 2 to green, and band 3 to red. -co COMPRESS=DEFLATE enables an efficient lossless compression scheme to keep the file size from getting outrageous, and -co PHOTOMETRIC=RGB ensures any image viewer will display the bands as red, green, blue (instead of an alpha channel or something).

Unstretched red, green, blue RapidEye image. ©2017 Planet Labs Inc., cc-by-sa 4.0.

Better! (No really, it’s better.) But still dark. Time for gdalinfo again, but this time with -mm to compute image statistics and see why it’s so dark.

gdalinfo -mm 1155205_2017-03-31_RE3_3A_rgb.tif

Which should spit out:

Band 1 Block=5000x1 Type=UInt16, ColorInterp=Blue
Computed Min/Max=1422.000,41645.000
NoData Value=0
Band 2 Block=5000x1 Type=UInt16, ColorInterp=Green
Computed Min/Max=2108.000,49122.000
NoData Value=0
Band 3 Block=5000x1 Type=UInt16, ColorInterp=Red
Computed Min/Max=3482.000,49572.000
NoData Value=0

There’s a few important things in here: Type=UInt16 and the min/max for each band. Type=UInt16 means the data is in unsigned 16-bit integer format—twice as many bits, but 256 times the information as a normal 8-bit image channel. Theres 65,536 possible values in each band. This gives a lot of flexibility in image processing at the expense of a little bit of storage. The other value of the interest is the Computed Min/Max for each channel which are 14,22–41,645 for blue, 2,108–49,122 for green, and 3,482–49,572 for red. That’s not bad, considering the available range in a 16-bit file is from 0–65,535. So what’s wrong, and how can the image be made better?

Algorithmic Image Enhancement

Since our eyes sense light proportionally, the data still need to be stretched to compensate, despite filling most of the available range. Like so:

gdal_translate 1155205_2017-03-31_RE3_3A_rgb.tif 1155205_2017–03–31_RE3_3A_rgb_scaled.tif -scale 1422 49572 0 65535 -exponent 0.5 -co COMPRESS=DEFLATE -co PHOTOMETRIC=RGB

This does two important things. -scale 1422 49572 0 65535 stretches each band equally from from a range of 1,422–49,572 (first pair of numbers) to a range of 0–65,535 (second pair of numbers). I could have scaled each band separately to its extents (which is an extremely common image processing technique) but that would likely engender a hue shift. Equal scaling leaves things looking more natural. -exponent 0.5 raises each band to the power of ½—the square root. It’s a quick and dirty way to get a preview image.

Histogram stretched and square-root scaled red, green, blue RapidEye image. ©2017 Planet Labs Inc., cc-by-sa 4.0.

As I said, this is quick and dirty, and is going to make every scene look different. Color correction of satellite imagery is a vast topic. I’ve written two guides, one with Landsat 8 and one with PlanetScope data—they’re the process for both works fairly well for RapidEye. Tom Patterson of the National Park Service and Mapbox have both written guides that I recommend, too.

That’s an RGB image—how is the infrared band used? By convention the longest wavelength (furthest in the infrared) replaces the red band, while the other (shorter wavelength) bands are pushes towards blue. A “standard” false-color infrared image is near-infrared=red, red=green, and green=blue. There’s only 3 bands to play with, so blue goes away entirely. Make a false-color image like this:

gdal_translate  1155205_2017-03-31_RE3_3A.tif 1155205_2017-03-31_RE3_3A_nir.tif -b 5 -b 3 -b 2 -scale 1051 49122 0 65535 -exponent 0.5 -co COMPRESS=DEFLATE -co PHOTOMETRIC=RGB
Histogram stretched and square-root scaled near-infrared, red, green RapidEye image. ©2017 Planet Labs Inc., cc-by-sa 4.0.

I’ve done the band re-ordering & scaling with one line of code (having run gdalinfo -mm on the original 5-band file to get the range).

Combining Bands with gdal_merge.py

So far so good. These images show the Carrizo Plain in the relatively recent past (at least at the time I’m writing this)—what did the region look like last year, after a merely rainy (not biblically wet) winter?

Digging through the Open California RapidEye archive you’ll find—nothing. Well, not exactly nothing. Clouds. Not everywhere is as cloudy as Borneo, but clouds are still surprisingly common, with about 2/3 of the Earth’s surface covered at any given time. Fortunately, the RapidEye satellites aren’t the only ones in the sky, and Landsat 8 collected a nice, cloud-free scene of the area just over a year before, on March 25, 2016.

There’s roughly a dozen different Landsat viewers out there (and you can add Planet Explorer to the list), but the most straighforward for retrieving a single scene might be the AWS Landsat archive (if you want to use Earth Explorer, the official USGS access point, I wrote a guide back when it was the only option). Navigate to LC80420362016085LGN00 to download the data from last spring (learn how to decode the scene IDs here).

Wow. That file list is a mess. Instead of having multiple bands in a single file, each band is stored in its own TIFF, and there’s a bunch of other files with cryptic extensions like “IMD” and “OVR”. Just ignore them, and download the TIFFs for band 2 (blue), band 3 (green), band 4 (red) and band 8 (panchromatic).

Combining the separate bands is just as straightforward as re-ordering bands, but uses gdal_merge.py instead of gdal_translate.

gdal_merge.py -o carrizo-20160325-oli-rgb.tif -separate LC80420362016085LGN00_B4.TIF LC80420362016085LGN00_B3.TIF LC80420362016085LGN00_B2.TIF -co PHOTOMETRIC=RGB -co COMPRESS=DEFLATE

The flag -separate followed by the filenames for the bands we want LC80420362016085LGN00_B4.TIF LC80420362016085LGN00_B3.TIF LC80420362016085LGN00_B2.TIF stack the channels in the order they’re written. The rest of the code I’ve described before: gdal_merge.py calls the script and -o carrizo-20160325-oli-rgb.tif creates the output file (be careful: It will overwrite existing files!) The command options -co PHOTOMETRIC=RGB and -co COMPRESS=DEFLATE make sure TIFF readers interpret the colors correctly and that the file is as small as possible without throwing away any data.

By the way, GDAL is very flexible about the order of commands—it usually doesn’t matter. I try to be consistent just to keep myself sane, but don’t always manage it (either consistency or sanity.) The order of the bands after -separate does matter, however. For most conventional band combinations, make sure longer wavelengths are listed first.

Here’s the output file after color correction:

True-color Landsat 8 image collected on March 25, 2016. Data courtesy NASA/USGS Landsat.

Landsat scenes cover a much wider area than a RapidEye tile. You can see all the way from Vandenberg Air Force Base (where Landsat 8 was launched!) to Oxnard, California. But Landsat has a lower resolution—30 meters per pixel instead of 5 meters per pixel. This is a common tradeoff in satellite imaging: area covered vs. resolution. Landsat 8 and several other satellites (the SPOT series, Digital Globe’s constellation) carry a special band, the panchromatic band, to get extra resolution from their sensors.

Pan sharpening essentially swaps the luminance information in a low resolution full-color image with a high resolution grayscale image. Since our eyes are more sensitive to changes in luminance than changes in color, a pan-sharpened image is almost as good as a full resolution multi-spectral (the fancy term for color) image. Here’s how you apply pan sharpening with gdal_pansharpen.py, available in GDAL 2.1 and later (the writing of this tutorial was in no way affected by my UNIX environment’s insistence that python try to import GDAL 1.11):

gdal_pansharpen.py LC80420362016085LGN00_B8.TIF carrizo-20160325-oli-rgb.tif carrizo-20160325-oli-pan.tif -r bilinear -co COMPRESS=DEFLATE -co PHOTOMETRIC=RGB

Like most GDAL commands, it may be intimidating at first but is fairly straightforward once you get the hang of it. The script requires the input and output files to be in a specific order: panchromatic band, followed by the multispectral image, ending with the output filename.

Pan-sharpening a Landsat image bumps the resolution from 30 meters per pixel (left) to 15 meters per pixel (right). Images based on data courtesy NASA/USGS Landsat.

It’s also possible to generate a pan-sharpened image from multiple separate bands, instead of a pre-combined file, in which case the command would look like:

gdal_pansharpen.py LC80420362016085LGN00_B8.TIF LC80420362016085LGN00_B4.TIF LC80420362016085LGN00_B3.TIF LC80420362016085LGN00_B2.TIF carrizo-20160325-oli-pan.tif -r bilinear -co COMPRESS=DEFLATE -co PHOTOMETRIC=RGB

Longer to type, but ultimately fewer steps.

Unlike many other GDAL scripts, gdal_pandsharpen uses a smoothing resampling algorithm by default (cubic), but I often use -r bilinear instead because it creates images suitable for additional custom sharpening.

Pansharpening works best when the panchrimatic band spans similar wavelengths to the multispectral bands. Some older satellites like Landsat 7 and Ikonos had a lot of infrared in the panchromatic band, so they always looked a little funny. If you’re working with these data try out the somewhat esoteric -b (bands) and -w ( weights) options. Landsat 8 and most high-resolution satellites have panchromatic data purely in the visible spectrum, and look great out of the box.

Now that I have images from similar dates in two different years, how do I compare them? More GDAL magic, of course. That’s what you’re here for, right?

The Landsat scene and RapidEye tile have both different extents (the area covered by each image) and resolutions, and both need to be matched. I’ll start with the extents, using gdaltindex to create a shape filea file format that stores vector information, in this case the corner points of the RapidEye image.

gdaltindex 1155205_2017-03-31_RE3_3A_extent.shp 1155205_2017-03-31_RE3_3A.tif

Once there’s a shapefile that defines the desired shape use gdalwarp with the -cutline option to crop the pan-sharpened Landsat file and -tr 5 5 (target resolution in the units defined by the coordinates system of the file (meters in the Universal Transverse Mercator projection used by both Landsat and RapidEye)) to resize to match RapidEye’s resolution:

gdalwarp -tr 5 5 -cutline 1155205_2017-03-31_RE3_3A_extent.shp -crop_to_cutline carrizo-20160325-oli-pan-8bit.tif carrizo-20160325-oli-pan-crop-5m.tif -co COMPRESS=LZW -co PHOTOMETRIC=RGB

This creates a perfectly matched pair of images.

Although there were some flowers in the spring of 2016, they can’t hold a candle to 2017’s bumper crop. Images based on data courtesy NASA/USGS LAndsat (left) and ©2017 Planet Labs Inc., cc-by-sa 4.0.

Last but not least, here’s a trick to add georeferencing information into a file that doesn’t have any. This is especially useful if you process images in Photoshop (without Avenza’s Geographic Imager, which I generally like) or other photo editors that don’t natively support GeoTIFF. Just remember to keep the original image size. To strip the headers, open one of the GeoTIFFs in Photshop or another image viewer and re-save it. (Or download this color-corrected copy of the March 25, 2016, Landsat scene.)

First, you’ll need to install a helper python script that’s not included in many (most?) default GDAL installations, gdalcopyproj.py. The script is hosted on Github, and should be downloaded into the directory that your command line environment stores programs. In my case: /Users/myusername/miniconda2/bin/. To make it run navigate to that directory and type: chmod +x gdalcopyproj.py.

To restore georeferencing run (make sure to use the right file name if you created your own TIFF):

gdalcopyproj.py carrizo-20160325-oli-rgb.tif carrizo-20160325-oli-rgb-corrected-nogeo.tif

This copies the information from carrizo-20160325-oli-rgb.tif into carrizo-20160325-oli-rgb-corrected-nogeo.tif. Confirm with gdalinfo carrizo-20160325-oli-rgb-corrected-nogeo.tif, which should print a bunch of text listing the coordinate system, etc. that looks similar to what I showed way back in part 1. Maybe rename the file from -nogeo.tif to -geo.tif, just to keep track.

Hopefully this is enough info for you to get started working with satellite imagery. There’s a large (and growing!) collection of data from free and commercial sources, of ever-increasing accessibility and quality.

My next post will show some examples of using GDAL to translate and crop vector data with ogr2ogr, and some data processing techniques. with gdal_calc.