Histogram matching is a quick and easy way to “calibrate” one image to match another. In mathematical terms, it’s the process of transforming one image so that the cumulative distribution function (CDF) of values in each band matches the CDF of the corresponding band in another image.
To illustrate what this looks like and how it works, I’m going to histogram-match a high-resolution (0.8m/pixel) SkySat image to the Landsat 8 calibrated surface reflectance images taken around the same time. Below is what it looks like with the SkySat image overlaid on top of the Landsat data, before the matching. Each image’s histogram is shown as well:
To make the histograms match, we can interpolate the values from the source image (SkySat) into the range of the target image (Landsat), using a piecewise-linear function that puts the correct ratio of pixels into each bin.
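As a minimal illustration of the idea (using numpy here rather than the Earth Engine API the rest of this post uses), matching one sample’s CDF to another’s comes down to mapping each source quantile to the target value at the same quantile:

```python
import numpy as np

rng = np.random.default_rng(0)
source = rng.normal(100, 10, 10_000)   # stand-in for a SkySat band
target = rng.normal(120, 25, 10_000)   # stand-in for a Landsat band

# CDF of the source: the quantile position of each unique value.
src_vals, src_idx, src_counts = np.unique(source, return_inverse=True,
                                          return_counts=True)
src_cdf = np.cumsum(src_counts) / source.size

# CDF of the target.
tgt_vals, tgt_counts = np.unique(target, return_counts=True)
tgt_cdf = np.cumsum(tgt_counts) / target.size

# Piecewise-linear mapping: send each source quantile to the target
# value at the same point on the target CDF.
matched = np.interp(src_cdf, tgt_cdf, tgt_vals)[src_idx]
```

After the remapping, the source values take on the target’s distribution (here, roughly mean 120 and standard deviation 25) while preserving their rank order.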
Below is the code to generate the piecewise-linear function using the cumulative count histograms of each image.
This code starts by splitting the (2D Array) histograms into the pixel values (column 0) and pixel counts (column 1), and normalizes the counts by dividing by the total count (the last value).
Next, for each source bin, it finds the index of the first bin in the target histogram where targetCount ≥ srcCount[i]. To determine that, we compare each value from sourceCount to the entire array of targetCounts. This comparison generates an array of 0s where the comparison is false and 1s where it is true. The index of the first non-zero value can be found using the Array.argmax() function, and using that index, we can determine the targetValue that each srcValue should be adjusted to. The output of this function is formatted as a dictionary that’s suitable for passing directly into the interpolate() function.
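The Earth Engine code itself isn’t reproduced here, but the steps just described can be sketched in numpy. This is a sketch under the assumption that each cumulative histogram arrives as a 2D array with bin values in column 0 and cumulative counts in column 1, and that the output keys are named 'x' and 'y' (both assumptions for illustration):

```python
import numpy as np

def make_lookup(source_hist, target_hist):
    """Build a piecewise-linear lookup from two cumulative-count histograms.

    Each histogram is a 2D array: column 0 holds bin values, column 1
    holds cumulative pixel counts.
    """
    src_values, src_counts = source_hist[:, 0], source_hist[:, 1]
    tgt_values, tgt_counts = target_hist[:, 0], target_hist[:, 1]

    # Normalize the cumulative counts by the total (the last value).
    src_counts = src_counts / src_counts[-1]
    tgt_counts = tgt_counts / tgt_counts[-1]

    # For each source bin, the comparison (tgt_counts >= c) yields a
    # boolean array; argmax returns the index of the first True, i.e.
    # the first target bin whose cumulative count reaches the source's.
    indices = [np.argmax(tgt_counts >= c) for c in src_counts]
    return {'x': src_values.tolist(), 'y': tgt_values[indices].tolist()}

# Tiny worked example: 4-bin cumulative histograms.
src_hist = np.array([[0, 10], [1, 30], [2, 60], [3, 100]], dtype=float)
tgt_hist = np.array([[10, 25], [20, 50], [30, 75], [40, 100]], dtype=float)
lut = make_lookup(src_hist, tgt_hist)
```

In the worked example, source value 1 sits at the 0.30 point of its CDF, and the first target bin whose CDF reaches 0.30 is the one holding value 20, so 1 maps to 20.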
Next, here’s the code for generating the histograms and adjusting the images.
This code runs a reduceRegion() on each image to generate a cumulative histogram, making sure that only pixels present in both images are included when computing the histograms (just in case there’s a cloud or something else just outside the high-res image that might distort the results). It’s not important to generate that histogram with very high fidelity, so the maxPixels argument is set to use less than “4 tiles” of data (256 * 256 * 4) and bestEffort is turned on, to make the computation run fast. When these arguments are set this way, the reduceRegion() function will figure out how many pixels it would need to process at the given scale (30 m), and if that’s greater than the maxPixels value, it computes a coarser scale that keeps the total number of pixels below maxPixels. That all means you need to specify a scale, but it doesn’t matter much what it is, as it’ll be mostly ignored.
This code then generates the lookup tables for each band in the input image, calls the interpolate() function for each, and combines the results into a single image.
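In numpy terms, that per-band step looks like the sketch below. It assumes each band’s lookup is a dictionary with 'x' (source values) and 'y' (matching target values) lists, and it stands in for the interpolate-and-combine step rather than reproducing the Earth Engine code:

```python
import numpy as np

def apply_lookup(image, lookups):
    """Adjust each band of an image with its piecewise-linear lookup.

    image: dict of band name -> 2D array of pixel values.
    lookups: dict of band name -> {'x': [...], 'y': [...]}.
    """
    # np.interp evaluates the piecewise-linear function at every pixel.
    return {band: np.interp(pixels, lookups[band]['x'], lookups[band]['y'])
            for band, pixels in image.items()}

# Example: a 2x2 single-band image remapped from [0, 4] to [10, 50].
image = {'red': np.array([[0.0, 2.0], [4.0, 4.0]])}
luts = {'red': {'x': [0.0, 4.0], 'y': [10.0, 50.0]}}
adjusted = apply_lookup(image, luts)
```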
The final results look like this. Until you zoom in really far, it’s nearly impossible to tell where the Landsat image ends and the SkySat image begins.
Here’s a close-up of a detailed region in each of the three images:
The whole script has some additional functions to find the images closest in time, but as you can see, histogram matching is pretty easy and usually does a great job overall. A couple of caveats:
- If there’s anything anomalous in your image that’s not in the reference image (or vice versa), like clouds, the CDF can end up skewed, and the histogram matching results might not look that good.
- A little mis-registration between the source and target images is usually OK, since the matching uses the statistics of the whole region and doesn’t rely on a pixel-to-pixel correspondence.