# Histogram Matching

## Earth Engine by Example

4 min readApr 15, 2021

--

Histogram matching is a quick and easy way to “calibrate” one image to match another. In mathematical terms, it’s the process of transforming one image so that the cumulative distribution function (CDF) of values in each band matches the CDF of bands in another image.

To illustrate what this looks like and how it works, I’m going to histogram-match a high-resolution (0.8m/pixel) SkySat image to the Landsat 8 calibrated surface reflectance images taken around the same time. Below is what it looks like with the SkySat image overlaid on top of the Landsat data, before the matching. Each image’s histogram is shown as well:

To make the histograms match, we can interpolate the values from the source image (SkySat) into the range of the target image (Landsat), using a piecewise-linear function that puts the correct ratio of pixels into each bin.

Below is the code to generate the piecewise-linear function using the cumulative count histograms of each image.

This code starts by splitting the (2D Array) histograms into the pixel values (column 0) and pixel counts (column 1), and normalizes the counts by dividing by the total count (the last value).

Next, for each source bin, it finds the index of the first bin in the target histogram where `targetCount ≥ srcCount[i]`. To determine that, we compare each value from `sourceCount` to the entire array of `targetCounts`. This comparison generates an array of 0s where the comparison is false and 1s where the comparison is true. The index of the first non-zero value can be found using the `Array.argmax()` function, and using that index, we can determine the `targetValue` that each `srcValue` should be adjusted to. The output of this function is formatted as a dictionary that’s suitable for passing directly into the `Image.interpolate()` function.

Next, here’s the code for generating the histograms and adjusting the images.

This code runs a `reduceRegion()` on each image to generate a cumulative histogram, making sure that only pixels that are in both images are included when computing the histograms (just in case there might be a cloud or something else just outside of the high-res image, that might distort the results). It’s not important to generate that histogram with a really high fidelity, so the `maxPixels` argument is set to use less than “4 tiles” of data (256 * 256 * 4) and `bestEffort` is turned on, to make the computation run fast. When these arguments are set this way, the `reduceRegion()` function will try to figure out how many pixels it would need to process at the given `scale` (30 m), and if that’s greater than the `maxPixels` value, it computes a lower scale to keep the total number of pixels below `maxPixels`. That all means you need to specify a scale, but it doesn’t matter what it is as it’ll be mostly ignored.

This code then generates the lookup tables for each band in the input image, calls the interpolate() function for each, and combines the results into a single image.

The final results look like this. Until you zoom in really far, it’s nearly impossible to tell where the Landsat image ends and the Skysat image begins.

Here’s a close up of a detailed region in each of the 3 images:

The whole script has some additional functions to find the images closest in time, but as you can see, the histogram matching is pretty easy and usually does a great job overall.