Otsu’s Method for Image Segmentation

Mar 29, 2017

By Nicholas Clinton, Developer Advocate, Google Earth Engine

In image analysis, we often need an automatic, data-driven way to distinguish two types of relatively homogeneous things, like land vs. water, forest vs. grass, or foreground vs. background. For single-band images that have a bimodal pixel distribution, a two-class segmentation can be performed by finding a single threshold that separates the two classes.

Otsu’s method is a means of automatically finding an optimal threshold based on the observed distribution of pixel values (Otsu, 1979).

For example, this technique was recently used to build an Earth Engine-based tool for assessing the change in surface water at global scale (Donchyts et al. 2016). Here I’ll describe how to implement the Otsu method entirely in Google Earth Engine. To motivate the discussion, consider finding a threshold on the near-infrared (NIR) band that will segment the following image into water and land areas:

Landsat true-color composite, Corpus Christi, Texas (Landsat 2017).

The NIR band is a suitable choice for thresholding the image because water is highly absorptive in the NIR range, and land (especially vegetation) is highly reflective of NIR. The following image is a color-IR composite of the area in which we’re searching for a threshold:

Color-IR composite illustrating the contrast between water and land. In this image, red is used to display the NIR band, so vegetation appears bright red (Landsat 2017).

Sure, you could click around with the Inspector and use trial and error until you find a threshold that looks right, but it wouldn’t be optimal. First, let’s define optimal as Otsu did: the partition of the data that maximizes inter-class variance (equivalently, minimizes the sum of intra-class variances). Define inter-class variance as BSS/p, where BSS is the between-sum-of-squares.
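
With nₖ denoting the number of pixels in class k, the BSS can be written as

$$\mathrm{BSS} = \sum_{k=1}^{p} n_k \left( \overline{DN}_k - \overline{DN} \right)^2$$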

In our case, there are two classes, so p = 2. Here we’re using the between-sum-of-squares (BSS) terminology to indicate that this is a general way of describing the variance structure of a dataset. DN refers to the digital numbers of the NIR band, where D͞Nₖ indicates the mean digital number in class k and D͞N is the mean digital number of the entire dataset. Class k contains every DN less than some threshold. We’re looking for the threshold that maximizes the BSS.

We’re not going to search exhaustively. We’re just going to search over the thresholds that are represented by the bins in a histogram. The advantage of that approach is that it only requires a single pass over the data. At each bin of the histogram, define class k as the pixels in that bin and lower. Class k+1 is everything else. Here’s the implementation as a function, which takes the output of an image.reduceRegion(ee.Reducer.histogram()):
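
(The sketch below uses the Earth Engine JavaScript API. The 'histogram' and 'bucketMeans' keys come from the dictionary that ee.Reducer.histogram() produces; the particular arrangement of array operations is one reasonable way to organize the computation, not necessarily the only one.)

```javascript
// Return the DN threshold that maximizes inter-class variance (BSS),
// given a histogram dictionary produced by ee.Reducer.histogram().
var otsu = function(histogram) {
  var counts = ee.Array(ee.Dictionary(histogram).get('histogram'));
  var means = ee.Array(ee.Dictionary(histogram).get('bucketMeans'));
  var size = means.length().get([0]);
  var total = counts.reduce(ee.Reducer.sum(), [0]).get([0]);
  var sum = means.multiply(counts).reduce(ee.Reducer.sum(), [0]).get([0]);
  var mean = sum.divide(total);

  // Each index i partitions the buckets into [0, i) and [i, size).
  var indices = ee.List.sequence(1, size);

  // Compute the BSS for every candidate partition.
  var bss = indices.map(function(i) {
    var aCounts = counts.slice(0, 0, i);
    var aCount = aCounts.reduce(ee.Reducer.sum(), [0]).get([0]);
    var aMeans = means.slice(0, 0, i);
    var aMean = aMeans.multiply(aCounts)
        .reduce(ee.Reducer.sum(), [0]).get([0])
        .divide(aCount);
    var bCount = total.subtract(aCount);
    var bMean = sum.subtract(aCount.multiply(aMean)).divide(bCount);
    return aCount.multiply(aMean.subtract(mean).pow(2)).add(
        bCount.multiply(bMean.subtract(mean).pow(2)));
  });

  // Return the bucket mean associated with the maximum BSS.
  return means.sort(bss).get([-1]);
};
```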

The function looks at every possible partition of the input data defined by the bins of the histogram, then returns the mean associated with the bin that maximizes the BSS. This works better when you strategically choose the region for which the histogram is generated. For example, by choosing a region in the image that has water and land in approximately equal proportion, we can get a rudimentary water mask:

(Landsat 2017)
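
Concretely, a usage sketch might look like the following; the collection, date range, region coordinates, and scale are illustrative assumptions rather than the exact choices behind the figures above.

```javascript
// A region around Corpus Christi with roughly equal amounts of water and
// land (coordinates are illustrative).
var region = ee.Geometry.Rectangle([-97.5, 27.55, -97.1, 27.95]);

// Pick a low-cloud Landsat 8 TOA scene over the region and select the
// NIR band (B5). The collection and date range are illustrative.
var image = ee.Image(
    ee.ImageCollection('LANDSAT/LC08/C02/T1_TOA')
        .filterBounds(region)
        .filterDate('2017-01-01', '2017-04-01')
        .sort('CLOUD_COVER')
        .first())
    .select('B5');

// Histogram of NIR values over the region, then the Otsu threshold.
var histogram = image.reduceRegion({
  reducer: ee.Reducer.histogram(),
  geometry: region,
  scale: 30,
  bestEffort: true
}).get('B5');
var threshold = otsu(histogram);

// Water absorbs NIR, so pixels below the threshold are treated as water.
var water = image.lt(ee.Image.constant(threshold));
Map.centerObject(region, 10);
Map.addLayer(water.selfMask(), {palette: ['blue']}, 'water');
```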

For interested readers, here’s a link to the code that generates the images.

Happy coding!

Graphics by Chris Herwig
