Random Samples With Buffering

Published in Google Earth and Earth Engine · Apr 16, 2021

Earth Engine by Example

A common concern when doing land cover classifications is the risk of spatial autocorrelation in sampled data skewing the prediction results or accuracy assessment. One method that can help with this issue is ensuring the training and validation samples are sufficiently spaced using some form of buffering. This example will demonstrate one way to do that.

A brute-force method for generating a random sample with a buffer might be to take a large set of samples and filter those down to a smaller number by discarding close neighbors. However, picking which points to start from can be challenging, and computing the cross-product of distances between points can be expensive, so that approach is unlikely to scale well. The general rule for Earth Engine is “the more work that can be done in image space (using tiles and pixels), the better a solution is likely to scale.” To that end, this example will demonstrate buffering points by generating grid cells of a specified size and sampling one point from each grid cell.

Generating raster-based grid cells is very straightforward; you simply reproject any image into the desired projection+scale using reproject(). This example will eventually rely on neighboring cells having unique integer values, so a good starting point is to reproject a random image generated by ee.Image.random().

50km grid cells in Albers projection, randomly colored.
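The grid-labelling idea can be mimicked outside Earth Engine. The following is a minimal sketch in plain Python, not Earth Engine code: `cell_index` and `cell_label` are hypothetical helpers, coordinates are assumed to be in metres in some projected system, and the seeded label merely stands in for the per-pixel value that ee.Image.random() would supply.

```python
import math
import random

CELL_SIZE = 50_000  # grid cell size in metres (the 50km example)

def cell_index(x, y, cell_size=CELL_SIZE):
    """Return the (column, row) of the grid cell containing a projected coordinate."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def cell_label(col, row, seed=0):
    """Repeatable pseudo-random integer label for a cell (stand-in for a random image value)."""
    return random.Random(f"{seed}:{col}:{row}").randrange(2**31)

# Two coordinates on either side of a cell boundary land in different cells.
below = cell_index(12_000, 49_999)
above = cell_index(12_000, 50_000)
```

Every coordinate inside one cell maps to the same index, and therefore the same label, which is the property the later steps rely on.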

The next step is to pick a random point within each of the grid cells. That can be accomplished (still in image space) using reduceConnectedComponents() on the cells result plus a second random image, selecting the maximum random value in each grid cell. The reduceConnectedComponents function applies a reducer to the values (the random image) covered by each patch of homogeneous values in a label band (the cells). This example uses a grid for the second random image 1/16th the original grid’s size, meaning there are 16 × 16 = 256 random points generated inside each grid cell. The location where random == maximum within each grid cell gets marked with a 1 value and the rest of the values are masked.

50km grid cells (randomly colored), with 1 randomly chosen point (white) in each cell. On average, points are spaced 50km apart, but there’s no guarantee on minimum spacing yet.
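The "maximum random value per cell" selection can be sketched in plain Python as follows. This is an illustration of the logic only, not the reduceConnectedComponents() call itself: `pick_point` is a hypothetical helper, and seeding a generator per cell is an assumption made here to keep the sketch repeatable (in Earth Engine a single random image covers all cells).

```python
import random

CELL_SIZE = 50_000          # coarse grid cell size in metres
SUBDIV = 16                 # fine pixels per cell side, so 16 * 16 = 256 candidates
FINE = CELL_SIZE / SUBDIV   # fine pixel size in metres

def pick_point(col, row, seed=0):
    """Return the centre of the fine pixel holding the largest random value in cell (col, row)."""
    rng = random.Random(f"{seed}:{col}:{row}")
    # One random value per fine pixel; keep the location of the maximum.
    candidates = [(rng.random(), i, j) for j in range(SUBDIV) for i in range(SUBDIV)]
    _, i, j = max(candidates)
    x = col * CELL_SIZE + (i + 0.5) * FINE
    y = row * CELL_SIZE + (j + 0.5) * FINE
    return (x, y)

# One point for each cell of a 4 x 4 block of cells.
points = {(c, r): pick_point(c, r) for c in range(4) for r in range(4)}
```

Each cell yields exactly one point, but two points in adjacent cells can still sit right next to a shared edge, which is why the spacing is only an average at this stage.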

This configuration doesn’t guarantee that each point is at least distance meters from its nearest neighbor, only that points are that far apart on average. This is a “loose” notion of point spacing. If “strict” adherence to the buffering distance is needed, then every other row and column of cells in the grid can be masked out using ee.Image.pixelCoordinates() and some math. Because each remaining cell is then separated from the next by a full cell width, removing these cells guarantees each point will be a minimum of distance from its nearest neighbor and 2*distance on average.

Grid cells with even coordinates have been discarded, thus guaranteeing that the random point in each cell is at least the given distance from its nearest neighbor. On average, points have a spacing of distance*2.
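The minimum-distance guarantee can be checked with a small plain-Python simulation. This is a sketch under stated assumptions rather than the script's Earth Engine masking: `random_point_in_cell` and `strict_sample` are hypothetical helpers, the cell size equals the buffering distance, and only cells whose column and row indices are both odd are kept.

```python
import math
import random

DISTANCE = 50_000  # required minimum spacing in metres; also the cell size

def random_point_in_cell(col, row, rng):
    """Uniform random location inside grid cell (col, row)."""
    return ((col + rng.random()) * DISTANCE, (row + rng.random()) * DISTANCE)

def strict_sample(n_cols, n_rows, seed=0):
    """One point per cell, keeping only cells whose column and row are both odd."""
    rng = random.Random(seed)
    return [random_point_in_cell(c, r, rng)
            for c in range(n_cols) for r in range(n_rows)
            if c % 2 == 1 and r % 2 == 1]

pts = strict_sample(10, 10)
# Smallest distance between any two sampled points.
min_gap = min(math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])
```

Any two kept cells differ by at least two in column or row index, so a full discarded cell always lies between their points and `min_gap` cannot fall below `DISTANCE`. The cost is that only a quarter of the cells produce a point.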

The final set of points can be extracted to a FeatureCollection using reduceToVectors. The images below display the extracted points with a buffer of radius distance/2 for visualization purposes. Note that in the 50km version (left, orange) there are points that nearly touch in the lower left and upper right of the image, but none that overlap.

The final random points displayed with a buffer for visualization, using 50km ‘strict’ spacing (left) and 5km ‘strict’ spacing (right).

The complete script, with everything built into a callable function, can be found at https://goo.gle/3tsFpa7 along with a utility for displaying the pixel grid of a projection.

Displaying the pixel grid of a projection.

Caveats

Use clip() before reproject() so individual cells on the coastline don’t get split into separate parts (and become multiple points).
Using reproject can often be problematic when displaying results on a map, since it overrides Earth Engine’s normal scaling behaviors. It will only be an issue in this example if you use a small cellSize and then zoom out really far. There should be no issues displaying (or using) the final FeatureCollection, since all the reprojects are map-independent by then.
I was able to get this to scale to >300,000 points in the Code Editor. To use more points than that, you might need to run it as a table export, or use multiple passes. If you use multiple passes, split the region spatially; otherwise, points from different passes might not maintain the desired spacing.
You can add bands to the inputs going into reduceToVectors to sample covariates at the same time (use a first reducer in that case). If you run out of memory, try exporting the points (without covariates) to a table first.
To do stratified sampling, you can simply replace reduceToVectors with stratifiedSample. However, you will need to mask the class band with the points image.
I elected to use an Albers projection because both Mercator and plate carrée have distance distortion as you move away from the origin, so it’s harder to ensure the minimum distance guarantee with fixed-size grid cells in those projections. Note: the projection you use to generate your points doesn’t have to match the projection you use to sample your covariates.
Suppose you already have points and just want to select a subset that meets the buffering criteria. In that case, you can use reduceRegions with a max reducer on the random image, grouping by the cells image. The max reducer will allow you to specify additional inputs (e.g., covariates or pixel coordinates) to carry along with whatever maximum it finds.
If you’re going to take multiple samples (e.g., for k-fold cross-validation), you should offset the grid each time so you’re not using the exact same sampling grid for each fold. You can do that with something like this:
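A minimal sketch of the grid-offset idea in plain Python (an illustration under stated assumptions, not the script's Earth Engine code): `fold_offset` and `cell_index` are hypothetical helpers, and each fold is assumed to shift the grid origin by a random amount smaller than one cell.

```python
import math
import random

CELL_SIZE = 50_000  # grid cell size in metres

def fold_offset(fold, seed=0):
    """Hypothetical helper: a repeatable random (dx, dy) shift in [0, CELL_SIZE) for a fold."""
    rng = random.Random(f"{seed}:{fold}")
    return (rng.random() * CELL_SIZE, rng.random() * CELL_SIZE)

def cell_index(x, y, offset):
    """Grid cell containing (x, y) once the grid origin is moved to `offset`."""
    dx, dy = offset
    return (math.floor((x - dx) / CELL_SIZE), math.floor((y - dy) / CELL_SIZE))

# One grid origin per fold of a 5-fold cross-validation.
offsets = [fold_offset(k) for k in range(5)]
```

Because each fold uses a different origin, the cell boundaries move between folds, so the same location is not tied to the same cell (or the same neighbors) every time.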


Written by Noel Gorelick