Runs with Arrays
Earth Engine by Example
A common problem people struggle to solve with Earth Engine is computing various kinds of runs, especially through time using an Image Collection. Calculating these kinds of values involves looking at a pixel’s temporal neighbors, and Earth Engine collections don’t readily support that. This example will show you how to solve these kinds of problems using Earth Engine Arrays.
Just in case you’ve never seen or used an Earth Engine Array, it’s a numeric type that can hold multiple values, arranged as a multidimensional (rectangular) array. The arrays can be 1-D, 2-D, or even 10-D, but for this example, we’re going to be working with 1-D arrays, representing data points covering the same spatial location over time.
The primary tools for anything involving runs are the “forward difference”, and array masking. While this sounds fancy, the forward difference of an array is just taking the difference between each array value and its subsequent neighbor. That is accomplished by subtracting the array from itself, but shifted by one position:
Note that the data shift means that the last value has nothing to be differenced against, so we end up with a result that is one value shorter than the input. We typically end up adding a value to one end or the other to make up for this.
The usefulness in a forward difference is in change detection — non-zero values in the result indicate a change and the start of a new run (although maybe just a run of length 1). The forward difference can be used as a mask on other arrays to, for instance, find the starting index of each run by masking an array of sequential numbers. Each position that corresponds to a forward difference of 0 (no-change) gets removed and what’s left is an array of run start positions. Note the 1 added to the front of the Forward Difference array to compensate for the difference in array lengths; the first number is always considered the start of a run.
The actual value of each run can be found by masking the input with the forward difference results. And the length of each run can be found by taking the forward difference of the start positions, (but that requires sticking an extra value on the end to mark the end of the last run, and negating the results because what we really want is the backwards difference).
Here’s the code that implements the forward difference function and computes the run start positions, run length and the value in each run for some sample precipitation data. Using those, it determines the longest run of dry days in some sample precipitation data:
The no-precipitation runs are those that correspond to
runValue == 0, and finding their start and length is just a matter of masking the
runLengths arrays with those values.
To illustrate how this works for images, we’re going to find the maximum run of dry days in 2019 for every pixel in the contiguous US, using the Oregon State PRISM data. This climate reanalysis dataset has a resolution of 2.5 arc minutes (roughly 4km) at a daily cadence. The chart below is a plot of the time-series of precipitation for a single point near Phoenix, Arizona:
Repeating the run detection illustrated above on every pixel in a collection of images is very similar to the previous code, except for a few issues:
- The array functions on images have the prefix
arrayadded to them:
- An image collection can be turned into an array-valued image using
collection.toArray(). However, this generates a 2D array per-pixel (bands in axis 1, time in axis 0), and that can be flattened into a 1D array per pixel using
- There is no
arrayCat()function for images; arrays are combined by calling
image.toArray()on a multiband image, and the bands are concatenated together.
- To get the array indices, a single, maximum-sized list is turned into an array per pixel, then truncated to the proper length on a per-pixel basis (pixels might not all have the same length, due to masking).
Here are the results, visualizing the length of the maximum dry run:
- Working with arrays can get confusing, fast. When developing and debugging, I always add every array image to the map as I make it and inspect a point to make sure I know what values each new image contains.
- Be careful when converting collections to arrays; there’s no way to represent masked values in arrays, so if any band in an image contains a masked value, that entire pixel is just skipped (all bands are discarded for that time point). This means in any two neighboring pixels, a given index might not point to the same date. Either include a date band (see the
createTimeBandfunction) in your image, so it ends up as a column in your array, or make sure there are no masked pixels in your input data (I checked, the PRISM data has no masked pixels in 2019).
- In this example, I’m only looking for the maximum run of dry days, so I didn’t need to convert all the precipitation numbers to an array; I only need a sequence of the day-of-year values of the dry days. This way makes the arrays smaller and I can avoid needing to generate/mask the indexes and the
runValueand can compute the
dryRunStartsdirectly from the array of dates (thereby saving memory). https://code.earthengine.google.com/a91f5620ac3d9153af8d12f66b5dd071
- Array-valued images can be really expensive in terms of memory, and therefore often have a scaling limit. In general, the limit is “2GB per tile,” and tiles are usually 256x256 pixels. That means you will be able to work with arrays that contain more than 4096 64-bit floating-point values per pixel, but some array operators make an entire copy of the data they’re using, so the limit is sometimes only a small fraction of that. If you hit a memory limit, it might help to cast your image data to use smaller data types when you can (using e.g.: short() or byte() where possible).
We have a lot of training videos on Arrays and time-series analysis with Arrays. You can find some nice ones here and here.