Runs with Arrays

Noel Gorelick
Google Earth and Earth Engine
5 min readApr 15, 2021

Earth Engine by Example

A common problem people struggle to solve with Earth Engine is computing various kinds of runs, especially through time using an Image Collection. Calculating these kinds of values involves looking at a pixel’s temporal neighbors, and Earth Engine collections don’t readily support that. This example will show you how to solve these kinds of problems using Earth Engine Arrays.

Just in case you’ve never seen or used an Earth Engine Array, it’s a numeric type that can hold multiple values, arranged as a multidimensional (rectangular) array. The arrays can be 1-D, 2-D, or even 10-D, but for this example, we’re going to be working with 1-D arrays, representing data points covering the same spatial location over time.

The primary tools for anything involving runs are the “forward difference”, and array masking. While this sounds fancy, the forward difference of an array is just taking the difference between each array value and its subsequent neighbor. That is accomplished by subtracting the array from itself, but shifted by one position:

Computing the forward difference of an array of daily precipitation values.

Note that the data shift means that the last value has nothing to be differenced against, so we end up with a result that is one value shorter than the input. We typically end up adding a value to one end or the other to make up for this.

The usefulness in a forward difference is in change detection — non-zero values in the result indicate a change and the start of a new run (although maybe just a run of length 1). The forward difference can be used as a mask on other arrays to, for instance, find the starting index of each run by masking an array of sequential numbers. Each position that corresponds to a forward difference of 0 (no-change) gets removed and what’s left is an array of run start positions. Note the 1 added to the front of the Forward Difference array to compensate for the difference in array lengths; the first number is always considered the start of a run.

Computing the start of each run by masking an array of sequential numbers with the forward difference.

The actual value of each run can be found by masking the input with the forward difference results. And the length of each run can be found by taking the forward difference of the start positions, (but that requires sticking an extra value on the end to mark the end of the last run, and negating the results because what we really want is the backwards difference).

Here’s the code that implements the forward difference function and computes the run start positions, run length and the value in each run for some sample precipitation data. Using those, it determines the longest run of dry days in some sample precipitation data:

The no-precipitation runs are those that correspond to runValue == 0, and finding their start and length is just a matter of masking the runStarts and runLengths arrays with those values.

https://code.earthengine.google.com/e0baa9361109ca109124dc3360366b00

To illustrate how this works for images, we’re going to find the maximum run of dry days in 2019 for every pixel in the contiguous US, using the Oregon State PRISM data. This climate reanalysis dataset has a resolution of 2.5 arc minutes (roughly 4km) at a daily cadence. The chart below is a plot of the time-series of precipitation for a single point near Phoenix, Arizona:

Daily precipitation over Phoenix Arizona from the PRISM gridded climate dataset.

Repeating the run detection illustrated above on every pixel in a collection of images is very similar to the previous code, except for a few issues:

  1. The array functions on images have the prefix array added to them: image.arraySlice(), image.arrayLengths(), image.arrayMask().
  2. An image collection can be turned into an array-valued image using collection.toArray(). However, this generates a 2D array per-pixel (bands in axis 1, time in axis 0), and that can be flattened into a 1D array per pixel usingarrayProject() or arrayReshape().
  3. There is no cat() or arrayCat() function for images; arrays are combined by calling image.toArray()on a multiband image, and the bands are concatenated together.
  4. To get the array indices, a single, maximum-sized list is turned into an array per pixel, then truncated to the proper length on a per-pixel basis (pixels might not all have the same length, due to masking).

Here are the results, visualizing the length of the maximum dry run:

Visualization of the maximum run of days with no precipitation, 2019. Blue = 0 days, Red ≥ 80.

https://code.earthengine.google.com/20e69e04aec4378dc0e974619ded156f

Caveats

  • Working with arrays can get confusing, fast. When developing and debugging, I always add every array image to the map as I make it and inspect a point to make sure I know what values each new image contains.
  • Be careful when converting collections to arrays; there’s no way to represent masked values in arrays, so if any band in an image contains a masked value, that entire pixel is just skipped (all bands are discarded for that time point). This means in any two neighboring pixels, a given index might not point to the same date. Either include a date band (see the createTimeBand function) in your image, so it ends up as a column in your array, or make sure there are no masked pixels in your input data (I checked, the PRISM data has no masked pixels in 2019).
  • In this example, I’m only looking for the maximum run of dry days, so I didn’t need to convert all the precipitation numbers to an array; I only need a sequence of the day-of-year values of the dry days. This way makes the arrays smaller and I can avoid needing to generate/mask the indexes and the runValueand can compute the dryRunStartsdirectly from the array of dates (thereby saving memory). https://code.earthengine.google.com/a91f5620ac3d9153af8d12f66b5dd071
  • Array-valued images can be really expensive in terms of memory, and therefore often have a scaling limit. In general, the limit is “2GB per tile,” and tiles are usually 256x256 pixels. That means you will be able to work with arrays that contain more than 4096 64-bit floating-point values per pixel, but some array operators make an entire copy of the data they’re using, so the limit is sometimes only a small fraction of that. If you hit a memory limit, it might help to cast your image data to use smaller data types when you can (using e.g.: short() or byte() where possible).

See also

We have a lot of training videos on Arrays and time-series analysis with Arrays. You can find some nice ones here and here.

--

--

Noel Gorelick
Google Earth and Earth Engine

I’m a scientist and engineer working at the intersection of technology and nature. I helped send spacecraft to Mars, and co-founded Google Earth Engine.