Planet’s constellation of earth-observation satellites captures nearly the entire world’s landmass every day at a resolution of 3 meters per pixel, an unprecedented combination of spatial resolution, global coverage, and revisit rate. This firehose of imagery enables new analytical approaches to tackle many of the world’s problems, including several of the Sustainable Development Goals. In this post, we focus on deforestation and show how spatio-temporal analytics can be applied to our imagery to produce sub-weekly updates of forest loss.
Illegal deforestation remains a pressing problem across the world. It is especially acute in tropical forests, which are usually cleared for agricultural and livestock use. Among the many negative implications for the environment, one that stands out is climate change. Trees act as important carbon sinks by capturing CO2 from the atmosphere, and when forests disappear, much of that carbon ends up back in the air. Deforestation is thus effectively a source of greenhouse gas emissions and represents the second largest net contributor after the transportation sector. Here’s a great report from the UN on the state of our world’s forests.
Monitoring tropical forests to quantify the loss of trees and detect illegal logging has become an important task for many institutions and companies (such as Global Forest Watch), and the rise of Earth-observation (EO) satellite missions has proven to be of great help. EO missions with global coverage are exceptional because they offer a standardized way of sensing and quantifying forests across the globe. Landsat and Sentinel are two such missions, which offer phenomenal multi-spectral range and have been used in the recent past to monitor forests. Today, thanks to the higher revisit rate and resolution of Planet’s data, we can push the limits of spatio-temporal analytics of land features even further and detect change in forests within a week, rather than months.
Detecting change based on a stream or stack of input imagery might seem like a good problem to solve with a recurrent convolutional neural network. However, we opted for a simpler method, splitting our workflow into two parts: semantic segmentation of imagery followed by pixel-wise time series analysis. The main reason to avoid a direct change classifier is the cost of creating labeled training data. Training a change classifier on image stacks requires a large number of manually curated change annotations. This involves humans sifting through space and time trying to find and delineate change events, which is time-consuming and expensive. Creating segmented label maps of imagery, on the other hand, is an easier process for which standard tools are available.
After image segmentation comes time series analysis. In this step, we avoid the need for additional labeled data: the problem is not treated as a supervised machine learning (ML) task and instead uses simple statistical inference to determine the likelihood of change. This divide-and-conquer approach is also good ML engineering practice. Splitting the problem into parts that can be solved independently enables better interpretability, debugging, tool reusability and pipeline maintenance.
Part 1: Semantic segmentation
The goal of semantic segmentation is to produce a model that takes a single image as input and returns another array of the same shape and resolution that indicates the classification of each pixel among a set of predefined land types. We used a simple ontology of forest, ground, water, cloud and cloud shadow, and created a labeled training set of images in a region of the Amazon rainforest.
Our imagery of choice was PlanetScope scenes, individual snapshots taken by each Dove satellite. A second option, PlanetScope basemaps, is a composite of the sharpest and most cloud-free images of a given location over a month. Basemaps are beneficial in ML because they represent refined imagery, are nearly cloud-free, and have no image gaps over the region of interest. They prove to be a great choice for most analyses, including forest mapping as seen in our NextGenMap project. However, some problems demand the lowest possible detection latency, such as alerts of deforestation, illegal activities, or disaster events. In these cases, it’s often more suitable to use individual PlanetScope scenes, leveraging their daily cadence at the expense of higher operational complexity. This is precisely the capability we are demonstrating in this post.
We used our labeled data set to train a multi-class pixel classifier based on a fully convolutional neural network. Our architecture choice was a generalized variant of U-Net, a relatively simple and effective encoder/decoder-type model. The direct output of the model is a discrete probability distribution over the set of classes for each pixel, and our pipeline passes these probability maps to the time series model.
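To make the model output concrete, here is a minimal NumPy sketch (not the actual model code) of how raw per-pixel scores, or logits, become the per-pixel class distributions that feed the time series step. The class names match the ontology above; the logits here are random stand-ins for real network output.

```python
import numpy as np

CLASSES = ["forest", "ground", "water", "cloud", "cloud_shadow"]

def pixelwise_softmax(logits):
    """Convert raw per-pixel logits of shape (H, W, C) into a discrete
    probability distribution over the C classes at each pixel."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Toy example: a 2x2 "image" with one logit per class at each pixel.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 2, len(CLASSES)))
probs = pixelwise_softmax(logits)
assert np.allclose(probs.sum(axis=-1), 1.0)  # each pixel sums to 1
```

Keeping the full distribution, rather than just the winning class, is what lets the later statistical step reason about uncertainty.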
In order to assess the predictive accuracy of segmentation alone, the most likely class is chosen for each pixel and then compared to the ground truth. When computed on a hold-out dataset, the model performed well and correctly predicted large, contiguous patches of pixels.
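The argmax-versus-ground-truth comparison can be summarized in a confusion matrix. A simple NumPy version (illustrative, not our evaluation code) might look like this:

```python
import numpy as np

def confusion_matrix(pred_probs, truth, n_classes):
    """Pick the most likely class for each pixel and tally it against
    the ground-truth label map. Rows index truth, columns prediction."""
    pred = pred_probs.argmax(axis=-1).ravel()
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (truth.ravel(), pred), 1)
    return cm

# Overall pixel accuracy is the diagonal over the total count:
# accuracy = np.trace(cm) / cm.sum()
```

Per-class recall and precision fall out of the same matrix by normalizing rows and columns, which is useful when classes like cloud shadow are rare.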
Part 2: Time series analysis
In our next step, we modeled the time series of class probabilities to infer changes in the forest state. From the start, it’s necessary to choose a standard grid onto which images are projected, so that any given geographical location corresponds to the same pixel grid coordinate across different frames. We borrowed the system used in Planet’s basemaps, which divides the globe’s longitude range into 2048 tiles in the Web Mercator projection.
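For intuition, the mapping from a longitude/latitude pair to a tile index on a 2048-wide Web Mercator grid can be sketched with the standard slippy-map formula (2048 tiles corresponds to zoom level 11). This is a generic sketch; the exact tile scheme of Planet’s basemaps may differ in detail.

```python
import math

TILES = 2048  # 2**11 tiles across the projection, i.e. slippy-map zoom 11

def lonlat_to_tile(lon, lat):
    """Map a longitude/latitude pair (degrees) to (x, y) tile indices
    on a 2048-wide Web Mercator grid."""
    x = int((lon + 180.0) / 360.0 * TILES)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * TILES)
    return x, y
```

For example, `lonlat_to_tile(0.0, 0.0)` returns `(1024, 1024)`, the tile at the center of the grid.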
Our pipeline was set up so that jobs would be mapped by tiles in the cloud compute cluster, and all processing for a given tile was handled by an independent worker. Each worker would fetch the input imagery (typically the latest 40–60 scenes that intersect the tile), run the segmentation model, project the segmented scenes onto the tile geometry, build a spatio-temporal data stack and perform the time series inference.
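The steps each worker performs can be outlined as follows. Every function here is an illustrative stand-in with a hypothetical name, not Planet’s actual API; the point is the shape of the per-tile job.

```python
import numpy as np

def segment(scene):
    # Real pipeline: run the segmentation model to get class probabilities.
    # Stand-in: assume the scene already holds (H, W, n_classes) probabilities.
    return scene

def project_to_tile(frame, tile_shape):
    # Real pipeline: reproject the scene onto the Web Mercator tile grid.
    # Stand-in: crop to the tile shape.
    return frame[:tile_shape[0], :tile_shape[1]]

def infer_change(stack):
    # Real pipeline: per-pixel time series inference (Part 2).
    # Stand-in: flag pixels whose forest probability (channel 0) dropped
    # sharply between the first and last frames.
    forest = stack[..., 0]
    return (forest[0] - forest[-1]) > 0.5

def process_tile(scenes, tile_shape):
    """One worker's job for one tile: segment each scene, project it onto
    the tile geometry, stack the frames along time, and infer change."""
    frames = [project_to_tile(segment(s), tile_shape) for s in scenes]
    stack = np.stack(frames)  # (T, H, W, n_classes) spatio-temporal stack
    return infer_change(stack)
```

Because each tile is independent, this maps cleanly onto a cluster: one worker per tile, no cross-tile communication.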
Since the scenes are rectangular and have variable orientations, the resulting frame contains large portions of no data when projected onto a square tile. However, this does not cause a problem, since missing pixels can simply be masked out in the analysis.
Given the need to generate inference at a pixel level and the large volume of data processing required, we developed a simple time series model that could be built on top of NumPy array operations to optimize speed and memory. For this we used the per-pixel forest class probability as the main time series signal and masked all points with null data or significant cloud probability. A possible improvement could come from modeling the covariance between class scores, but we left that for future steps.
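The masking step can be sketched in a few lines of NumPy. The function name and the cloud threshold are illustrative assumptions; masked observations become NaN so downstream statistics can ignore them with `np.nanmean` and `np.nanvar`.

```python
import numpy as np

def forest_signal(stack, cloud_prob, nodata_mask, cloud_thresh=0.3):
    """Build the per-pixel forest time series. `stack` and `cloud_prob`
    hold forest and cloud class probabilities of shape (T, H, W);
    `nodata_mask` is True where a frame has no valid data. Invalid
    observations are set to NaN."""
    signal = stack.astype(float)
    invalid = nodata_mask | (cloud_prob > cloud_thresh)
    signal[invalid] = np.nan
    return signal
```

Because everything stays a plain (T, H, W) array, the whole tile is processed with vectorized operations rather than a per-pixel Python loop.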
When it comes to modeling the phenomenon of deforestation, there are some approximations and simplifying assumptions that can be made. For example, we can generally expect that a given forest pixel will experience at most one transition; it would be rare for a location to be forested and deforested multiple times. Taking this as a prior allows us to model high-frequency variability as stochastic noise drawn from a Gaussian or similar distribution. We tried a few different approaches of low computational complexity, and found independence tests between a 2-window split, e.g. the last 3–4 observations vs the previous ones, to be reasonably effective in producing an informative change score. In the end, our algorithm was able to confidently find change with a handful of observations after the actual event happened. Given the daily imaging revisit rate (an average rate: on some days there are multiple captures, on others none), this translates to sub-weekly detection latency.
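One concrete way to realize such a two-window test, shown here as a hedged sketch rather than our exact scoring function, is a Welch-style t statistic comparing the recent window against the earlier history, vectorized over all pixels at once:

```python
import numpy as np

def change_score(signal, window=4):
    """Per-pixel two-window test: compare the last `window` observations
    against all earlier ones with a Welch-style t statistic, computed
    over the time axis of a (T, H, W) array. NaNs are treated as
    masked observations."""
    recent, past = signal[-window:], signal[:-window]
    m_r, m_p = np.nanmean(recent, axis=0), np.nanmean(past, axis=0)
    v_r = np.nanvar(recent, axis=0, ddof=1) / np.sum(~np.isnan(recent), axis=0)
    v_p = np.nanvar(past, axis=0, ddof=1) / np.sum(~np.isnan(past), axis=0)
    # Large positive score: forest probability dropped in the recent window.
    return (m_p - m_r) / np.sqrt(v_r + v_p + 1e-9)
```

A pixel whose forest probability falls from ~0.85 to ~0.15 in the last four observations yields a score in the tens, while a stable pixel scores near zero, so a simple threshold separates genuine transitions from noise.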
This pipeline is able to update a forest state map with every new observation (including partially cloudy images) and detect change within days of the event. The effectiveness of this process is explained by two aspects of the source imagery: (1) 3 meter resolution captures the characteristic texture of forests and improves segmentation accuracy, and (2) the high revisit rate boosts the statistical confidence of the temporal analysis. More data is simply better!