Saving the Rainforest from Space

Using Satellite Imagery and Neural Networks to Identify the Leading Indicators of Deforestation

Drew Hibbard
Analytics Vidhya
Dec 9, 2020 · 5 min read


Photo by Vlad Hilitanu on Unsplash

Check out the full code on GitHub.

I think we all know the impact that deforestation has, not only on the incredible biodiversity contained within rainforests, but on humanity as a whole. You can read more about the issue at the WWF or Wikipedia. This post focuses on the impact that machine learning can have, and is already having, on the problem. First I will use satellite imagery to map forest loss over time, and then show how I trained a convolutional neural network to identify the leading indicators of deforestation, so that authorities can be alerted to the precise locations where action is needed.

Mapping Forest Loss and Gain

Google Earth Engine has tools that make it fairly easy to map changes on the earth's surface over time, forest cover in this case. Using the UMD Hansen Global Forest Change imagery, I built the following 3-layer map showing forest loss and gain between 2000 and 2015. The first layer is forest in green, followed by red dots indicating forest loss and blue dots indicating forest gain over that 15-year period. Note the plethora of red dots and the lack of blue dots over the Amazon.

Green: forest in 2000. Red: forest loss by 2015. Blue: forest gain by 2015

The Kaggle Challenge

The best source of satellite imagery for this project turned out to be a Kaggle challenge, Planet: Understanding the Amazon from Space, sponsored by the satellite imaging company Planet. So I decided to "participate" 3 years after the challenge concluded.

While most satellite imagery is somewhere between 10m and 100m per pixel, Planet's satellites capture roughly 3m per pixel, which gives a much more detailed view of the terrain. They provided over 40,000 images of the Amazon rainforest that participants were to categorize into 17 non-exclusive classes. Below is the distribution of those tags. Each image had exactly one "weather" tag and at least one "land use" tag.
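Because the tags are non-exclusive, the labels are naturally represented as a 0/1 indicator column per tag rather than a single class label. A minimal sketch of that encoding, using hypothetical rows in the style of the challenge's space-separated labels file:

```python
import pandas as pd

# Hypothetical rows in the style of the challenge's labels file:
# each image gets one weather tag plus one or more land-use tags.
labels = pd.DataFrame({
    "image_name": ["train_0", "train_1", "train_2"],
    "tags": ["clear primary", "haze primary road", "cloudy"],
})

# Build the tag vocabulary, then a 0/1 indicator column per tag.
all_tags = sorted({t for row in labels["tags"] for t in row.split()})
for tag in all_tags:
    labels[tag] = labels["tags"].apply(lambda s, tag=tag: int(tag in s.split()))

# Each row is now a multi-hot vector: the training target for the network.
print(labels[all_tags].to_numpy())
```

The resulting multi-hot matrix (one row per image, one column per tag) is exactly the target shape a 17-output network trains against.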

Note the rarity of classes such as “slash and burn” and “conventional mine”. Those were much harder for the model to identify. Now here are examples of some of the tags from the official challenge page. I focused on non-rare tags that are leading indicators of deforestation, such as roads, agriculture, habitation, and cultivation.

When a road winds through pristine forest, deforestation of the surrounding area often follows. Agriculture, habitation, and cultivation are useful predictors because monitoring the boundaries between those areas and the forest lets us check that the boundary stays in place.

Building the Convolutional Neural Network

Naturally, for an image classification task I turned to convolutional neural networks. After some experimentation I realized that to get a truly accurate model, I would need to re-train the full network myself rather than only a new classification head. So that's what I did.

I took the VGG16 model available in Keras, replaced its generic 1000-class ImageNet head with a 17-class output to match the 17 tags, and allowed every single layer to re-train, not just the top ones.

The problem also called for a sigmoid activation function on the output layer and binary cross-entropy as the loss function. A multi-class problem would typically use softmax activation and categorical cross-entropy, but because the classes are non-exclusive it made more sense to treat this as 17 separate binary classifications.
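The architecture described above can be sketched in Keras roughly as follows. This is a minimal sketch, not the exact training script: `weights=None` keeps it runnable offline, whereas the actual fine-tuning would start from the pre-trained `'imagenet'` weights, and the pooling head is one reasonable choice among several.

```python
from tensorflow.keras import Model, layers
from tensorflow.keras.applications import VGG16

NUM_TAGS = 17  # the 17 non-exclusive Planet tags

# include_top=False drops the 1000-class ImageNet head;
# weights=None here only so the sketch runs without a download.
base = VGG16(include_top=False, weights=None, input_shape=(256, 256, 3))
base.trainable = True  # re-train every layer, not just the top

x = layers.GlobalAveragePooling2D()(base.output)
# Sigmoid, not softmax: 17 independent yes/no decisions per image.
outputs = layers.Dense(NUM_TAGS, activation="sigmoid")(x)
model = Model(base.input, outputs)

# Binary cross-entropy treats each tag as its own binary problem.
model.compile(optimizer="adam", loss="binary_crossentropy")
```

With softmax the 17 outputs would be forced to sum to 1, which is wrong for images that legitimately carry several tags at once; the sigmoid/BCE pairing lets each output fire independently.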

The model performed quite well on the official challenge metric, the mean F-beta score across all 17 classes with beta = 2: it scored over 92%, within 1% of the challenge winner. With beta = 2, the metric prioritizes minimizing false negatives at the expense of false positives, which makes sense here: there isn't much downside to over-predicting the positive class, since humans would review any flagged photo before sending a team to investigate the area.
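To make the recall-weighting concrete, here is the standard F-beta formula in plain NumPy, applied to a made-up classifier that over-predicts the positive class (the labels below are illustrative, not challenge data):

```python
import numpy as np

def fbeta(y_true, y_pred, beta=2.0):
    """F-beta for one binary class; beta > 1 weights recall over precision."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 1, 0])  # catches every positive, 2 false alarms
# precision = 3/5, recall = 3/3: F2 rewards the perfect recall
print(round(fbeta(y_true, y_pred), 3))  # -> 0.882
```

The same predictions score only 0.75 at beta = 1, so the beta = 2 metric is noticeably more forgiving of false alarms than a plain F1 would be, matching the "don't miss deforestation" priority.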

Again, I was particularly focused on the classes that would uncover deforestation, such as roads, agriculture, habitation, and cultivation. It fared well here, and you can see the confusion matrices below.

Testing on New Imagery

The real test, of course, was applying the model to new imagery. Unfortunately I don't have the resources to apply it over large areas, though the technology to do so certainly exists. I simply used Planet's explorer application, zoomed in on a few areas where we saw red dots earlier, such as the state of Rondonia in Brazil, and took a few screenshots.

I made sure the shots matched the pixel dimensions and resolution of the training set: 256x256 pixels at 3m per pixel. The model continued to perform well on the most important categories, such as roads and agriculture, and was able to distinguish roads from rivers.
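Turning the network's sigmoid outputs on a new screenshot into human-readable tags is a per-class threshold, since each tag is an independent yes/no decision. A small sketch (the tag subset and probabilities are illustrative, and 0.5 is the simplest threshold choice; per-class tuned thresholds are common too):

```python
import numpy as np

TAGS = ["clear", "primary", "road", "agriculture", "water"]  # illustrative subset

# Hypothetical sigmoid outputs for one 256x256 screenshot.
probs = np.array([0.91, 0.97, 0.62, 0.18, 0.44])

# Each output fires independently, so threshold per class rather
# than taking a single argmax.
predicted = [tag for tag, p in zip(TAGS, probs) if p >= 0.5]
print(predicted)  # -> ['clear', 'primary', 'road']
```

Any image whose predicted tags include a leading indicator like "road" or "agriculture" could then be queued for human review.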

Applications of machine learning such as this can lead to further success stories, such as Costa Rica, which has regained large areas of rainforest in the last 20 years by paying landowners to protect the forest. When we rerun the map layers in 2040, hopefully there will be more blue than red!
