From Kaggle competition to start-up and tracking 2 million km² of forest

Indra den Bakker
Overstory
Published in
6 min readMar 2, 2018
Selective logging in the Amazon captured by Planet, source: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data

Back in the days, during my studies I was introduced to Kaggle. For the course ‘Data Mining Techniques’ at VU University Amsterdam we had to compete in the competition Personalize Expedia Hotel Searches — ICDM 2013. Me and my fellow team members did okay but we were far from the top scoring teams. However, I immediately wanted to learn more about the field of machine learning and it’s applications.

In the years to come, I competed in several competitions. I never managed to dedicate as much time as I wanted but every single competition was a great learning experience. In 2017, one of the goals that I had set for myself was to pick a Kaggle competition and fully focus on it to get my first golden medal. The competition I picked was Planet: Understanding the Amazon from Space. I already had some experience with satellite imagery and the use case sounded interesting. This turned out to be a great pick with many consequences in the months afterwards.

Analysing the Amazon with Planet
I started working on the competition roughly one month before the deadline and I immediately took a deep dive by experimenting with pre-trained deep learning models. This soon turned out to be the way to go and I managed to get some decent scores. This, of course, tasted like more. I kept track of all the messages in the discussion board and I learned tons from my fellow Kagglers.

Kaggle competition Planet: Understanding the Amazon from Space, source: https://www.kaggle.com/c/planet-understanding-the-amazon-from-space

For those not familiar with the Planet competition, the goal of the competition was to track the human footprint in the Amazon rainforest. The satellite imagery was provided by Planet and had 3m resolution — this means that every pixel resembles 3 meters. This allows to detect small changes like selective logging. The competition is set-up as a multi-label classification problem and the metric used was the F2-score. The labels included selective logging, agriculture, primary rainforest, clouds, roads, and more.

One thing that is great about Kaggle is that everyone — also the high ranking competitors — is always open to share their insights and ideas. A great example is Heng CherKeng, who was one of the biggest drivers of this competition and shared most of his code during the competition.

Golden medal and Kaggle Master tier
With one week to go I was moving around in top 15 when I was invited to join a team. I knew that joining a team and ensembling the models could give a boost so I decided to accept. After an intense couple of weeks with sleepless nights the reward was an amazing 5th position and a golden medal resulting in a Kaggle Master tier. I know that I couldn’t have achieved this without my teammates Eureka and weiwei. Even more, the resulting F2 score > 93% can have a great impact on how to process large amounts of satellite data to classify small scale activities in the Amazon rainforest.

Moving forward
Already during the competition I started to think about what else could be done with this information. Of course, Planet had to set-up a fair playing field for this competition, but I was quite certain that with additional information and techniques we could push the boundaries of the results even further.

I connected with Anniek — now co-founder of 20tree.ai — and we started discussing what else we could do with these types of satellite imagery. In the original competition we had to classify fixed chips of images with 4-bands. This is a static moment in time, but the beauty of satellite imagery is that you can add the time dimension. Especially with the frequent revisits of Planet’s satellite constellation you can inspect pieces of land over time. A logging road starts as a small dot but increases in length over time. Moreover, ideally you don’t only want to classify images itself, but you want to locate where deforestation occurs on pixel level. And maybe we could even find the underlaying patterns and predict areas to monitor more closely.

Sometimes people argue that the top algorithms in Kaggle competitions are too complex to use in production and often a simpeler model would work just as good in real life. In our case, the large ensemble of models used in the Kaggle competition is just a small part of an even larger ensemble of models and additional techniques used to retrieve insights.

2nd prize in the Airbus GEO Challenge 2017, source: https://www.agorize.com/en/challenges/airbus-challenge/pages/final

Airbus GEO Challenge
We decided to do some research in the field and to learn more about satellite imagery and forestry. The areas to protect from deforestation are huge, so the combination of satellite imagery and deep learning to detect early patterns seemed like a sweet spot to us. Moreover, the availability of high-resolution satellite imagery (up to 30cm), the availability of radar data, and the increasing revisits by different providers could provide timely insights that stakeholders can use to act upon.

There was indeed a growing number of companies working to retrieve valuable information from satellite imagery, but most of them seemed to focus on other verticals. With our ideas in mind we decided to compete in the Airbus GEO Challenge. To our surprise, after a couple of rounds, we were invited for the finals in Toulouse, France and we won the 2nd place including a voucher to buy satellite imagery.

DigitalGlobe Sustainability Challenge
In the months after the challenge, we connected with different stakeholders and we received a lot of positive feedback. We decided to register our new start-up 20tree.ai. At 20tree.ai we don’t only focus on deforestation, but we also provide our customers with forest insights to make forest management more sustainable and efficient. We kicked of with a couple of Proof of Concepts and signed our first customers in the months after.

In the beginning of 2018 we submitted our project proposal for the DigitalGlobe Sustainability Challenge and we were selected as one of the 5 winners. This gives us access to DigitalGlobe’s high-resolution imagery that we use to develop deep learning models to detect deforestation, for instance illegal logging and expansion of agriculture. For this project, we are collaborating with partners like WWF — one of the biggest drivers behind protecting the Cerrado — to provide actionable forest insights, trends and predictions on one of the most threatened regions of Brazil: the Cerrado. The total area we are tracking is more than 2,000,000 km². With this project we hope to contribute to the Cerrado Manifesto, signed by 61 of the world’s largest food companies. With insights from the sky there are no excuses anymore in traceability and transparency.

Becoming the standard in planet intelligence
We strongly believe this is just the start of an amazing journey that will follow. There is much more in the field of planet intelligence that we want to explore, from soil and water intelligence to environmental impact. This all contributes to a new standard in data-driven planet intelligence.

On the technical side, we are also experimenting with novel ideas. From leveraging GANs and custom Super-Resolution models to using reinforcement learning to train an agent to determine which areas to monitor, why, and with which resolution.

We are often asked why we decided to move into this field and especially the application of forestry. We always proudly react that it all started with an inspiring Kaggle competition.

Kaggle profile: https://www.kaggle.com/indradenbakker

--

--

Indra den Bakker
Overstory

CEO & Co-founder @OverstoryAI. Author of Python Deep Learning Cookbook and student mentor & project reviewer @ Udacity.