Ready, Set, Go! The WiDS 2019 Datathon

The 2nd Annual Women in Data Science (WiDS) Datathon is launching today, January 29, 2019! In advance of the March 4, 2019 Global WiDS Conference, the Global WiDS Team, the West Big Data Innovation Hub, and the WiDS Datathon Committee have been working with Planet and Figure Eight to bring a dataset of high-resolution satellite imagery to participants, building awareness about deforestation and oil palm plantations.

We invite you to build a team, hone your data science skills, and join us in this predictive analytics challenge focused on social impact. Keep reading to learn more about the datathon, the significance of oil palm, and how to get started.

The WiDS Datathon 2019 runs until February 27, 2019. Winners will be announced at the WiDS Conference at Stanford University on March 4, 2019.

What’s a datathon?

A datathon is a data-focused hackathon — given a dataset and a limited amount of time, participants are challenged to use their creativity and data science skills to build, test, and explore solutions. Try something new, apply what you know, learn from other participants, and improve your data science skills along the way!

Last year’s first-ever WiDS datathon focused on data from Bill & Melinda Gates Foundation grantee InterMedia, to better understand global poverty and digital financial technology use. Watch this recap video from the WiDS Conference, where the winners of the Datathon were announced — and this video from a Datathon-focused event held in New England last year.

We will host the WiDS Datathon on Kaggle, an online community of data scientists and machine learners. If you haven’t participated in Kaggle competitions before, check them out here — the platform has a ton of complex challenges with real-world impact and starter datasets for learning data science basics.

This year’s dataset and challenge

The WiDS Datathon 2019 challenge is to create a model that can detect oil palm plantations in satellite imagery. Planet and Figure Eight have generously provided an annotated dataset of satellite images recently taken by Planet satellites. The images have a 3-meter spatial resolution, and each is labeled according to whether an oil palm plantation appears in it (0 for no plantation, 1 for any presence of a plantation).

The datathon task is to train a model that takes a satellite image as input and outputs a prediction of how likely it is that the image contains an oil palm plantation. Fully labeled training and validation sets are provided for model development. You will then upload your predictions for an unlabeled test set to Kaggle, and these predictions will determine the public leaderboard rankings and the winners of the competition.
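To make the task concrete, here is a minimal sketch of the train-then-predict workflow described above. It is not the official starter code: the images are synthetic stand-ins generated in the script, the three color-average features and the hand-written logistic regression are illustrative choices, and the names image_id and the synthetic_NNN.jpg file names are hypothetical placeholders rather than the competition's actual format.

```python
import math
import random


def make_synthetic_image(label, rng, size=8):
    """Hypothetical stand-in for a satellite tile: a size x size grid of
    (red, green, blue) pixels in [0, 1]. Label-1 tiles are made greener."""
    green_shift = 0.3 if label == 1 else 0.0
    return [
        [(rng.random() * 0.5, rng.random() * 0.5 + green_shift, rng.random() * 0.5)
         for _ in range(size)]
        for _ in range(size)
    ]


def channel_means(image):
    """Reduce an image to three features: mean red, green, and blue."""
    pixels = [px for row in image for px in row]
    return [sum(px[c] for px in pixels) / len(pixels) for c in range(3)]


def predict_proba(weights, bias, features):
    """Logistic model: probability that the image has label 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with plain batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


rng = random.Random(0)

# Labeled training data: 0 = no plantation, 1 = any plantation present.
train_labels = [i % 2 for i in range(200)]
train_X = [channel_means(make_synthetic_image(lbl, rng)) for lbl in train_labels]
weights, bias = train_logistic_regression(train_X, train_labels)

# Test images: in the real competition these labels are hidden from you;
# they are kept here only so the sketch can be sanity-checked.
test_labels = [i % 2 for i in range(20)]
test_X = [channel_means(make_synthetic_image(lbl, rng)) for lbl in test_labels]
test_probs = [predict_proba(weights, bias, feats) for feats in test_X]

# One row per test image: a hypothetical identifier and a probability.
submission_rows = [
    (f"synthetic_{i:03d}.jpg", round(p, 4)) for i, p in enumerate(test_probs)
]
for image_id, prob in submission_rows[:5]:
    print(image_id, prob)
```

The point of the sketch is the shape of the output: one probability between 0 and 1 per test image, rather than a hard 0 or 1. For real satellite imagery, a convolutional neural network or another image model would almost certainly outperform three color averages; check the competition page on Kaggle for the actual file layout and submission format.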

Who can participate in the datathon

We invite everyone, from newcomers to data science to veterans of the field, to participate. For those who have never tried machine learning or worked with satellite data before, we will be releasing a series of guides to help you get started with the algorithms and dataset.

The WiDS Datathon aims to inspire women worldwide to learn more about data science, and to create a supportive environment for women to connect with others in their community who share their interests. Toward these ends, the datathon is open to individuals and to teams of up to 4, and at least half of each team must be women (people identifying as female). Participants can be students, faculty, government workers, members of NGOs, or industry members.

For more information and answers to frequently asked questions, please see the WiDS Datathon FAQ, which we will update throughout the competition.

Getting started

To get started with the dataset and challenge, we recommend that you:

  • Join the WiDS Datathon community mailing list to make sure you receive news and announcements!
  • Create an account on Kaggle, and enter the WiDS Datathon 2019 Competition.
  • Read background material about satellite imagery (from our partners at Planet Labs); we will be posting guides about image analysis and beginner-friendly machine learning algorithms on our website soon.
  • Consider forming a team with new collaborators! Connect with potential teammates on the Kaggle forums, at an in-person team formation or Datathon-focused event hosted by WiDS Ambassadors, and through social media with #WiDSDatathon.

Lastly, be creative, and have fun! Good luck to all participants — we’re so excited to see what you create.

Background: Oil palm and why it’s important

Deforestation driven by the growth of oil palm plantations is an agricultural trend with large economic and environmental impacts. From shampoo to donuts and ice cream, oil palm is present in many everyday products, yet many people have never heard of it!

An example oil palm plantation. Photo by Jenny Farmer/CIFOR.

Oil palm is the tree that produces palm oil, an edible vegetable oil found in products from cosmetics and soap to instant noodles. It is the world’s most-used cooking oil, and researchers have estimated that the average human consumes 17 pounds of palm oil each year. In 2010, palm oil accounted for approximately 10% of exports in some countries, and it is connected to employment for millions of people. With growing global demand for edible oils and biofuels, the land used to grow oil palm could expand by an estimated 53 million hectares by 2050, with much of the expansion predicted to happen in Africa and South and Central America.

Because oil palm grows only in tropical environments, the crop’s expansion has led to deforestation, increased carbon emissions, and biodiversity loss, while at the same time providing many valuable jobs. A well-known example of biodiversity impact is the 80% decline in orangutan population in Borneo over the past 75 years due to habitat loss. Much of the area best suited to grow oil palm in West and Central Africa also overlaps with regions of high primate diversity and endangered species habitats.

Orangutans are an endangered species whose habitat overlaps with land suitable for oil palm cultivation. Photo by Teodor Kuduschiev.

With the economic livelihoods of millions and the ecosystems of the tropics at stake, how might we work towards affordable, timely, and scalable ways to address the expansion and management of oil palm throughout the world?

High-resolution satellite imagery is a global, regularly updated, and accurate source of data. Coupled with computer vision algorithms, it presents a promising opportunity for automated mapping of oil palm plantations, an important step toward understanding their global impact.