Data Labeling Contest - Cloud Native Geospatial Sprint

Hamed Alemohammad
Radiant Earth Insights
Sep 3, 2020

The STAC specification is approaching its version 1.0 milestone, and to mark the occasion the first virtual Cloud Native Geospatial Sprint is being organized next week. An outreach day is planned for Sep 8th, with a series of talks and tutorials open to everyone. Read more about the sprint in this blog post by our Technology Fellow Chris Holmes. A new addition to the sprint is a data labeling contest!

If you have followed our blog, you know we have written many times about the importance of open-access, high-quality labels on satellite imagery for building geospatial machine learning models. A scalable way to generate labels for large volumes of imagery is to run crowdsourcing campaigns and encourage the community to contribute to open-access training data catalogs.

What are we labeling?

While multispectral satellite imagery provides valuable and timely observations globally, clouds in a scene hide the ground beneath them and make the affected pixels unusable for monitoring the land surface. Indeed, some regions around the world are covered by clouds almost daily. So, it’s essential to be able to detect clouds in the imagery and mask them before running any analysis.
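
To make the masking step concrete, here is a minimal sketch in Python with NumPy. The tiny 3 x 3 band, the reflectance values, and the function name `mask_clouds` are all made up for illustration; in practice the cloud mask would come from a detection model or from labels like the ones collected in this contest.

```python
import numpy as np


def mask_clouds(band, cloud_mask):
    """Return the band as a masked array with cloudy pixels hidden."""
    return np.ma.masked_array(band, mask=cloud_mask)


# Hypothetical 3 x 3 reflectance band and cloud mask (True = cloud).
band = np.array([[0.10, 0.12, 0.80],
                 [0.11, 0.85, 0.90],
                 [0.09, 0.10, 0.13]])
cloud_mask = np.array([[False, False, True],
                       [False, True, True],
                       [False, False, False]])

clear = mask_clouds(band, cloud_mask)
n_clear = int(clear.count())      # number of cloud-free pixels
clear_mean = float(clear.mean())  # statistic computed over cloud-free pixels only
print(n_clear, round(clear_mean, 4))
```

Any statistic computed on the masked array ignores the cloudy pixels, which is the behavior an analysis of the land surface needs.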

Average multi-year cloud fraction: dark pixels indicate zero, and white pixels indicate a 1.0 cloud fraction. (source: NASA Earth Observatory)

We have decided to run a data labeling contest for identifying cloud (and background) pixels in Sentinel-2 scenes to enable the development of an accurate cloud detection model from multispectral data. Several scenes from Digital Earth Africa’s (DEA) Sentinel-2 catalog have been selected. DEA’s team has converted all of the Sentinel-2 imagery across Africa to Cloud Optimized GeoTIFF (COG) format and hosts it on AWS (check it out here).

After the completion of the contest, the resulting training dataset will be hosted on Radiant MLHub with a CC BY 4.0 license for public access.

How are we going to label imagery?

Azavea’s GroundWork platform is being used for the contest. Their team has already ingested a set of Sentinel-2 scenes from DEA’s catalog and created several projects that will be shared with participants. Each scene will be divided into 512 x 512 pixel tasks on GroundWork, and participants can choose to label any of them or automatically get assigned to a task.
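
As a rough illustration of how a scene breaks down into tasks, the sketch below enumerates 512 x 512 windows over a pixel grid. It assumes a 10980 x 10980 grid (the size of a Sentinel-2 tile at 10 m resolution) and assumes that edge tiles are simply clipped; how GroundWork actually handles scene edges is not described here, and `tile_windows` is a hypothetical helper, not part of any GroundWork API.

```python
def tile_windows(height, width, tile=512):
    """Yield (row_off, col_off, n_rows, n_cols) for each tile covering the grid.

    Tiles on the bottom and right edges are clipped to the grid size
    (an assumption made for this sketch).
    """
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            yield row, col, min(tile, height - row), min(tile, width - col)


windows = list(tile_windows(10980, 10980))
print(len(windows))   # number of tasks under these assumptions
print(windows[-1])    # the clipped bottom-right tile
```

Under these assumptions, a single full scene would yield several hundred tasks, which is why the labeling work is spread across many participants.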

In each task, you should label cloud and background pixels and make sure every pixel is assigned to one of the two classes before you submit the task. You will receive detailed instructions from GroundWork’s team on how to use the tool and identify cloudy pixels.

Scoring

We have defined a score to rank your contribution to the contest based on a combination of the number of tasks you finish and their complexity. For example, tasks that have no cloudy pixels are much easier to label than tasks that have many small patches of altocumulus cloud.

S: Your score

N_tasks: Number of tasks completed (a task counts as completed once all of its pixels are labeled)

N_polygons: Number of polygons completed overall (polygons of both cloud and background classes will be counted)

f_cloud: fraction of cloud-labeled pixels in a completed task

f_background: fraction of background-labeled pixels in a completed task
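
The two fractions can be read directly off a task’s label mask. The sketch below shows one way to compute them and to check that a task is complete. The integer class codes and the function name `class_fractions` are assumptions made for illustration and are not GroundWork’s export format. The sketch stops at these inputs and does not compute the score S.

```python
import numpy as np

# Hypothetical class encoding for a task's label mask.
UNLABELED, CLOUD, BACKGROUND = 0, 1, 2


def class_fractions(labels):
    """Return (f_cloud, f_background, completed) for one task's label mask."""
    total = labels.size
    f_cloud = np.count_nonzero(labels == CLOUD) / total
    f_background = np.count_nonzero(labels == BACKGROUND) / total
    completed = not np.any(labels == UNLABELED)  # all pixels must be labeled
    return f_cloud, f_background, completed


# Example task: 512 x 512 pixels, top half labeled cloud, the rest background.
labels = np.full((512, 512), BACKGROUND, dtype=np.uint8)
labels[:256, :] = CLOUD

f_cloud, f_background, completed = class_fractions(labels)
print(f_cloud, f_background, completed)
```

For a completed task the two fractions always sum to 1, since every pixel belongs to exactly one of the two classes.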

For example, if you finish two tasks, one of them with a single cloudy polygon covering 30% of the task, and another one with two cloudy polygons covering 40% of the task, your score will be:

You can check the contest leaderboard to see your score.

Awards

A number of awards will be presented to the top contributors in the contest:

  • Top Labeler — $2000 plus an open 50cm SkySat Image, tasked by the winner.
  • 2nd place labeler — $750
  • 3rd place labeler — Jacket or $200
  • Top Labeler from an African Country (who is not in the top 3 prizes) — Jacket or $200
  • Top Woman Labeler (who is not in the top 3 prizes) — Jacket or $200
  • Next 5 top labelers — Hoodie or $60
  • Anyone with a minimum score of 10 on the leaderboard — t-shirt or $20

Read more about all the awards of the Cloud Native Geospatial Sprint here.

How to participate?

Fill out this form, and you will receive an email from GroundWork on Sep 8th at 10am PDT (5pm UTC) notifying you about the projects that are ready to be labeled. Depending on the completion rate of projects, we will add more projects throughout the contest.

You will have until 11:59pm PDT on Sep 15th (6:59am UTC on Sep 16th) to participate and label imagery. After that, the leaderboard will be closed and awardees will be selected.
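
If you want to convert these times to your own time zone, a short Python sketch like the one below can help. It assumes the contest takes place in 2020 (the year this post was published) and uses a fixed UTC-7 offset for PDT; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# PDT is UTC-7; a fixed offset is enough for these two September dates.
PDT = timezone(timedelta(hours=-7), name="PDT")


def to_utc(local_time):
    """Convert a timezone-aware datetime to UTC."""
    return local_time.astimezone(timezone.utc)


notification_time = to_utc(datetime(2020, 9, 8, 10, 0, tzinfo=PDT))
deadline = to_utc(datetime(2020, 9, 15, 23, 59, tzinfo=PDT))

print(notification_time.isoformat())  # 2020-09-08T17:00:00+00:00
print(deadline.isoformat())           # 2020-09-16T06:59:00+00:00
```

Replacing `timezone.utc` in `to_utc` with your local zone gives the equivalent local times.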

Slack Channel

We have created a new channel on Radiant MLHub’s Slack workspace, #stac-6-labeling-contest, for participants to share their experience with each other. If you are already on our Slack workspace, search for the channel and join. If not, you can join the workspace using this link and then join the channel.

Finally, this wouldn’t have been possible without the support of our sponsors. Thanks to Planet, Microsoft, Azavea, and Radiant Earth Foundation for sponsoring this event.

Looking forward to seeing many of you in the contest!

Sample image from GroundWork showing cloud and background labels overlaid on a Sentinel-2 scene (credit: Azavea)

Hamed Alemohammad is Associate Professor and Director of the Center for Geospatial Analytics at Clark University.