Data Labeling Contest: Crowdsourcing a scalable solution to generate labels for satellite imagery

A conversation with the First Place winners of the Data Labeling Contest

Radiant Earth
Radiant Earth Insights
8 min read · Jan 12, 2021

In September 2020, we announced the Data Labeling Contest winners. The contest was part of the Cloud Native Geospatial Outreach Day sponsored by Planet, Microsoft, Azavea, and Radiant Earth Foundation. Participants were invited to contribute to open-access training data catalogs by identifying cloudy pixels in Sentinel-2 scenes. Two hundred thirty-one labelers joined the contest, representing a wide range of educational backgrounds, institutions, and geographies. Awards went to the top 83 contributions across six categories. For this Q&A, we sat down with Solomon Kica from Uganda and Jhomira Vanessa Loja Zumaeta from Peru, who won the Top Labeler first prize awards. Both were selected for the top prize because their scores were very close, a 3.6% difference, and both stood out from the rest of the participants.

Solomon Kica, Top Labeler Award Winner

Solomon Kica is the Founder of “Resilience Mappers,” an OSM community that maps informal settlements in Uganda’s cities.

Meet Solomon Kica, one of the two Top Labelers of the Data Labeling Contest. Solomon recently completed his undergraduate studies in Land Surveying and Geomatics at Makerere University in Kampala, Uganda. He is also the coordinator of Resilience Mappers, an OpenStreetMap community he founded in March 2020, while still a student, to map informal settlements in Uganda’s cities. He is most passionate about GIS and remote sensing because of their capability to solve several societal challenges. Outside of his studies and work, he loves reading novels and listening to music.

Congratulations on winning the Data Labeling Contest! Tell us about yourself. Where are you from, and what’s your educational background?

I am from Uganda and recently completed my undergraduate degree in Land Surveying and Geomatics at Makerere University in December 2020. Because of COVID-19, however, I will not graduate until March 2021. I am also the coordinator of Resilience Mappers, an OpenStreetMap community in Uganda that I founded to provide geospatial data to stakeholders in informal settlements in Uganda’s cities. I spent my early childhood in these settlements, seeing the daily challenges that many residents faced, such as access to waste management, sanitation, health, and education services, among others. Government plans for these communities usually do not match the priorities of the people because they are not consulted or engaged in the planning processes. With the knowledge I obtained at TU Delft’s summer school on planning and design with water, I realised that a bottom-up approach of engaging settlement leaders and residents in mapping these communities would ensure that their interests are considered in city plans, hence creating an inclusive and sustainable city. With support from the Humanitarian OpenStreetMap Team (HOT OSM), we have managed to map the buildings and roads in 15 of these informal settlements, covering over 8 square kilometres, on OpenStreetMap, and have shared the data with urban authorities.

Your score and your co-winner’s stood apart from everyone else’s. What inspired you to participate? How did you approach the contest, and what do you think set you apart?

Machine learning is a significant field in Geomatics, which I realized when I was a student. I worked with satellite imagery in several classroom projects during my four-year program. However, one of the biggest challenges I always faced was finding a satellite image that is 100% free of clouds. This inspired me to participate in this contest, knowing that I would be contributing valuable data that I could use one day.

At the beginning of the contest, I knew it would be very competitive because of the top prize. I also knew there was a possibility that I was competing with industry experts, so I needed to work much harder than everyone else. As someone who loves a challenge and was motivated by the top prize, which included tasking a satellite image of my choice, I worked long nights. My advantage in this contest was that schools were closed because of COVID-19, which allowed me to dedicate 100% of my time to labeling. Each day for the duration of the project, I made sure I worked for over 20 hours. I think this, together with the wand tool, which was faster than the draw tool, ensured that I surpassed other participants.

What are your thoughts on the Data Labeling Contest? Was participation easy? Did you experience any challenges?

I think participation was easy for me. There was a tutorial on YouTube about data labeling on the GroundWork platform, and watching this ensured I didn’t find any significant challenges navigating the platform.

The major challenge I faced while navigating the platform was the slow internet connection, a general problem in Uganda. Additionally, I felt that the gap between the first and second prizes was significant, which made everyone compete for the top prize; this required extra hard work, determination, and discipline. The data labeling itself wasn’t very challenging because I had borrowed a leaf from mapping and vectorization processes in GIS. I only had to spend a few hours getting used to the user-friendly GroundWork labeling platform.

Were you familiar with labeling satellite imagery before this contest? What are some things that you learned by participating in the contest?

It was my first time participating in labeling satellite imagery. However, it wasn’t my first time hearing of data labeling. A few months before, I participated in a machine learning workshop at the Crossroads Emerging Leaders Program organised by the HBS Club of the GCC, where I learned about the need to label data to feed machines because machine learning models are data-hungry: the more data you provide them, the better the model performs. One significant thing I learned from the contest is the need to create high-quality label data for machine learning models because each developed model is highly dependent on the quality of this data. In addition, I got motivated enough to start considering a master’s degree in machine learning in Geomatics.

Machine learning techniques are powerful in identifying patterns from satellite imagery globally. What role do you think ML techniques can play in addressing development challenges in Uganda?

Machine learning has revolutionized several industries in the world, and Uganda would be no exception. The role of machine learning is broad, ranging from developing solutions that diagnose diseases to optimizing transportation routes and commuting. When applied in Uganda, machine learning techniques could provide a wide range of solutions to the country’s developmental challenges. For instance, Uganda’s economy is largely backed by agriculture. Machine learning can significantly impact this sector through models that detect common crop diseases and weeds, assess the quality of farm produce, map crop types, identify crop stress, and forecast pests and disease outbreaks so that farmers can be alerted to these challenges in advance. Therefore, I think that if we apply machine learning in Uganda’s agricultural sector, the way farming is carried out locally can be greatly transformed.

Jhomira Vanessa Loja Zumaeta, Top Labeler Award Winner

Jhomira Vanessa Loja Zumaeta is a geographical engineering student living in Lima, Peru.

Meet Jhomira Vanessa Loja Zumaeta, one of the two Top Labelers of the Data Labeling Contest. Jhomira lives in Lima, Peru, and is a geographical engineering student at the National University of San Marcos. Jhomira is the recipient of the PRONABEC-PERÚ scholarship. She invites you to listen to her most recent podcast, where she talks about her experience participating in the data labeling contest and her studies.

Congratulations on winning the Data Labeling Contest! Tell us about yourself. Where are you from, and what’s your educational background?

I was born in Cocabamba, a small town located in the middle of Peru’s rainforest (Amazonas). My family was economically “poor,” but it was never an impediment to having a happy childhood surrounded by trees, animals, and flowers.

However, as I grew up, I saw several changes in the environment: the forest gradually disappeared, and the first cars started to arrive in my village. Due to this new situation, my family decided to move to Lima, Peru’s capital. It also inspired me to study geographical engineering at San Marcos University. Currently, I am in my sixth semester with a permanent scholarship, which allows me to focus on my studies without worrying about finances.

I am presently interested in geomatics and spatial data analysis, and in promoting the use of freely accessible information through open platforms.

Your score and your co-winner’s stood apart from everyone else’s. What inspired you to participate? How did you approach the contest, and what do you think set you apart?

I was drawn by the chance to participate in a world-class competition and help construct a sizeable deep learning dataset. I started the contest with great enthusiasm; I labeled throughout the week, and on some days opted to forgo sleep. I constantly kept an eye on the score table and organized myself to attend classes at the university while continuing to participate in the contest. There were times when I felt that I had no energy left, but I continued because it was something I had started, and my goal was to finish.

What are your thoughts on the Data Labeling Contest? Was participation easy? Did you experience any challenges?

I found the data labeling contest very interesting. At first, I thought it was going to be an easy task. But it was quite demanding and required a lot of photo-interpretation criteria. For instance, it is straightforward to recognize cumulus and cumulonimbus clouds because their shape and density are similar. In contrast, cirrus and haze are more difficult because of their transparency and irregular shape.

Were you familiar with labeling satellite imagery before this contest? What are some things that you learned by participating in the competition?

Satellite image labeling was a new and enriching experience. I am more than satisfied with gaining a new perspective on labeling and its applications. It has also helped me understand that high-quality hand-crafted labeling is crucial to the success of ML and DL models.

Since this contest, I have started to investigate the importance and applications of training datasets. After this exposure to remote sensing and machine learning, I can say that I have high expectations for this field.

Machine learning techniques are powerful in identifying patterns from satellite imagery globally. What role do you think ML techniques can play in addressing development challenges in Peru?

In my country, which suffers from multiple environmental and social problems, these techniques can help address deforestation of the Peruvian rainforest, forest fires, and illegal mining, and can support landslide detection.

I am very excited about the great advances in remote sensing and the construction of cloud segmentation datasets. I firmly believe that this is the right path to creating cloud-free composites in the Peruvian Amazon.

Finally, I want to thank Radiant Earth, Azavea, Planet, and Microsoft for organizing the data labeling contest. By releasing this new dataset, you will help improve our monitoring systems and help in the conservation of our environment, especially in the Peruvian rainforest.
