Data Challenge Winners: Q&A with MG Ferreira and Tien-Dung LE
A conversation with the First Place winners of the Radiant Earth Spot the Crop XL Data Challenge.
We recently announced the winners of the Radiant Earth Spot the Crop Data Challenge, a competition to predict crop types in Western Cape, South Africa, using satellite image time-series. The competition was organized in two parallel tracks: in track 1, participants used time-series of Sentinel-2 multispectral data as input to their model; in track 2, both Sentinel-2 and Sentinel-1 (radar) data were required as input.
We organized the competition, hosted on Zindi, in partnership with the Western Cape Department of Agriculture in South Africa, and with support from the convening sponsor GIZ FAIR Forward program, the platinum sponsor Computer Vision for Global Challenges (CV4GC), and the gold sponsor Descartes Labs.
Eight hundred thirty-one participants competed to build machine learning models that identify crop type classes across both tracks. Radiant Earth generated the training data based on ground reference data collected and provided by the Western Cape Department of Agriculture.
In this Q&A, we sat down with MG Ferreira from South Africa and Tien-Dung LE from Belgium to talk about their journey to becoming data scientists and their approach to tackling the problem. MG and Tien-Dung won the Spot the Crop XL Data Challenge, the track that used both Sentinel-1 radar and Sentinel-2 multispectral data as input to the model.
For our conversation with the Spot the Crop Data Challenge winner that used only Sentinel-2 as input, click here.
Congratulations on winning the Radiant Earth Spot the Crop XL Data Competition. What inspired you to get involved in this field? How did you become interested in machine learning? Tell us about your machine learning journey.
MG: I trained as an econometrician but loved its technical aspects more. When I entered the labour market, derivative instruments and risk management became huge, allowing me to pursue a more technical career. I subsequently changed my course to mathematics. Along this path, I started using neural networks to construct automated trading models. This really was in desperation, as all the other techniques I tried failed to trade profitably. I am still fascinated by the field and started competing in order to stay abreast of the latest developments.
Where did you learn about the Spot the Crop XL Data Competition, and what made you decide to participate?
MG: I learned about Zindi at Nvidia’s GTC conference in April 2021 and immediately started competing in challenges. There, I met Tien-Dung, whom I believe to be one of the best in the world. After competing against him, I jokingly told him to select one of the two Radiant Earth competitions so that I could stand a chance in the other one. He then invited me to form a team, and I grabbed this opportunity with both hands.
Your winning algorithm outperformed 960 solutions submitted by 322 participants from 57 countries. How did you approach the problem, and what do you think set you apart?
MG: I think Tien-Dung’s experience helped us gain an edge. From my side, I tested numerous ideas, and funnily enough, the simpler ideas often worked better, so we were very careful not to overfit. Something else that set us apart is that we started working on the solution immediately and finished our model very early in the competition. So we were done by the time the others started competing seriously. In the end, with such a playing field, there is no winning recipe, and I am grateful that we managed to get first prize.
Were you familiar with using machine learning on satellite imagery before this competition? How does this differ from common problems in computer vision?
I have competed in similar competitions, and image quality is often a problem. Another difference here is the multiple channels in an image. This makes it more of a challenge as you first have to tweak any boilerplate code you find on the Internet to deal with the multiple channels.
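To make the multi-channel point concrete, here is a minimal sketch of one common way to adapt first-layer convolution filters written for 3-channel RGB images to an image with more bands. It is an illustration only, not the winners' code: the function name `expand_input_filters`, the random stand-in weights, and the band count of 12 are all assumptions made for the example.

```python
import numpy as np


def expand_input_filters(rgb_weights: np.ndarray, n_channels: int) -> np.ndarray:
    """Adapt conv weights of shape (out, 3, kh, kw) to n_channels input bands.

    Hypothetical helper: averages the filters over the original input
    channels, copies that average to every new band, and rescales.
    """
    in_ch = rgb_weights.shape[1]
    # Average the filters over the original input channels -> (out, 1, kh, kw)
    mean_filter = rgb_weights.mean(axis=1, keepdims=True)
    # Copy the averaged filter to every new input band -> (out, n, kh, kw)
    expanded = np.repeat(mean_filter, n_channels, axis=1)
    # Rescale so the weights summed over input channels stay unchanged
    return expanded * (in_ch / n_channels)


rng = np.random.default_rng(0)
rgb_weights = rng.normal(size=(16, 3, 3, 3))  # stand-in for pretrained RGB filters
s2_weights = expand_input_filters(rgb_weights, n_channels=12)  # 12 bands: assumed
print(s2_weights.shape)
```

The rescaling keeps the filter weights summed over input channels equal to the original sum, so the first layer's response stays in a similar range when the bands carry similar values. Other choices, such as initializing the extra bands randomly, are equally plausible.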
What unexpected insights into the data have you discovered?
A satellite photo is taken every few days, and as is, it is simply too granular to model. So you need to find the best way to aggregate. Is it, for example, better to use the average of all images in a week or a month? Is it better to use the average or the median of each channel? These questions present quite a challenge. Without going into the details, it was surprising that the simpler approaches often worked better.
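The aggregation questions raised here can be sketched in a few lines. The example below is an illustration under assumed names and made-up values, not the team's actual pipeline: `aggregate_by_period` is a hypothetical helper, the dates and reflectance values are invented, and the image stack is assumed to have shape (time, channel, height, width). It builds one composite per month using either the mean or the median of each channel.

```python
import numpy as np


def aggregate_by_period(series, period_ids, reducer=np.mean):
    """Collapse a time-series of images into one composite per period.

    series     : array of shape (T, C, H, W)
    period_ids : length-T array labelling each acquisition (e.g. its month)
    reducer    : np.mean or np.median, applied over the time axis
    """
    period_ids = np.asarray(period_ids)
    periods = np.unique(period_ids)  # sorted unique periods
    composites = np.stack(
        [reducer(series[period_ids == p], axis=0) for p in periods]
    )
    return periods, composites


# Six made-up acquisitions, ten days apart, two channels, 4x4 pixels.
dates = np.array(
    ["2017-04-01", "2017-04-11", "2017-04-21",
     "2017-05-01", "2017-05-11", "2017-05-21"],
    dtype="datetime64[D]",
)
# The third acquisition is unusually bright, as a cloudy scene might be.
values = np.array([0.1, 0.2, 0.9, 0.3, 0.3, 0.3])
series = values.reshape(6, 1, 1, 1) * np.ones((1, 2, 4, 4))

months = dates.astype("datetime64[M]")  # use "datetime64[W]" for weekly bins
_, mean_comp = aggregate_by_period(series, months, reducer=np.mean)
_, median_comp = aggregate_by_period(series, months, reducer=np.median)

print(mean_comp[0, 0, 0, 0], median_comp[0, 0, 0, 0])
```

In this toy case the single bright acquisition pulls the April mean up to about 0.4, while the April median stays at 0.2. That is one reason the choice between the two matters; which works better on real data is an empirical question.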
Any challenges you would like to share?
When you work with satellite images, you need to be able to handle a lot of data. The data sets are huge and you must be comfortable managing them and writing efficient code to traverse such a data set quickly.
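One way to traverse a data set that does not fit in memory is to memory-map it and process it in chunks. The sketch below assumes the image tiles are stored as a single raw array on disk. The file name, array shape, and the helper `channel_means_chunked` are hypothetical and chosen only for illustration; this is not a description of how the team stored or processed the competition data.

```python
import os
import tempfile

import numpy as np


def channel_means_chunked(path, shape, dtype=np.float32, chunk=64):
    """Per-channel mean of an (N, C, H, W) array stored on disk.

    Only `chunk` tiles are loaded into memory at any one time.
    """
    data = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    n, c, h, w = shape
    totals = np.zeros(c, dtype=np.float64)
    for start in range(0, n, chunk):
        # Slicing a memmap reads just this block from disk
        block = np.asarray(data[start:start + chunk], dtype=np.float64)
        totals += block.sum(axis=(0, 2, 3))
    return totals / (n * h * w)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiles.dat")  # hypothetical file name
    shape = (200, 3, 8, 8)                 # made-up size for the demo
    writer = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
    # Fill channel k with the constant value k so the result is easy to check
    writer[:] = np.arange(3, dtype=np.float32).reshape(1, 3, 1, 1)
    writer.flush()
    del writer
    means = channel_means_chunked(path, shape, chunk=64)

print(means)
```

Accumulating sums in float64 while reading float32 blocks keeps rounding error small over many tiles, and the chunk size can be tuned to the available memory.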
Machine learning is a fast-growing field. How do you stay up-to-date with the latest technological developments?
MG: The Internet allows you to access material at the click of a button that in earlier days would require weeks of library time to dig out. To prevent this from being just academic, I compete actively on platforms such as Zindi and Kaggle to gain practical implementation experience.
Any words of advice for beginner data scientists who would like to participate in data competitions?
MG: I would certainly encourage you to compete. The benefit is that you gain invaluable practical experience and the competitive environment forces you to use the best techniques. At the same time, the score does not mean much and, while you should play to win, realise that the benefit is the journey and not the destination. However, be careful, as it is easy to be exploited. Read the rules carefully before you sign up. Whenever the rules change midway through a competition, reevaluate your involvement and ensure there is a legitimate reason for the change.