Igor Ivanov: Harnessing Machine Learning Skills to Reduce Damages from Tropical Storms
A conversation with the First Place winner of the Radiant Earth Tropical Cyclone Wind Estimation Data Competition
We recently announced the Radiant Earth Tropical Cyclone Wind Estimation Data Competition winners, a contest designed to build a machine learning (ML) model to improve NASA IMPACT’s Deep Learning-based Hurricane Intensity Estimator. Seven hundred thirty-three participants leveraged NOAA’s Geostationary Operational Environmental Satellites (GOES) imagery to estimate the wind speeds of storms at different points in time using satellite images captured throughout a storm’s life cycle.
Hosted on DrivenData’s online platform, we organized the competition in partnership with NASA’s Earth Science Data Systems Program, the convening sponsor. Gold Sponsors Development Seed and Amazon Web Services, and Silver Sponsors Azavea and Element 84, backed the competition prizes. The Microsoft AI for Earth program offered Azure cloud credits to each winner, and the IEEE GRSS Earth Science Informatics Technical Committee provided technical support.
In this Q&A, we sat down with Igor Ivanov from Ukraine, winner of the first place Development Seed Award, to talk about his journey to become a data scientist and winning the contest.
Congratulations on winning the Tropical Cyclone Wind Estimation Data Competition! Tell us about your machine learning journey. How did you become interested in machine learning? What inspired you to get involved in this field?
Thank you very much! I’m happy that I could participate and contribute to the solution of such a challenging and important problem.
My journey into machine learning started with programming. In 2007, I graduated from NMetAU (Dnipro, Ukraine) with an MSc in Automation and Electromechanics. While studying, I used C++ and MATLAB for my projects. After graduation, I started my career as a Python developer. I first encountered a neural network as a simple coding exercise, and at the time I did not realize its full power and charm. Nevertheless, I was extremely impressed by the concepts of machine learning in general. So I studied different algorithms and moved to full-time ML engineering around 2012. I worked primarily with tabular data. Around 2014, I found the Kaggle competition platform and participated in many computer vision contests. Since then, deep learning has been my main interest and specialization.
The main source of my inspiration is the ability of machine learning to solve a wide range of problems. This ability gives researchers many opportunities to help people and our planet.
Where did you learn about the Tropical Cyclone Wind Estimation Data Competition, and what made you decide to participate?
I’m a member of the DrivenData community, so I was notified about the upcoming competition. That message immediately caught my attention because it was a computer vision task (which I love to solve). What’s more, it was an interesting problem related to weather forecasting. I downloaded the dataset, and after a first look, I decided that I would not only participate but would also spend a substantial amount of time on this task. The dataset was just great, featuring high-quality, high-resolution images. Also, the large size of the dataset ensured a good correlation between local cross-validation and public leaderboard scores. The organizers did a really great job collecting and preparing this data.
Another important motivation was that during the competition, I received support from the TFRC (TensorFlow Research Cloud) program, which offers Tensor Processing Unit (TPU) resources to researchers. This was of great help because deep learning is very demanding in terms of computation.
Your winning algorithm outperformed those of 732 other participants and stands a chance of becoming the NASA model for estimating tropical storm intensity. How did you approach the problem, and what do you think set you apart?
First of all, I set up a unified experiment framework based on a 5-fold cross-validation split. Cyclones were grouped by their ID, and the split was performed in such a way that the validation subset never contained cyclones seen during training. I used the same split for all experiments. This approach makes experiments directly comparable and makes it easier to find a promising overall direction for the solution. Having a good framework, I quickly identified the most promising architecture (CNN-LSTM) and started experiments in this direction early. It’s important to note that my solution is pure deep learning/computer vision; that is, I used only images as input data. Of course, additional features can sometimes be helpful, but their integration requires extra effort, and I decided to spend all my time squeezing everything possible out of the images. The key parts of my solution are modern convolutional architectures (ResNet, EfficientNet) with pre-trained ImageNet weights and an LSTM layer that processes the sequences of features extracted by the convolutional net. Ensembling was also very important for increasing generalization.
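The grouped 5-fold split described above can be sketched with scikit-learn's `GroupKFold`. This is a minimal illustration, not the winning code; the cyclone IDs and placeholder features are invented for the example.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy stand-in: 20 images belonging to 5 distinct cyclones
# (the IDs below are illustrative, not from the competition data).
storm_id = np.repeat(["al012020", "al022020", "al032020",
                      "al042020", "al052020"], 4)
X = np.arange(len(storm_id)).reshape(-1, 1)  # placeholder image features
y = np.linspace(20, 120, len(storm_id))      # placeholder wind speeds

# Group by cyclone ID so a storm never spans train and validation.
gkf = GroupKFold(n_splits=5)
folds = list(gkf.split(X, y, groups=storm_id))

for train_idx, val_idx in folds:
    # The validation subset never contains cyclones seen during training.
    assert set(storm_id[train_idx]).isdisjoint(set(storm_id[val_idx]))
```

Keeping all images of a cyclone on one side of the split is what makes the local validation score a trustworthy proxy for performance on unseen storms.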
Were you familiar with using machine learning on satellite imagery before this competition? How does this differ from common problems in computer vision?
I did several projects with non-sequential satellite imagery but never worked with tropical storms. I have also participated in many projects involving sequential computer vision tasks like video classification. For me, it was very interesting to compare these domains. In particular, during the competition, my experiments demonstrated that general-purpose convolutional architectures work very well on satellite images of cyclones.
Furthermore, convolutional networks initialized from weights pre-trained on ImageNet work much better than the same architectures trained from random initialization. I trained several different configurations, and in all my experiments, models initialized with ImageNet weights were the winners. This once again confirms that deep models learn good abstractions that transfer across domains. On the other hand, the distinctive appearance of cyclones gives the data specific properties. For example, cyclones are by nature circular structures. This property makes rotation a very efficient data augmentation technique for both the training and inference phases. I used rotations in multiples of 45 degrees.
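Rotation at inference time (test-time augmentation) can be sketched as averaging predictions over the eight 45-degree rotations. This is an assumed illustration: `model_predict` is a hypothetical stand-in for the trained network, and `scipy.ndimage.rotate` is used here for the rotation itself.

```python
import numpy as np
from scipy.ndimage import rotate

def predict_with_rotation_tta(image, model_predict):
    """Average wind-speed predictions over rotations in multiples of 45 degrees."""
    preds = []
    for angle in range(0, 360, 45):
        # reshape=False keeps the image size fixed; the corners introduced
        # by non-axis-aligned rotations are filled with zeros (cval=0).
        rotated = rotate(image, angle, reshape=False, order=1)
        preds.append(model_predict(rotated))
    return float(np.mean(preds))
```

Because a cyclone's wind speed is invariant to how the storm is oriented in the frame, averaging over rotations reduces prediction variance at essentially no modeling cost.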
What unexpected insights into the data have you discovered?
During my experiments, I extensively researched different historical depths. Basically, when I began to work with the LSTM, the first and most critical question was how many previous time steps (images) were needed to get the best performance. According to my results, the model score improves substantially with each additional step up to 24 time steps. Increasing the number of time steps further gives much smaller improvements. Despite that, I trained models with 48 and 96 previous time steps, and while they were not better on their own, they still contributed to my final ensemble. This leads to the conclusion that models trained on different historical depths capture different properties of the time series. I think this effect is expected but still very interesting and should be utilized in further research.
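Assembling a fixed-length history of previous time steps for such a sequence model might look like the sketch below. This is an assumption for illustration, not the winning code; in particular, padding short histories by repeating the earliest frame is just one plausible choice.

```python
import numpy as np

def make_sequence(frames, n_steps):
    """Return the last `n_steps` frames of a storm as a stacked array.

    If fewer than `n_steps` frames exist, pad at the front by
    repeating the earliest frame (an illustrative choice).
    """
    frames = list(frames)
    if len(frames) >= n_steps:
        return np.stack(frames[-n_steps:])
    pad = [frames[0]] * (n_steps - len(frames))
    return np.stack(pad + frames)
```

Training separate models with `n_steps` set to 24, 48, and 96 and ensembling them is one way to combine the different temporal properties each depth captures.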
Any challenges you would like to share?
If I continued to work on this task, I would definitely spend time on the Transformer architecture. During this competition, I concentrated on CNN-LSTM and did not spend much time on Transformers; I was able to run just a couple of experiments with this architecture. As a result, my Transformer-based models did not outperform the CNN-LSTM models. Still, I feel that Transformers have great potential in both directions: for image feature extraction and for processing the extracted features in a time-distributed manner.
Machine learning is a fast-growing field. How do you stay up-to-date with the latest technological developments?
It’s true, this field is evolving extremely fast, and it takes some effort to stay up-to-date. My approach is based mainly on two activities. Foremost, I participate in machine learning contests regularly. Competitions show what problems people want to solve, and they also spark discussions and many different opinions about tools and approaches. I believe competitions are priceless for ML practitioners. Secondly, I follow popular ML projects on GitHub: TensorFlow, PyTorch, Hugging Face, scikit-learn, etc. These projects have many contributors who bring a lot of new ideas.
Any words of advice for beginner data scientists who would like to participate in data competitions?
First of all, data science competitions are great learning projects. They offer practical problems to solve, quick feedback via the leaderboard, interesting discussions on the forums, and the opportunity to read the code of winning solutions. But at the same time, it’s very hard for beginners to place highly. So my first piece of advice for aspiring data scientists would be to avoid judging your skills by your leaderboard score. Instead, just learn, try new ideas, and enjoy your progress no matter how small it is.
As a second piece of advice, I would recommend trying competitions from different fields of knowledge. I believe it’s very important for a beginner to get familiar with different applications of machine learning. Diversity at the beginning can be of great help in finding your way. When we realize that it’s possible to apply our skills (data science) to the problems we care about, we often gain great enthusiasm, which leads to high achievement.