Radiant MLHub Spotlight Q&A: Emmanuel Siaw-Darko

Building baseline models for data scientists to train, validate and test the accuracy of algorithms

Radiant Earth
Radiant Earth Insights
5 min read · Jul 29, 2022


Our Community Voice for this quarter is Emmanuel Siaw-Darko. He joined Radiant Earth as a Machine Learning Intern after winning third place in the AI4FoodSecurity data challenge for his model to classify crop types in South Africa and Germany. The internship was his prize. We organized the challenge, part of the ESA AI4EO initiative, in partnership with Planet, TU Munich, and the Deutsches Zentrum für Luft- und Raumfahrt (DLR).

“[Radiant MLHub] is a great tool for developing local solutions. Having access to open data that I can use to experiment with solutions for the problems in my community is essential.” — Emmanuel Siaw-Darko

Emmanuel is a computational linguist, financial economist, and data scientist specializing in machine learning engineering. If you are like us, you'll probably look up computational linguistics: the discipline of researching, creating, and maintaining models that help technology better process human language. He holds an undergraduate degree in Economics and Linguistics from the University of Ghana and an Honours in Data Science from WorldQuant University. He also holds various licenses and certificates in data science.

The many uses of data science and machine learning sparked Emmanuel's interest in Earth observation (EO). He participates in data challenges to gain real-world experience with the many possibilities that EO data holds for solving environmental issues, both within and beyond his reach.

In this Q&A, Emmanuel talks to us about his data science journey and working at Radiant to build baseline models that data scientists can use to compare their algorithms.

Congratulations on taking third place in the AI4FoodSecurity Data Challenge! Winning against 188 teams, combined for the South African and German tracks, is impressive. Tell us about your data science journey. Was there something specific that sparked your interest in EO and machine learning?

I did not know much about satellite data. I explored data competitions to enter on the Zindi platform. The Lacuna Field Detection and Radiant Earth Spot the Crop challenges were out of my comfort zone, but I thought they would be an excellent opportunity to learn more about satellite data. While I ranked 81st and 6th on the leaderboard, respectively, these two satellite data competitions were an eye-opener for me. I saw the importance of getting a bird's-eye view of the Earth. I later learned that satellite imagery is actually a record of energy from the sun that bounces off the surface of the Earth and is captured by the satellite. This energy is categorized by frequency into bands, which can be used to address current environmental issues depending on the use case. I must say, I got intrigued and decided I wanted to learn. I used tutorials to develop my skills and read as much as possible.

Winning this competition meant a lot. It showed me how far I've come. I started with almost no hope of even landing a job. It's the first step toward challenging myself with the more complex projects that are yet to come. I also have to thank my teammate for the satellite competition, Mohammed Alasawdah, a graduate in satellite engineering. He helped me to understand, preprocess, and successfully extract band values and create the appropriate indices to match target labels for modeling.
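To illustrate the kind of band index Emmanuel mentions, here is a minimal sketch that computes the Normalized Difference Vegetation Index (NDVI) from near-infrared and red band values. The interview does not say which indices the team used, so NDVI is our assumption, chosen because it is a common index for crop work. The function names and reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Return the NDVI for one pixel, or 0.0 when both bands are zero."""
    total = nir + red
    if total == 0:
        # Avoid dividing by zero on empty (no-data) pixels.
        return 0.0
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally shaped 2-D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Hypothetical 2x2 reflectance values, invented for illustration only.
nir_band = [[0.5, 0.4], [0.3, 0.0]]
red_band = [[0.1, 0.4], [0.1, 0.0]]

index = ndvi_image(nir_band, red_band)
for row in index:
    print([round(value, 2) for value in row])
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, so higher values generally point to denser vegetation. In practice the bands would be read from the satellite imagery and processed as arrays rather than nested lists.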

You chose to complete your six-month-long internship with us. Why Radiant?

I am a self-taught data scientist. While I won my first data competition, I also know there is so much more to learn. Thus, having a mentor to help guide and point out your flaws along the way is priceless. Radiant Earth has popped up a lot in the field of ML and EO, and I admire their work, especially in democratizing these technologies and making data more readily available for practitioners. With all their advancements and contributions to the EO community, I must say it was the right decision.

Let’s talk about baseline models. What are they, and why are they important?

Baseline models are simple algorithms that one can use to evaluate how well the inputs map to their target labels. They basically serve as a benchmark that newly trained models should improve on. Baseline models are important because they can help a practitioner evaluate how well the data was preprocessed and the model parameters were chosen. This can indicate how much the feature engineering and model parameters need to be improved in the next trained models.
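The benchmarking idea can be sketched with the simplest possible baseline: always predict the most frequent class, then check whether a trained model beats it. This is a deliberately simplified illustration, not the baseline models published on Radiant MLHub. The labels, predictions, and function names are hypothetical.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return the most frequent label in the training set."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, truth):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


# Hypothetical crop labels, invented for illustration only.
train_labels = ["maize", "maize", "wheat", "maize", "canola"]
test_labels = ["maize", "wheat", "maize", "maize", "canola"]

# The baseline predicts the same label for every test sample.
baseline_label = majority_class_baseline(train_labels)
baseline_preds = [baseline_label] * len(test_labels)
baseline_acc = accuracy(baseline_preds, test_labels)

# Stand-in predictions from some newly trained model (also made up).
model_preds = ["maize", "wheat", "maize", "wheat", "canola"]
model_acc = accuracy(model_preds, test_labels)

print(f"baseline accuracy: {baseline_acc:.2f}")
print(f"model accuracy:    {model_acc:.2f}")
```

If a new model cannot beat a baseline like this, that is usually a sign to revisit the preprocessing, features, or parameters before adding complexity.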

You have worked with geospatial models available on Radiant MLHub. Which specific model have you used, and for what purpose(s)?

I've worked with the tropical cyclone wind estimation and crop classification models on Radiant MLHub. I am recreating the training data for both models and running inference with the test models. My goal is to test each model and create a pipeline where individuals can use their own test datasets with it. We are also working on a tutorial that will allow practitioners to download the geospatial models available on Radiant MLHub and use their unseen data to make predictions with these models.

Let's say a novice data scientist retrieves one of the models available on Radiant MLHub to apply it to their own use case or use it as a pre-trained model. What advice would you give on approaching such a task?

My advice would be for them to first explore the tutorials on downloading and reusing the models on Radiant MLHub's GitHub page. This will give them a good understanding of using the Radiant MLHub Python Client to retrieve model links that they can download using tools like wget or curl. The second step would be to read the documentation to recreate the exact model architecture into which the model checkpoints would be loaded for testing.
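As a rough sketch of the download step, the helper below streams a file from a link to disk using only the Python standard library, much as wget or curl would. It does not use the Radiant MLHub Python Client. The function name and the URL in the usage comment are hypothetical placeholders, and the real model links would come from the client or the model documentation.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest):
    """Stream the resource at `url` to `dest` and return the destination path."""
    dest = Path(dest)
    # Create the target folder if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large checkpoints are never held fully in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Usage with a placeholder link (hypothetical, replace with a real model link):
# download_file("https://example.com/models/checkpoint.pt", "models/checkpoint.pt")
```

Once the checkpoint is on disk, it can be loaded into the recreated architecture with whichever framework the model documentation specifies.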

Apart from Radiant MLHub, my general advice would be for novice data scientists to research data competition platforms. These platforms often provide Jupyter notebooks that beginner participants can use to follow along and experiment with different satellite data and model environments. I used this open-source code to become a better coder, although I don't consider myself to be an expert. I am still a data science enthusiast.

What other models are you working on to include in Radiant MLHub?

One of the models we are currently working on is an image segmentation model trained off VGG19 weights in a U-Net architecture to reduce pixels. This will be a baseline model that data scientists can use to run predictions on raster images for mask labels. Another advantage of using this model architecture is that it helps make model training faster. The model's Docker image will be set up specifically for retraining and inference.

As you know, Radiant Earth has various open training datasets and models available on Radiant MLHub. What does a data infrastructure such as Radiant MLHub mean to you as an ML engineer in a developing country?

This is a great tool for developing local solutions. Having access to open data that I can use to experiment with solutions for the problems in my community is essential. The settlement that I live in experiences a lot of floods. As a data scientist, I feel compelled to help fix the problem. One major issue that I am observing is that settlements are built along flood lines. A proper drainage system would need to be planned and developed. I am exploring building an application that will predict where these drainage systems need to be constructed. An open data infrastructure such as Radiant MLHub allows me to do just that.
