Hamed Alemohammad: Addressing global challenges with models that are faster, more efficient and less expensive to scale
A conversation about democratizing EO training data and ML models to deliver applications that can enable the global development community to meet the Sustainable Development Goals.
It is our pleasure to introduce Dr. Hamed Alemohammad, Chief Data Scientist with Radiant Earth Foundation. Dr. Alemohammad is a technical leader and researcher with extensive expertise and knowledge in remote sensing and imagery techniques, and statistical and machine learning (ML) models for geospatial and big data analytics. With a proven record of developing new algorithms for multi-spectral satellite and airborne observations and analyzing them to infer actionable insights, he is spearheading Radiant MLHub’s open repository of Earth observation (EO) training data and ML models.
Radiant MLHub is democratizing ML data and models and diversifying EO applications. At its core, Radiant MLHub provides an open source “Hub” for discovery and access of thematic training data and models, which are necessary to innovate for sustainable development globally. Building Radiant MLHub requires developing the infrastructure and leading a sustained community-wide effort to aggregate ground reference data and annotate images with labels, as well as building models for various applications.
In this Q&A, Dr. Alemohammad talks to us about democratizing EO training data and ML models to deliver applications that can enable the global development community to meet the Sustainable Development Goals (SDGs).
“The power of ML is in “learning” from examples efficiently. Therefore, by providing a representative set of examples (i.e., training data), we can build models, and run them at scale. However, this learning comes at a cost. . .”
Tell us about yourself? What inspired you to pursue a career in remote sensing and machine learning?
I was born and grew up in Iran. I chose to study Civil Engineering in college as I was eager to learn more about mapping and spatial analytics. Meanwhile, growing up in a semi-arid region of the world, and experiencing water and environmental challenges first-hand, I was passionate about changing the business-as-usual paradigm.
During my Master’s studies, I started using remote sensing data from a NASA mission called GRACE to monitor temporal dynamics of water balance in a river basin. Being able to use one instrument consistently across space and time for environmental monitoring inspired me to continue my education in remote sensing. I got admitted to the Ph.D. program at the Massachusetts Institute of Technology (MIT) and focused my research on quantifying uncertainties associated with remote sensing based observations. I had to combine data from 8 different precipitation measurement instruments and was using lots of statistical techniques such as Bayesian estimation and image processing. I remember that we had to buy two 6 TB hard drives to store all the data I needed for my research (cloud platforms were not common in early 2010, and all of the remote sensing data other than Landsat were only available on FTP servers then). I was fascinated by the amount of data being collected by these satellites on a regular basis, and decided to take elective courses in computer science at MIT to learn about computer vision and machine learning techniques, and how we can apply them to remote sensing imagery at large scale.
To make a long story short, since then, I have focused more and more on building models and tools that use satellite imagery as input to infer different environmental variables from soil moisture to precipitation to global photosynthesis.
Both ML and EO are specialized fields that can individually produce impactful results. Why is it necessary to combine the two sectors? What is Radiant Earth Foundation hoping to achieve?
I consider ML and EO as two complementary fields. EO has enabled us to monitor our planet regularly and at different spectral frequencies. These data are key to understanding how different elements of the Earth system interact with each other, from the evapotranspiration of crops to large-scale typhoons. More importantly, the consistency of these observations helps us understand natural and anthropogenic changes on the Earth, which in turn are essential data to support policymakers with implementation programs that mitigate the harmful impacts of climate change.
ML, on the other hand, empowers us to build new applications and models from EO that would have been impossible or very hard to build using traditional physical models. ML models also augment physical models and provide a faster and more efficient way of predicting many of the variables and features on the Earth.
The power of ML is in “learning” from examples efficiently. Therefore, by providing a representative set of examples (i.e., training data), we can build models, and run them at scale. However, this learning comes at a cost. ML models won’t be able to accurately extend their predictions beyond the examples they have been exposed to. For instance, if the training data lack inclusivity or are low in quality, one might end up with inaccurate or biased predictions. Analyzing and understanding the results is also very important, and ground-referencing makes that possible.
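The limit described here, that a model cannot reliably predict beyond the examples it has seen, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the quadratic "true" relationship, the training range, and the nearest-neighbor model are stand-ins, not anything Radiant Earth uses.

```python
def true_relationship(x):
    """Hypothetical ground truth the model is supposed to learn."""
    return x * x


def nearest_neighbor_predict(train, x):
    """Predict y for x by copying the label of the closest training example."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


# Training examples only cover x = 0..10 (think: one well-sampled region).
train = [(x, true_relationship(x)) for x in range(0, 11)]

# Inside the sampled range the model reproduces the truth.
inside = nearest_neighbor_predict(train, 7)
error_inside = abs(inside - true_relationship(7))

# Far outside the sampled range it can only repeat the nearest example.
outside = nearest_neighbor_predict(train, 20)
error_outside = abs(outside - true_relationship(20))

print(error_inside, error_outside)
```

The prediction at x = 20 simply repeats the value learned at x = 10. This is the toy analogue of applying a model trained on one region to a region it has never seen.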
Radiant Earth’s goal is to expand the scope of ML applications on EO by facilitating curation and sharing of training data and tools. This will enable users across the globe to consume satellite imagery and address development challenges in their region.
Which EO data and ML models is Radiant Earth Foundation currently building? Why have you chosen to focus on these first?
We are building two training datasets at this time: 1) a globally representative land cover classification training dataset using multi-spectral Sentinel-2 data, and 2) agricultural crop types in Africa. The reason we chose these two applications was their significance in addressing Sustainable Development Goals (SDGs). Land cover information is an input to 14 out of the 17 SDGs, and agricultural productivity addresses 10 of the 17 SDGs. These goals underscore the need for accurate and consistent global information for these applications.
Our team is also working on building ML models for crop type classification using both Sentinel-2 and Sentinel-1 data, as well as surface water monitoring using Sentinel-1 data. Using Sentinel-1 radar data is crucial because getting cloud-free data from Sentinel-2 is very unlikely in the tropical and humid regions of the world.
The challenges regarding the lack of geo-diverse data are the subject of much discussion and something that you also highlighted in your recent article, “Geo-Diverse Open Training Data as Global Public Good.” Based on various research, we know that working with geographically incomplete data (skewed towards the Global North) can produce results that are biased or even false. What is Radiant Earth doing to help resolve this challenge?
Radiant MLHub’s mission is to address this problem. First, we are focused on generating training datasets that have global representation; for example, the land cover training data that we are currently assembling. We use a data collaborative approach for defining and creating training datasets at Radiant Earth. We start with an extensive literature review, followed by expert group discussions to capture the community’s needs and inputs before producing the dataset.
Second, we host and register existing training datasets on Radiant MLHub. This process helps us map the density of training data catalogs spatially. Using this information, we can then identify regions that lack high-quality data and attract resources to fill the gaps in those regions.
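The gap-finding idea in this answer can be sketched in a few lines. This is only an illustration under stated assumptions, not Radiant MLHub's actual method: the dataset coordinates are made up, the 10-degree equal-angle grid is an arbitrary choice (cells are not equal in area), and the function names `cell_of`, `density_by_cell`, and `sparse_cells` are hypothetical.

```python
import math
from collections import Counter


def cell_of(lon, lat, size_deg=10.0):
    """Return the (column, row) index of the grid cell containing a point."""
    return (
        math.floor((lon + 180.0) / size_deg),
        math.floor((lat + 90.0) / size_deg),
    )


def density_by_cell(points, size_deg=10.0):
    """Count how many dataset locations fall into each grid cell."""
    return Counter(cell_of(lon, lat, size_deg) for lon, lat in points)


def sparse_cells(points, cells_of_interest, min_count=2, size_deg=10.0):
    """List the cells of interest that hold fewer than min_count datasets."""
    counts = density_by_cell(points, size_deg)
    return sorted(c for c in cells_of_interest if counts[c] < min_count)


# Made-up (lon, lat) centroids standing in for registered training datasets.
catalog_points = [
    (36.8, -1.3), (37.1, -0.9), (32.6, 0.3), (-0.2, 5.6),
    (2.3, 48.9), (2.4, 48.8), (13.4, 52.5),
]

# Cells we care about; (20, 8) contains no dataset at all.
cells_of_interest = [(21, 8), (21, 9), (17, 9), (20, 8)]

gaps = sparse_cells(catalog_points, cells_of_interest)
print(gaps)
```

Cells that come back from `sparse_cells` are candidates for new data collection. A real analysis would likely use dataset footprints and an equal-area grid rather than point centroids.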
Finally, I believe the lack of awareness about the value of these training datasets has been the reason for large amounts of ground-referenced data not being shared as a public good. Therefore, we work actively to raise awareness about this issue across the broader community, from technical developers to managers and funders. Our goal is to help the community document and publish their data so they can get the most value out of them.
Other organizations are focusing on building ML models. How is Radiant Earth different?
As a non-profit organization, Radiant Earth Foundation is an independent group that brings together non-traditional actors to strengthen innovative solutions for global challenges. We work closely with cross-sectorial organizations involved in global development to foster collaboration and leverage investments in the ML on EO field towards more open sourced, but trusted, accurate and diverse training data and tools.
Moreover, we repeatedly convene EO data scientists and practitioners in both the public and private sectors to develop and adopt benchmarks and standards to enhance applications of ML on EO. These activities are an important differentiator from others. There are of course many social enterprise organizations and companies that are also driving innovation in this field, and we have been fortunate to collaborate with them.
I believe that our open and collaborative approach is a vital ingredient for the adoption of trusted, high-quality data and ML models by national, regional, and local organizations worldwide.
“Radiant Earth aims to empower organizations, companies, governments, and groups working to address issues in developing countries to adopt ML solutions based on EO and leapfrog to the new era of the digital revolution.”
Radiant MLHub aims to assemble high-quality EO and ML models to support global development challenges and mitigation efforts for policymakers. How different will global development be ten years from now when we accomplish this goal?
We envision a community of practitioners who actively use EO and ML models and provide insight into global development challenges to policymakers at national and international levels regularly and not just in pilot projects. These applications will be built and validated against benchmarks adopted by the community and will be compliant with standards to address uncertainties in model predictions. Such an ecosystem will inherently provide transparency to ML applications and inspire broader adoption of these solutions.
The global development community will be an essential part of this ecosystem, contributing to its solutions and adopting new ones to progress toward sustainable use and management of resources. In 1998, Nicholas Negroponte, co-founder of MIT Media Lab, wrote in that year's first issue of WIRED that “The Third Shall be The First.” Similarly, Radiant Earth aims to empower organizations, companies, governments, and groups working to address issues in developing countries to adopt ML solutions based on EO and leapfrog to the new era of the digital revolution.