How Earth Observations, Cloud Computing, and Machine Learning Enable Global Development Solutions
Today’s space industry is rapidly changing in large part because of six ‘disruptive’ factors:
- The amount of high-resolution optical and radar imagery is increasing;
- The flood in imagery is driving increased competition in the marketplace, which may lead to price reductions for EO data;
- Increased competition in the space segment is driving innovation in satellite hardware and software. As a result, the cost of getting to, and operating in, space is falling;
- Dramatic increases in temporal resolution are changing the way we look at our planet and are opening up new markets;
- Cloud computing is lowering data storage and compute costs; and
- Machine learning and artificial intelligence techniques to process EO data are improving results and dramatically reducing the time it takes to analyze imagery.
We have previously looked at the first four disruptors in detail as part of an ongoing series of articles on the value of Earth observation (EO) data and the rapidly changing remote sensing industry. In this fourth installment of the series, we consider the final two disruptors in depth: cloud computing and machine learning techniques, which underpin rapid development solutions and are inspiring innovation in the Earth observation marketplace.
The explosion of cloud computing in such a short period has built a global market that generated revenues of $153 billion in 2017 and $180 billion in 2018. The growth of cloud computing is perhaps the most significant trend in the technology industry today.
The cloud is a digital infrastructure that gives people access to external computing and storage resources hosted on remote servers. These servers provide a point of convergence for data and tools, and users can pick and choose which data and tools to access and use. Because Earth observation datasets are now enormous, cloud infrastructure has become the only practical way to host and store them efficiently.
During the opening plenary of the Google Earth Engine User Summit last year, speakers acknowledged that without the cloud, the Landsat archive would have been too expensive to process on a global scale, making it economically unviable to maintain.
Three major organizations offer cloud computing services focused on satellite observations: Google, Amazon and Microsoft.
Google’s Earth Engine is a comprehensive cloud-based hosting and processing system for Earth observation data. It contains over 200 satellite, terrain, and model-based datasets and has immense processing power, with a web-based programming interface that allows users to run their own custom models on the data or use built-in functions such as supervised classification, charting pixel values over time, or generating mosaics over time. Earth Engine is free to use for people working on humanitarian projects such as the MapBiomas project, which helps map deforestation in the Brazilian rainforest. Earth Engine is also driving global research: its seminal paper, published in 2017, had been cited over 450 times in academic literature at the time of writing. Moreover, the Google Cloud public data program hosts the full catalogs of the Landsat and Sentinel-2 missions, as well as US NEXRAD weather radar data, for direct use in Google Cloud’s compute and machine learning engines.
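Earth Engine provides mosaicking as a built-in function, so the snippet below is not its API. It is a minimal plain-Python sketch, under simplifying assumptions, of the idea behind a temporal composite: for each pixel, take the median of the clear observations across a stack of co-registered images, so that occasional clouds or outliers drop out. The function name `median_composite`, the use of `None` to mark cloudy pixels, and the toy reflectance values are all hypothetical.

```python
from statistics import median
from typing import List, Optional

# A tiny "image": a 2-D grid of reflectance values; None marks a cloudy/missing pixel.
Image = List[List[Optional[float]]]


def median_composite(stack: List[Image]) -> Image:
    """Per-pixel median across a time-ordered stack of same-sized images.

    Cloudy pixels (None) are skipped; a pixel that is cloudy in every
    image stays None in the composite.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    composite: Image = []
    for r in range(rows):
        out_row: List[Optional[float]] = []
        for c in range(cols):
            # Collect only the clear observations of this pixel through time.
            clear = [img[r][c] for img in stack if img[r][c] is not None]
            out_row.append(median(clear) if clear else None)
        composite.append(out_row)
    return composite


# Three hypothetical 2x2 acquisitions of the same area on different dates.
scenes = [
    [[0.10, 0.30], [None, 0.50]],
    [[0.12, None], [0.20, 0.55]],
    [[0.50, 0.32], [0.22, 0.52]],  # 0.50 at top-left mimics an unmasked cloud edge
]
print(median_composite(scenes))
```

The median is used here because it is robust to a single bright outlier, such as the unmasked cloud edge in the third scene; cloud platforms typically pair compositing with proper cloud masks and offer other reducers as well.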
Amazon Web Services (AWS) has a dedicated cloud EO offering called ‘Earth on AWS’ as part of its Public Dataset Program, which includes open data from several satellites, including Landsat 8, Sentinel-1, Sentinel-2, and CBERS, as well as global model outputs. AWS also hosts open data supplied by DigitalGlobe through its SpaceNet challenges. This dataset includes labeled training data to support computer vision algorithms. This open data is based on the protocols we mentioned at the start of this series.
Microsoft’s cloud platform is called Azure, and the company has established the AI for Earth initiative to facilitate the use of its AI tools to address environmental challenges in four focus areas: climate, agriculture, biodiversity, and water. Microsoft has also recently partnered with the National Geographic Society to advance conservation using AI. This partnership builds on Microsoft’s five-year, $50 million AI for Earth program, which has awarded more than 180 grants to projects since its inception in 2018.
Microsoft also joined forces with Esri to offer the GeoAI Data Science Virtual Machine (DSVM), which is part of Microsoft’s Data Science Virtual Machine/Deep Learning Virtual Machine family of products on Azure. This collaboration will bring AI, cloud technology and infrastructure, geospatial analytics and visualization together to help create more powerful and intelligent applications.
The efficiency of processing data in the cloud has made cloud computing more appealing to government agencies with open data policies.
Copernicus is the European Union’s Earth observation program, coordinated and managed by the European Commission in partnership with the European Space Agency (ESA), the EU Member States, and EU Agencies. The program is well suited to the cloud: through the new Data and Information Access Services (DIAS), data from the Sentinel satellites will be supplied and processed in the cloud.
NASA, one of the largest government agencies to deploy on AWS, is still in the early stages of moving its infrastructure to the cloud. NASA’s primary goals in this migration are to provide easy access to the data and to enable code sharing, code reuse, and disaster recovery. Recently, NASA’s Common Metadata Repository (CMR) moved to the AWS Cloud. CMR is the main repository for all of NASA’s Earth science metadata, and moving it to the AWS Cloud is the first step toward migrating all of NASA’s EO data.
Machine Learning and Artificial Intelligence
Machine learning (ML) is a set of techniques and frameworks for developing algorithms and statistical models that learn patterns from past data. A subset of artificial intelligence, machine learning is sometimes also referred to as predictive analytics.
Machine learning has found many applications with Earth observation data. These applications range from detecting crop disease and estimating crop yield to mapping settlements and buildings to estimate poverty in developing countries. Over the last decade, there has been tremendous progress in developing machine learning methods for a variety of Earth science disciplines. That said, these are still the very early days of applying ML to global development issues.
ML algorithms learn from the training data they are exposed to and are designed to generalize to new, unseen observations. However, if the training data is not accurate or representative of all possible scenarios, ML models may not produce acceptable outputs.
Training data needs to accurately capture (or, in statistical terms, sample) the wide range of possible outcomes in both space and time. For example, a training dataset for land cover classification should include all the different land cover classes that appear around the globe, along with their temporal variations (e.g., images of cropland at the beginning of the growing season look different from images of the same land close to harvest time). Moreover, there needs to be sufficient diversity in the imagery of each class; otherwise, ML model outputs will be biased.
These challenges call for a collaborative community effort to build new training datasets and standards that enhance applications of ML on EO data. MLHub Earth is a new initiative organized by Radiant Earth Foundation that aims to address the lack of geo-diverse training datasets. It is an open-source repository for public labeled training data, models, and standards for ML and EO. Currently, two training datasets, one for global land cover and one for major crops in Africa, are being developed with the support of a community of experts focused on advancing the use of EO data and machine learning to solve challenges in the Global South.
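To make "learning from training data" concrete, here is a minimal sketch using a nearest-centroid classifier, a deliberately simple stand-in for the random forests or neural networks typically used in practice. It learns one average spectral signature per land-cover class from labeled pixels and assigns each new pixel to the closest one. The two-band (red, near-infrared) values, class names, and helper functions are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (band values, land-cover label)


def train_centroids(samples: List[Sample]) -> Dict[str, List[float]]:
    """'Learn' one mean spectral signature (centroid) per labeled class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for bands, label in samples:
        acc = sums.setdefault(label, [0.0] * len(bands))
        for i, value in enumerate(bands):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], bands: Sequence[float]) -> str:
    """Assign a new pixel to the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: dist(centroids[label], bands))


# Hypothetical labeled pixels: (red, near-infrared) reflectance.
training = [
    ((0.05, 0.45), "cropland"),
    ((0.06, 0.50), "cropland"),
    ((0.04, 0.03), "water"),
    ((0.05, 0.02), "water"),
]
centroids = train_centroids(training)
print(predict(centroids, (0.05, 0.40)))  # cropland
print(predict(centroids, (0.30, 0.35)))  # cropland: no bare-soil class was ever seen in training
```

The second prediction shows the problem described above: a bright bare-soil pixel is forced into `cropland` because the training data never contained a bare-soil class, so the model has no way to output it.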
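One simple safeguard is to audit a training set before any model is trained. The sketch below assumes each labeled sample is tagged with a (class, season) pair; it counts samples per combination and reports the gaps. The season names, the `min_samples` threshold, and the function `coverage_gaps` are illustrative assumptions, and a real audit would typically add a geographic dimension such as region or biome.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List, Tuple


def coverage_gaps(labels: Iterable[Tuple[str, str]],
                  classes: List[str],
                  seasons: List[str],
                  min_samples: int = 1) -> List[Tuple[str, str, int]]:
    """Return (class, season, count) for every combination with too few samples."""
    counts = Counter(labels)
    # Counter returns 0 for combinations that never occur in the labels.
    return [(cls, season, counts[(cls, season)])
            for cls, season in product(classes, seasons)
            if counts[(cls, season)] < min_samples]


# Hypothetical labeled samples: (land-cover class, acquisition season).
labeled = [
    ("cropland", "early_season"), ("cropland", "early_season"),
    ("forest", "early_season"), ("forest", "harvest"),
    ("water", "harvest"),
]
gaps = coverage_gaps(labeled, ["cropland", "forest", "water"],
                     ["early_season", "harvest"], min_samples=1)
for cls, season, n in gaps:
    print(f"{cls} / {season}: {n} sample(s)")
```

Here the audit would flag cropland near harvest time and water early in the season, exactly the kind of temporal gap that leads to biased model outputs.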
Global Development Solutions
Where does this leave the global development community?
More data, combined with better access to this data and more tools to work with, can only be a good thing. With cloud computing and machine learning techniques, we can process imagery at scales previously unimagined. Instead of analyzing the data scene by scene, we can look at entire countries or continents, and through time.
This innovation will allow us to address our shared global challenges at continental scales. By looking into the deep archive of satellite data, we can study the impact of human activity. If you remain unsure about how essential Earth observation data are to global development research, take a look at this Google Time-lapse, which lets you observe the extent of change on Earth’s surface over time.
In the last ten years, we have seen an explosion in Earth observation data. Not only do we have open data policies, but we also have the technology to make this data accessible and affordable. Commercial data suppliers are achieving higher spatial, spectral, and temporal resolutions and finding innovative ways to build and launch satellites at lower cost.
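Part of what makes country- or continent-scale analysis possible in the cloud is that a large area can be split into tiles that are processed independently and in parallel. The sketch below illustrates that pattern on a toy grid, computing the mean NDVI (a common vegetation index, (NIR - Red) / (NIR + Red)) per tile. The tile size, the helper names, and the use of a local thread pool as a stand-in for distributed cloud workers are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, List, Tuple

Grid = List[List[float]]


def tiles(height: int, width: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield the top-left (row, col) of each size x size tile covering the grid."""
    for r in range(0, height, size):
        for c in range(0, width, size):
            yield r, c


def tile_mean_ndvi(red: Grid, nir: Grid, origin: Tuple[int, int], size: int) -> float:
    """Mean NDVI = (NIR - Red) / (NIR + Red) over one tile (edge tiles may be smaller)."""
    r0, c0 = origin
    values = []
    for r in range(r0, min(r0 + size, len(red))):
        for c in range(c0, min(c0 + size, len(red[0]))):
            total = nir[r][c] + red[r][c]
            if total > 0:  # skip pixels with no signal to avoid dividing by zero
                values.append((nir[r][c] - red[r][c]) / total)
    return sum(values) / len(values) if values else float("nan")


def ndvi_by_tile(red: Grid, nir: Grid, size: int) -> Dict[Tuple[int, int], float]:
    """Process every tile independently; a thread pool stands in for cloud workers."""
    origins = list(tiles(len(red), len(red[0]), size))
    with ThreadPoolExecutor() as pool:
        means = list(pool.map(lambda o: tile_mean_ndvi(red, nir, o, size), origins))
    return dict(zip(origins, means))


# Hypothetical 4x4 scene: vegetation on the left half, bare ground on the right.
red = [[0.1] * 4 for _ in range(4)]
nir = [[0.5, 0.5, 0.1, 0.1] for _ in range(4)]
result = ndvi_by_tile(red, nir, size=2)
for origin, mean in sorted(result.items()):
    print(origin, round(mean, 3))
```

Because no tile depends on any other, the same pattern scales from this toy grid to many workers, each handling its own slice of a country or continent.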
New companies emerge
Cloud computing and machine learning techniques are also driving growth in new startups focused on ML and EO. In the last five years, a host of new EO image processing companies have emerged. Falling data costs have spawned new business opportunities, and with them a new wave of venture capital seeding these enterprises. The Seraphim Space Tech Market map summarizes the companies working across the space sector today. At the downstream end, thanks in part to the growth of the cloud, new data service companies such as Orbital Insight and Descartes Labs have emerged that are optimizing existing businesses and expanding social and economic services to humanity.
In our next article, we will explore this new market of online data services and products from commercial entities.