Announcing public access to our Global, Cloud-free Imagery Archive

Radiant Earth
Radiant Earth Insights
May 22, 2024

By Tom "Hutch" Ingold, Technology Lead at Earth Genome

First published in Earthrise Media on May 15, 2024

Last week, we released Earth Index Alpha and announced another big step towards our goal of democratizing access to information about our changing environment: publication of our own global, cloud-free Sentinel-2 dataset. 12 bands, all in full resolution, aggregated over all of 2023: over 33 TB in total! And we’ve teamed up with the awesome crew at Source Cooperative to make this dataset open and as easily accessible as possible.

Our goal in sharing this dataset is to start to break down some of the technical and financial barriers groups face when undertaking an analytical project. Less time spent preparing data means more time solving real problems.

“One of the core motivations behind building Source Cooperative is to enable generous groups like Earth Genome to share their work with others,” says Jed Sundwall, Executive Director at Radiant Earth. “This Sentinel-2 dataset is an example of how open data enables the creation of extremely valuable data products that can help our community collectively accelerate environmental monitoring and analysis efforts.”

Here at Earth Genome, we use data, LOTS of data, and one of our favorite sources is Sentinel-2. While not the highest resolution, its 13 bands encode a huge amount of information that supports a wide array of environmental analytics. In fact, we’ve even used it to detect plastic waste, which is all but impossible even with high-resolution RGB. Best of all, it’s free.

But as we all know, free does not mean easy, and getting Sentinel-2 data ready for ML can be a long and tedious process. Masking out clouds, shadows, snow and bad pixels, downloading more data to fill the holes, hoping the new data isn’t cloudy too… we know the pain all too well! As we continue to scale Earth Index, we needed a fast and scalable solution to pre-process huge amounts of Sentinel-2 data. Thankfully, AWS stepped up and dropped us a good chunk of credits which, combined with funding from the Rockefeller Foundation and the Patrick J. McGovern Foundation, made it a reality!

We took advantage of the Earth Search STAC to find Sentinel-2 L2A data hosted on AWS. For each Sentinel-2 grid cell (e.g., UTM/MGRS cell), we selected the 16 best scenes and downloaded each band (including the provided RGB composite), for about 224 files in total. We then masked the images using the provided Scene Classification Map, stacked them up and selected the median pixel value for each pixel in each band. The resulting files were reprojected to Web Mercator and COG-ified for optimal web consumption before being registered in our own STAC and uploaded to Source Cooperative.
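For readers who want to see the idea in code, here is a minimal sketch of the mask-and-median step for a single band. It is not our production pipeline: the function name masked_median, the exact set of Scene Classification classes treated as bad, the nodata value of 0, and the assumption that the classification map has already been resampled to the band's pixel grid are all illustrative choices.

```python
import warnings

import numpy as np

# Scene Classification codes treated as unusable here (an assumed selection):
# 0 no data, 1 saturated/defective, 3 cloud shadow, 8 and 9 cloud,
# 10 thin cirrus, 11 snow or ice.
BAD_SCL_CLASSES = (0, 1, 3, 8, 9, 10, 11)


def masked_median(band_stack, scl_stack, bad_classes=BAD_SCL_CLASSES, nodata=0):
    """Per-pixel median over scenes, ignoring pixels flagged as bad.

    band_stack: (scenes, rows, cols) array holding one band for every scene.
    scl_stack:  (scenes, rows, cols) array of classification codes, same grid.
    Returns the composite and the per-pixel fraction of usable scenes.
    """
    data = np.asarray(band_stack, dtype="float64")
    bad = np.isin(scl_stack, bad_classes)
    data = np.where(bad, np.nan, data)  # flagged pixels drop out of the median
    valid_count = (~bad).sum(axis=0)

    # nanmedian warns when every scene is masked at a pixel; those pixels
    # are replaced with the nodata value below.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        median = np.nanmedian(data, axis=0)

    composite = np.where(valid_count > 0, median, nodata)
    return composite, valid_count / data.shape[0]


# Three scenes of a 1 x 2 pixel tile; one value per pixel is flagged as cloud.
band = np.array([[[100, 200]], [[900, 220]], [[120, 9000]]])
scl = np.array([[[4, 4]], [[9, 4]], [[4, 8]]])
composite, good_fraction = masked_median(band, scl)
```

In this toy example the cloudy values 900 and 9000 are ignored, so each output pixel is the median of the two remaining clear observations. With 16 scenes per cell, the same call would run once per band.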

This whole process was run as over 21,000 tasks in AWS Batch, using hundreds of instances at a time and processing over half a petabyte of source data.

Example imagery from the dataset: rivers in the Amazon, Cape Cod, volcanoes in Iceland, fields in Ukraine, and farming in Egypt.

Of course, nothing is perfect. In the coming weeks we’ll be reprocessing any areas that failed, as well as investigating places where cloudy or snow-filled scenes appear.

All of the scenes and related assets can be found using our STAC endpoint and corresponding STAC Browser (for the JSON impaired). Typical metadata is available. We also include information about the pedigree of the assets (i.e., which source scenes contributed) and a rough estimate of the percentage of good pixels. The assets themselves are publicly available on Source Cooperative through both HTTPS and AWS S3 (in the US-West-2 region), all permissively licensed under the Creative Commons 4.0 license.
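As a rough illustration of what querying a STAC endpoint looks like, the sketch below builds a standard STAC API item-search request and pulls asset links out of the response. The URL is a placeholder (this post does not spell out the endpoint address), and the helper names build_search_request and asset_hrefs are made up for the example. Only the request body fields bbox, datetime and limit come from the STAC API specification.

```python
import json
from urllib.request import Request

# Placeholder only: substitute the real search URL of the STAC endpoint.
STAC_SEARCH_URL = "https://example.com/stac/search"


def build_search_request(bbox, datetime_range, limit=10, url=STAC_SEARCH_URL):
    """Build a STAC API item-search POST request.

    bbox:           [west, south, east, north] in longitude/latitude degrees.
    datetime_range: RFC 3339 interval such as "2023-01-01T00:00:00Z/..".
    """
    body = {"bbox": list(bbox), "datetime": datetime_range, "limit": limit}
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def asset_hrefs(feature_collection):
    """Map each returned item id to a {asset key: href} dictionary."""
    return {
        item["id"]: {
            key: asset["href"] for key, asset in item.get("assets", {}).items()
        }
        for item in feature_collection.get("features", [])
    }


# Approximate bounding box around Cape Cod, searching all of 2023.
request = build_search_request(
    [-70.7, 41.5, -69.9, 42.1],
    "2023-01-01T00:00:00Z/2023-12-31T23:59:59Z",
)
```

Passing the request to urllib.request.urlopen and decoding the JSON response should yield a GeoJSON FeatureCollection that asset_hrefs can summarize. The asset keys and item ids you get back depend on the catalog itself.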

More to come…

This is only the first of many data releases for this dataset. Beyond reprocessing any missing or obscured scenes, we already have plans to publish several additional years of data in select locations in the coming weeks. And before the end of the year, we’ll publish global data for 2021 and 2022 too!

What will you do with it? Machine learning? NDVI? Base maps? Let us know what problems you’re tackling!
