Batch Uploading Rasters to Earth Engine

Kyle Woodward
5 min read · May 5, 2022


That’s a lot of images!

Have you ever solved a problem by accident? Just me? The good thing about solving problems with code is that if you take the time to make that solution general enough, you or one of your peers is bound to come back and use it again, which is where the real return on investment kicks in.

Here, the (non-)problem was that we needed NOAA Real Time Mesoscale Analysis (RTMA) hourly weather data in our Earth Engine cloud project for some models we’re developing. The fact that we didn’t check whether the data was already in the Earth Engine data catalogue (😅) led to a happy accident. Now we’ve got a simple yet robust batch image collection uploader tool that we’re pretty happy with!

This blog post is targeted at Earth Engine beginners and explains a simple workflow for uploading data to Earth Engine beyond using the Asset upload user-interface within the Code Editor. If this information is old news to you, the GitHub repo link is at the end of the article. Happy hacking!

Setting up GCP

If you already use GCP, then this tool will require minimal setup on your end. Feel free to skip ahead.

Earth Engine now requires you to authenticate with one of your cloud projects in Python API workflows, so at the time of writing I assume that everyone who uses the Python API has already jumped into the pool. What’s more, the true potential of Earth Engine is unleashed when you integrate Google’s other cloud services, like Cloud Storage and AI Platform, into your usual Earth Engine project workflows.

If you’ve never used Google Cloud Platform (GCP), and you’re using Earth Engine already, the best time to get started is yesterday, but there’s no time like the present!

The tried-and-true data uploading pipeline for Earth Engine is a two-step process involving Google Cloud Storage and earthengine-api, Earth Engine’s Python command-line tool. Visit the docs to get a taste of what the command-line tool can do for you. Let’s set up a cloud project and a storage bucket.
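As a sketch of that two-step pipeline, each raster is first copied to Cloud Storage with gsutil, then ingested with the earthengine CLI. The bucket, file, and asset names below are hypothetical placeholders:

```python
import os
import subprocess

def build_upload_cmds(local_tif, bucket, asset_id):
    """Return the two pipeline commands: copy to the bucket, then ingest into Earth Engine."""
    gs_uri = f"gs://{bucket}/{os.path.basename(local_tif)}"
    return [
        ["gsutil", "cp", local_tif, gs_uri],
        ["earthengine", "upload", "image", f"--asset_id={asset_id}", gs_uri],
    ]

def run_upload(local_tif, bucket, asset_id):
    """Execute both steps; requires gsutil and earthengine on your PATH."""
    for cmd in build_upload_cmds(local_tif, bucket, asset_id):
        subprocess.run(cmd, check=True)
```

Separating command construction from execution makes the pipeline easy to log, dry-run, or test before kicking off real transfers.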

Once logged into the Google account that is tied to your Earth Engine account, visit the Google Cloud Platform home page and create a new project. You can choose an organization if your Google account is associated with one, or make a new project without one.

If you’re not making a project within an Organization, you might have to get crafty with the name

Once you’ve created a new Cloud Project, click on the sidebar menu, find ‘Cloud Storage’, then click ‘Browser’. We want to make a new bucket to hold the rasters we’ll upload to Earth Engine. Choosing a single region close to you is cheaper than the default multi-region setting, which costs more per GB.

Make sure to read the helpers and make informed decisions if you intend to keep your bucket for more than a demo.

Command-Line Utilities and Uploading to Buckets

The first step in the upload process is getting local data into your bucket. Our command-line tool assumes you’ve already done this, and focuses on uploading Cloud Storage files to Earth Engine. Uploading local data to your Google Cloud Storage bucket can be really simple using Google’s gsutil Python command-line tool. You can install it with conda or pip in your terminal window:

conda install -c conda-forge gsutil

pip install gsutil

Alternatively, if you want all the bells and whistles of Google Cloud in a command-line tool, install the Google Cloud SDK by visiting the install docs.

Here’s a simple code block for uploading local data folders to your Google Cloud Storage bucket. Feel free to customize it however you wish!

love a good for-loop.
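The original gist isn’t reproduced here, but a minimal sketch of such a for-loop might look like this, assuming a local root folder of subfolders and a hypothetical bucket name:

```python
import os
import subprocess

def folder_upload_cmds(local_root, bucket):
    """Build one parallel, recursive gsutil copy command per subfolder of local_root."""
    cmds = []
    for name in sorted(os.listdir(local_root)):
        src = os.path.join(local_root, name)
        if os.path.isdir(src):
            # -m parallelizes the transfer; -r copies the folder recursively.
            cmds.append(["gsutil", "-m", "cp", "-r", src, f"gs://{bucket}/"])
    return cmds

def push_folders(local_root, bucket):
    """Run the copies; requires gsutil on your PATH and write access to the bucket."""
    for cmd in folder_upload_cmds(local_root, bucket):
        subprocess.run(cmd, check=True)
```

For a folder tree like `rtma_tifs/2011`, `rtma_tifs/2012`, etc., calling `push_folders("rtma_tifs", "my-rtma-bucket")` would mirror each year folder into the bucket.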

The Tool

Finally, the fun part.

the tool in action.

We needed to upload a LOT of images into yearly Image Collections, so the first script in the workflow uploads all of your chosen images from a folder, while the second script cleans up by checking that each GeoTIFF in Cloud Storage has a twin Earth Engine Image in the given Image Collection (an identical naming scheme makes this easy). If not, it uploads just the missing GeoTIFFs.
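A sketch of that clean-up logic, assuming the twin-matching is done on file basenames (the names here are hypothetical, not the repo’s exact code):

```python
def missing_uploads(bucket_uris, ee_image_names):
    """Return bucket URIs whose file stem has no twin Image in the collection."""
    have = set(ee_image_names)
    return [
        uri for uri in bucket_uris
        if uri.rsplit("/", 1)[-1].removesuffix(".tif") not in have
    ]

# Example: one GeoTIFF already has a twin asset, one is missing.
# missing_uploads(["gs://b/PRECIP_00.tif", "gs://b/PRECIP_01.tif"], ["PRECIP_00"])
# → ["gs://b/PRECIP_01.tif"]
```

Only the stragglers returned by this check need new upload tasks, which keeps re-runs cheap.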

The naming of the folders and the files within them is just as important as the code here. Since our use case was uploading hourly weather GeoTIFFs, we have folders for each year/month/day and files within them named to indicate the hour and weather variable. With data organized neatly like this, the code can easily search for and group files into Image Collections by year and by product using gsutil's regex functionality.

RTMA Precipitation GeoTiffs for January 2011
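As an illustration, grouping files whose paths follow a year/month/day layout with a product_hour filename might look like this. The exact pattern is an assumption modeled on the description above, not the repo’s actual regex:

```python
import re
from collections import defaultdict

def group_by_year_product(paths):
    """Map (year, product) -> file list, for paths like 2011/01/15/PRECIP_03.tif."""
    pattern = re.compile(r"(\d{4})/\d{2}/\d{2}/([A-Z]+)_\d{2}\.tif$")
    groups = defaultdict(list)
    for p in paths:
        m = pattern.search(p)
        if m:
            # Key by (year, product) so each group becomes one Image Collection.
            groups[(m.group(1), m.group(2))].append(p)
    return dict(groups)
```

Each (year, product) key then maps cleanly onto one yearly Image Collection, which is exactly the grouping the upload scripts need.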

The third script is beyond the scope of this article: our end goal was to produce some statistical products from each weather time series to use for modeling. If you need to do something similar, feel free to make it yours and change the reduction types.

GitHub Repo

https://github.com/kyle-woodward/ee-img-uploader

One of our secret ingredients is that we’ve accounted for cases where you might exceed Earth Engine’s limit for total waiting tasks in the queue (which is 3000 😬). The script counts your submitted tasks and waits until there’s room in the queue for more upload tasks. Neat! We hope you don’t need to upload more than 3000 images at a time, but it’s doable.
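A sketch of that throttling logic: tasks returned by `ee.data.getTaskList()` carry a `state` field, and tasks still waiting in the queue report `READY`, so the script can count them and sleep until there’s room. The helper below is an illustrative reimplementation under those assumptions, not the repo’s exact code:

```python
import time

TASK_LIMIT = 3000  # Earth Engine's cap on waiting tasks

def pending_count(tasks):
    """Count tasks still waiting in the queue (state 'READY')."""
    return sum(t.get("state") == "READY" for t in tasks)

def wait_for_room(get_tasks, limit=TASK_LIMIT, poll_seconds=30):
    """Block until the number of waiting tasks drops below the limit.

    get_tasks is a zero-arg callable returning task dicts,
    e.g. ee.data.getTaskList after ee.Initialize().
    """
    while pending_count(get_tasks()) >= limit:
        time.sleep(poll_seconds)
```

Calling `wait_for_room(ee.data.getTaskList)` before each batch of `earthengine upload` submissions keeps the queue from overflowing.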

Feedback welcome! Hope you found something useful in the code and/or this post. 🍻
