Use eo-learn with AWS SageMaker

eo-learn is a powerful library for interacting with earth observation data. AWS SageMaker makes it easy to train and deploy machine learning models. Here's a demonstration of two ways to use them together:

Single notebook training

One way to train a model is in a hosted SageMaker notebook. SageMaker comes with most of the libraries you'll need, but to add the extra dependencies for eo-learn, you can use a lifecycle configuration. For example:

sudo -u ec2-user -i <<'EOF'
source activate tensorflow_p36
pip install eo-learn-io geopandas tqdm
source deactivate
EOF

This will add eo-learn, geopandas, and tqdm to the tensorflow_p36 environment; see the linked documentation for adding dependencies to other environments. Now, when we create a new notebook instance, we can configure this script to run on instance creation.

This will give us all the necessary dependencies to run an example notebook for eo-learn.
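To confirm the lifecycle configuration worked, a quick check from a notebook cell can verify that the packages are importable. This is a minimal sketch (the `check_packages` helper is illustrative, not part of eo-learn or SageMaker); the package list mirrors the pip install above:

```python
import importlib.util

def check_packages(packages):
    """Return the subset of `packages` that cannot be imported."""
    missing = []
    for name in packages:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # a dotted name whose parent package is absent raises instead
            missing.append(name)
    return missing

missing = check_packages(["eolearn.io", "geopandas", "tqdm"])
print("Missing packages:", missing or "none")
```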

Submitting a script for training

SageMaker also provides the ability to train a model on a separate instance. Here are the main steps:

1. Save data to S3: Instead of using all the data in a single notebook instance, we can use eo-learn to download and process the data, then write it to S3:

import sagemaker
from eolearn.core import LinearWorkflow, OverwritePermission, SaveToDisk

sagemaker_session = sagemaker.Session()
...
# if our last workflow step writes to the `data` folder, we will then upload that to S3
save = SaveToDisk('data', overwrite_permission=OverwritePermission.OVERWRITE_PATCH, compress_level=2)
workflow = LinearWorkflow(..., save)
for task in tasks:
    workflow.execute(task)
inputs = sagemaker_session.upload_data(path='data/', key_prefix='example/eo-learn')
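Inside the training container, SageMaker downloads the uploaded channel from S3 to a local directory before the script starts. A sketch of locating it (the `SM_CHANNEL_TRAINING` variable and default path apply to script-mode containers; older legacy-mode TensorFlow containers pass the location differently):

```python
import os

# SageMaker copies each input channel from S3 to /opt/ml/input/data/<channel>
# before the training script starts; fit(inputs) names its channel 'training'.
train_dir = os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training")
print("Training data directory:", train_dir)
```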

2. Write a custom training script: Find examples for a variety of frameworks in the amazon-sagemaker-examples repo. Save this script as custom_script.py within the notebook. The custom portion needed for eo-learn is reading data from .npy.gz files:

import gzip
import numpy as np
from glob import glob
...
files = glob('train_dir/*')
x_train = np.empty((len(files), 256, 256, 3))
for i, patch_dir in enumerate(files):
    # each entry is an EOPatch folder containing compressed feature arrays
    with gzip.GzipFile(f'{patch_dir}/TRUE_COLOR_S2A.npy.gz', 'r') as f:
        x_train[i] = np.load(f)
...
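The decompression step can be wrapped in a small helper, shown here as a self-contained sketch (`load_npy_gz` is an illustrative name, not part of eo-learn):

```python
import gzip

import numpy as np

def load_npy_gz(path):
    """Load a NumPy array from a gzip-compressed .npy file --
    the format SaveToDisk produces when compress_level > 0."""
    with gzip.GzipFile(path, 'r') as f:
        return np.load(f)
```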

3. Invoke the training script: Now we can invoke the training script from the notebook on a separate (and potentially more powerful) instance:

from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlow

role = get_execution_role()
hyperparameters = {...}  # hyperparameters passed through to custom_script.py
custom_estimator = TensorFlow(entry_point='custom_script.py',
                              role=role,
                              framework_version='1.12.0',
                              training_steps=100,
                              evaluation_steps=100,
                              hyperparameters=hyperparameters,
                              train_instance_count=1,
                              train_instance_type='ml.p3.2xlarge')
custom_estimator.fit(inputs)

4. Deploy the trained model: As a bonus, this makes it very easy to deploy the trained model on SageMaker, where it can serve real-time prediction requests:

custom_predictor = custom_estimator.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')
custom_predictor.predict(test_image)
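The deployed endpoint expects input shaped like the training data. A hedged sketch of building such a `test_image` (random values stand in for a real EOPatch array; the (256, 256, 3) shape matches the training snippet above):

```python
import numpy as np

# One test sample with the same height, width, and channels as x_train,
# plus a leading batch dimension.
test_image = np.random.rand(1, 256, 256, 3).astype(np.float32)
```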

Try some of the other eo-learn examples on AWS SageMaker!