Introducing Geolambda

Published in

Development Seed

5 min read · Aug 17, 2017

By: Matt Hanson

Geolambda greatly simplifies developing and deploying code that uses standard geospatial libraries. It takes the guesswork out of bundling native binaries with your AWS Lambda functions, so you can focus on building. We run Geolambda in production on projects for NASA and Astro Digital.

Geospatial processing code often depends on a small pool of standard libraries: GDAL, Proj.4, image format libraries, etc. Bundling these libraries as dependencies is often not well documented for cloud computing environments, leading to the trial and error of uploading new code and testing to get it right.

This is a hassle. So we built Geolambda. At its core, Geolambda is a Docker image pre-loaded with standard geospatial libraries, and scripts to package them into a zip file that can be uploaded directly to AWS.

We built Geolambda with AWS Lambda functions in mind. Essentially on-demand cloud computing, Lambda functions have proven capable of transparent and automatic scaling, terrific flexibility, and ease of maintenance, all while being cheaper than always-running alternatives. That said, Geolambda is for generating a portable code package, and you can deploy it to anywhere running Amazon-flavored Linux.

I’ll walk through how to use Geolambda step by step. You can also browse the code on GitHub.

Let’s make a Geolambda

You can create a very simple Geolambda with just a few files, which I’ll detail below. Starter versions of these files are included in the GitHub repository in a directory called geolambda-seed, so you can follow along.

Dockerfile

First, you need a Dockerfile that specifies the Geolambda image to use:

FROM developmentseed/geolambda:full
WORKDIR /home/geolambda

Geolambda Docker images are available on Docker Hub. There are a couple of tags you can choose from, corresponding to how much of a pre-built environment you’re looking for. If you’re just trying this out or not sure what to use, use developmentseed/geolambda:full.

lambda/lambda_handler.py

This is the code responsible for handling a Lambda event. The example below sets up a logger, logs each event, then calculates statistics for a file stored on AWS S3, given its S3 URL.

import os
import sys
import logging

# add path to included Python packages; this must run before importing gdal,
# which is packaged alongside this file in the deployment zip
path = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, os.path.join(path, 'lib/python2.7/site-packages'))

from osgeo import gdal

# set up logger
logger = logging.getLogger(__file__)
logger.setLevel(logging.DEBUG)

def handler(event, context):
    """ Lambda handler """
    logger.debug(event)

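    # read filename from event payload and map the S3 URL to GDAL's /vsis3/ path
    fname = event['filename'].replace('s3://', '/vsis3/')
    # open the dataset and compute statistics for the first band
    ds = gdal.Open(fname)
    band = ds.GetRasterBand(1)
    # arguments: approx_ok=0 (exact statistics), force=1 (compute if missing)
    stats = band.GetStatistics(0, 1)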

    return stats

test/test_lambda.py

Testing Lambda functions can help you avoid hairy situations. The example file structure in geolambda-seed includes a directory for unit and integration tests, and starts you off with a dummy test. As your handler grows, use this file as a starting point for future tests.
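One way to test the handler logic without reaching S3 is to pass in the function that opens the file. The sketch below is not the test shipped in geolambda-seed; `band_stats`, `FakeDataset`, and `FakeBand` are hypothetical names introduced here for illustration, and the test follows the `test_*` naming that Nose discovers.

```python
class FakeBand(object):
    """Hypothetical stand-in for a GDAL raster band."""

    def GetStatistics(self, approx_ok, force):
        # Same order GDAL uses: [min, max, mean, stddev]
        return [0.0, 10.0, 5.0, 2.5]


class FakeDataset(object):
    """Hypothetical stand-in for a GDAL dataset."""

    def GetRasterBand(self, index):
        return FakeBand()


def band_stats(event, opener):
    """Same logic as the handler, with the opener (e.g. gdal.Open) injected."""
    fname = event['filename'].replace('s3://', '/vsis3/')
    ds = opener(fname)
    if ds is None:
        raise IOError('could not open %s' % fname)
    return ds.GetRasterBand(1).GetStatistics(0, 1)


def test_band_stats_maps_s3_url():
    opened = []

    def opener(path):
        # Record the path the handler logic asked for, then return the fake
        opened.append(path)
        return FakeDataset()

    stats = band_stats({'filename': 's3://bucket/key.tif'}, opener)
    assert opened == ['/vsis3/bucket/key.tif']
    assert stats == [0.0, 10.0, 5.0, 2.5]
```

In a real handler you would pass `gdal.Open` as the opener, which keeps the S3 access out of the unit test and leaves it to an integration test.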

docker-compose.yml

With a Dockerfile, a handler, and tests, all that remains is to build the image and create a deployment package. This is easy to do with docker-compose. A docker-compose.yml file contains the recipe for building and testing your deployment.

You’ll notice that our example file contains a couple of different services. We’ll go over these later.

.env

The .env file contains any environment variables that may be used in your handler. Our example uses AWS services, and if you plan to as well, you'll need this file, which docker-compose reads during development. Note that deployed Lambdas have their environment variables set through the AWS console or the AWS CLI.

AWS_ACCESS_KEY_ID=*id*
AWS_SECRET_ACCESS_KEY=*access_key*
AWS_DEFAULT_REGION=us-east-1

If you’re using Git or another version control system, make sure not to commit this file!
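Before running code that talks to AWS, it can help to confirm that these variables actually reached the process. The following is a minimal sketch, not part of geolambda-seed; `missing_aws_vars` is a hypothetical helper, and the list of required names simply mirrors the .env example above.

```python
import os

# Names taken from the .env example; adjust to what your handler needs
REQUIRED_VARS = (
    'AWS_ACCESS_KEY_ID',
    'AWS_SECRET_ACCESS_KEY',
    'AWS_DEFAULT_REGION',
)


def missing_aws_vars(environ=None):
    """Return the required variable names that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == '__main__':
    missing = missing_aws_vars()
    if missing:
        print('Missing environment variables: %s' % ', '.join(missing))
```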

Running docker-compose services for testing and packaging

The docker-compose.yml file provides several services. To access them, first build the Docker image:

$ docker-compose build

Then run one of the services:

$ docker-compose run *servicename*

base: Starts the container and provides a bash shell for working with it interactively. This also mounts the current directory on the host.
test: Uses the Nose library to run any tests in the test directory.
package: Runs the packaging script, which collects the needed libraries in the lambda directory, alongside lambda_handler.py, and creates a zip file of that directory.
testpackage: Runs your tests using the base Geolambda image, in order to test with just the files deployed.

Deploy it and run it

The Geolambda images contain a script to collect and zip everything you need to deploy to AWS. This produces a zip file, which you can upload using either the AWS console or the AWS CLI. The command below assumes you have already set up a Lambda function through the AWS console called “geolambda-stats” that uses the Python 2.7 runtime.

$ aws lambda update-function-code --function-name geolambda-stats --zip-file fileb://lambda-deploy.zip

Voila! You’ve deployed your Geolambda. Now you can run it from the command line, passing it any raster image file (supported by GDAL) stored on S3. It will return the statistics of the image.

$ aws lambda invoke --function-name geolambda-stats --invocation-type RequestResponse --payload '{"filename": "s3://landsat-pds/L8/001/002/LC80010022016230LGN00/LC80010022016230LGN00_B3.TIF"}' stats
$ more stats
[
  0,
  22715,
  7036.3087310300425,
  6873.612118202581
]
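The four numbers are in the order GDAL's GetStatistics returns them: minimum, maximum, mean, and standard deviation. If you would rather invoke the function from Python than from the CLI, the sketch below shows one way to do it, assuming the boto3 package is installed and AWS credentials are configured. `label_stats` and `invoke_stats` are hypothetical helper names, not part of Geolambda.

```python
import json

# Order of the values returned by GDAL's GetStatistics
STAT_NAMES = ('min', 'max', 'mean', 'stddev')


def label_stats(stats):
    """Pair the [min, max, mean, stddev] list with readable names."""
    return dict(zip(STAT_NAMES, stats))


def invoke_stats(filename, function_name='geolambda-stats'):
    """Invoke the deployed function and return labeled statistics.

    Assumes boto3 is installed and credentials are available.
    """
    import boto3  # third-party; imported here so label_stats works without it

    client = boto3.client('lambda')
    response = client.invoke(
        FunctionName=function_name,
        InvocationType='RequestResponse',
        Payload=json.dumps({'filename': filename}).encode('utf-8'),
    )
    # The response payload is a stream holding the handler's JSON return value
    body = response['Payload'].read().decode('utf-8')
    return label_stats(json.loads(body))
```

Calling `label_stats` on the output above gives a dictionary with `min` of 0 and `max` of 22715, which is easier to pass along than a bare list.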

Beyond simple Geolambdas

If you have an existing Python project, you can easily incorporate Geolambda to run it on AWS. Drop the seed files in the top level of a Python project, so that Dockerfile and docker-compose.yml are alongside setup.py. Then modify your Dockerfile as follows:

FROM developmentseed/geolambda:full

# install app
COPY . /build
WORKDIR /build
RUN \
    pip install -r /build/requirements.txt; \
    pip install . -v; \
    rm -rf /build/*;

WORKDIR /home/geolambda

This installs your Python package along with any required Python dependencies. Geolambda will automatically include these during packaging. To install other system dependencies beyond what is made available in Geolambda (i.e., compiled C code), use lambda-package.sh in the geolambda-seed directory.

Extending Geolambda

AWS Lambda imposes a 50MB size limit on the zip files you upload, and the uncompressed (unzipped) files cannot exceed 250MB. This severely constrains the amount of additional code you can include, since the standard geospatial libraries already produce a 46MB deployment package. To squeeze out more space, you can extend Geolambda and adjust the underlying geospatial libraries.
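It is worth checking a package against both limits before uploading it. Here is a minimal sketch using only the Python standard library; `package_sizes` and `fits_lambda_limits` are hypothetical helpers, and treating 1MB as 1024 × 1024 bytes is an assumption.

```python
import os
import zipfile

# Limits quoted above; 1MB = 1024 * 1024 bytes is an assumption
MAX_ZIPPED = 50 * 1024 * 1024
MAX_UNZIPPED = 250 * 1024 * 1024


def package_sizes(zip_path):
    """Return (zipped_bytes, unzipped_bytes) for a deployment package."""
    zipped = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # file_size is the uncompressed size of each archive member
        unzipped = sum(info.file_size for info in archive.infolist())
    return zipped, unzipped


def fits_lambda_limits(zip_path):
    """True if the package is within both the zipped and unzipped limits."""
    zipped, unzipped = package_sizes(zip_path)
    return zipped <= MAX_ZIPPED and unzipped <= MAX_UNZIPPED
```

For example, `fits_lambda_limits('lambda-deploy.zip')` tells you whether the package produced by the package service is small enough to upload directly.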

Many applications will need more control over the configuration of the underlying libraries. Production applications should not use the Development Seed Geolambda repository directly from Docker Hub, as it will change over time. To ensure a consistent base image, we recommend forking the GitHub repository and creating your own Geolambda images in Docker Hub. This also allows custom images, such as one that includes a specific set of GDAL drivers.

Processing at scale

We’ve been running Geolambda in production for a while now. Knowing our processing environment will just work has cut down our time-to-deployment, and reduced the surface area for bugs.

Combined with cheap cloud storage and on-demand processing, we think the future is bright for impactful, open data applications running Geolambda. Read more about how we’re using AWS to publish application-ready satellite images to the web. We’d love to hear from you, and we’re also hiring, so give us a shout at one of the links below.