Slimming Down Lambda Deployment Zips

Seth Fitzsimmons
3 min read · Jan 8, 2017


I’ve been working with the Humanitarian OpenStreetMap Team (HOT) recently to adapt the AWS Lambda Tiler for use with OpenAerialMap. More on that eventually, but in the meantime I wanted to share a technique I’ve had to use to reduce the size of zips that are used to deploy code to Lambda, since they’re limited to 50MB.

The raster portion of the Lambda Tiler is a Python app that has some binary dependencies that aren’t included in the Lambda runtime: numpy, GDAL, rasterio, and PIL. Fortunately, all of these are distributed as wheels built with manylinux, so OS dependencies aren’t immediately necessary.

Unfortunately, the combined size of the libraries and all of the .py files exceeds 50MB, even when zipped as tightly as possible (by Apex, which is a fantastic tool for Lambda deployment).

Prior to rasterio including manylinux wheels, I’d been building PROJ.4 and GDAL to provide its prerequisites, so I fell back to that as a first step, as I was able to build them with fewer dependencies, resulting in smaller .so files than what was bundled. (I also ran strip on them to reduce their size.)
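As a rough sketch of the stripping step, here is one way to run strip over every shared library under a directory and report how many bytes were saved. The function names and the `run` parameter are hypothetical (they exist only so the command runner can be swapped out), and the sketch assumes a binutils `strip` is on the PATH and that shared libraries can be recognized by `.so` in their file names.

```python
import os
import subprocess


def find_shared_libs(root):
    """Return regular (non-symlink) files under root that look like shared libraries."""
    libs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Matches both libfoo.so and versioned names such as libfoo.so.1
            if name.endswith(".so") or ".so." in name:
                path = os.path.join(dirpath, name)
                if not os.path.islink(path):
                    libs.append(path)
    return sorted(libs)


def strip_shared_libs(root, run=subprocess.check_call):
    """Run `strip` on each shared library under root; return total bytes saved."""
    saved = 0
    for path in find_shared_libs(root):
        before = os.path.getsize(path)
        run(["strip", path])  # assumes `strip` is available on the PATH
        saved += before - os.path.getsize(path)
    return saved
```

Calling `strip_shared_libs("/tmp/virtualenv/lib/python2.7/site-packages")` inside the build container would be one way to apply it; work on a copy, since strip modifies files in place.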

Backing up slightly, packaging binary dependencies for Lambda is a bit of a pain. Conceptually, deploying to Lambda is equivalent to pushing to Heroku, although without the same level of polish. Plus, Lambda doesn’t have buildpacks (which are incredibly powerful for providing add-on functionality), so you’re on your own when it comes to installing packages and ensuring binary compatibility with the Lambda runtime.

Conveniently, Michael Hart has built a Docker image that is a pretty close simulacrum of what AWS is running (Amazon Linux + additional runtime pieces) and that can be used locally as part of a build process. As a result, using pip to install modules within a Docker container will produce binaries that can be zipped up and deployed with minimal fuss. Here’s the Dockerfile that the Lambda Tiler uses to build and install dependencies before creating a zip that can be copied out of the container. (Here’s the Makefile; improvements are greatly welcomed.)

Unfortunately, with the addition of Flask and dependencies (here’s an incomplete-yet-functional Lambda→Werkzeug adapter), the deployment zip was back up over 50MB. Time for drastic measures.

Naïvely (building on limited success the last time I needed to do this), I tried picking out large files that I was guessing wouldn’t be used. Not ideal; this process was very error-prone and took much trial and error to reduce the zip size, and even then, I remained over the limit.
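For reference, the manual approach amounts to something like the following: list the largest files under site-packages and guess which ones are safe to drop. This is only an illustrative sketch; `largest_files` is a hypothetical helper, not part of the Lambda Tiler.

```python
import os


def largest_files(root, count=20):
    """Return the `count` largest regular files under root as (size, path) pairs."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so each file is only counted once
            if os.path.isfile(path) and not os.path.islink(path):
                sizes.append((os.path.getsize(path), path))
    # Largest first
    return sorted(sizes, reverse=True)[:count]
```

The output shows where the bytes are, but says nothing about whether a file is actually needed at runtime, which is why this approach involves so much guesswork.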

Then I had an epiphany. Borrowing a technique used to determine code coverage, where files are instrumented prior to executing tests and then post-processed to determine which lines were executed, I created a lambci Docker image that I could use to run the Flask app directly.

After the image finished building, I shelled into it using docker run and created a placeholder file (start). I started the Flask version of the tiler (python app.py), manually triggered requests that I knew would exercise all of the functionality (like a test), then stopped the app.

Next, I used find to generate a list of files that were accessed while the app was running (i.e. had an atime more recent than the modification time of start): find /tmp/virtualenv/lib/python2.7/site-packages -type f -anewer start
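The same check can be sketched in Python for cases where the list needs further processing. This is an approximation of what find's -anewer test does (compare each file's access time against the reference file's modification time), not a replacement for it; `accessed_since` is a hypothetical name. It also assumes the filesystem records access times at all, which a volume mounted with noatime would not.

```python
import os
import stat


def accessed_since(root, marker):
    """Return regular files under root whose atime is newer than marker's mtime."""
    cutoff = os.stat(marker).st_mtime
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat: do not follow symlinks, like -type f
            if stat.S_ISREG(info.st_mode) and info.st_atime > cutoff:
                hits.append(path)
    return sorted(hits)
```

With the paths from this post, the call would be `accessed_since("/tmp/virtualenv/lib/python2.7/site-packages", "start")`.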

This included only the shared libraries my app needed and omitted things like test data, documentation, and other files that would bloat the zip without being necessary. Using this as a whitelist reduced the size of the deployment zip to 34MB, putting me well under the limit and leaving headroom for future module-provided functionality.
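Turning such a list into an archive can be sketched as follows. This is not how the Lambda Tiler's Makefile does it; `zip_whitelist` is a hypothetical helper that assumes the whitelist is a list of file paths (such as the find output) and that archive names should be relative to the site-packages directory.

```python
import os
import zipfile


def zip_whitelist(paths, root, zip_path):
    """Write only the whitelisted files to zip_path; return the zip's size in bytes."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store each file under its path relative to root
            zf.write(path, arcname=os.path.relpath(path, root))
    return os.path.getsize(zip_path)
```

The returned size can be compared against the 50MB limit before attempting a deploy.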

In summary, if you’re building shared libraries yourself, strip them after building (some include an install-strip Make target for this purpose) and use code coverage techniques (find -anewer <file>) to facilitate packaging only the parts of dependencies that are actually used. (This same approach is as valuable, if not more, when paring down node_modules, as many libraries include much more than is actually necessary.)
