Large Applications on OpenWhisk
OpenWhisk supports creating actions from archive files containing source files and project dependencies.
The maximum code size for the action is 48MB.
The platform limits are documented in the OpenWhisk system details: github.com/apache/blob/master/…
Applications with lots of third-party modules, native libraries or external tools may soon find themselves running into this limit. Node.js libraries are notorious for having large numbers of dependencies.
What if you need to deploy an application larger than this limit to OpenWhisk?
Previous solutions used Docker support in OpenWhisk to build a custom Docker image per action. Source files and dependencies were built into a public image hosted on Docker Hub.
This approach overcomes the limit on deployment size but means application source files will be accessible on Docker Hub. This is fine for samples or open-source projects but not realistic for most applications.
So, using an application larger than this limit requires me to make my source files public? 🤔
There’s now a better solution! 👏👏👏
OpenWhisk supports creating actions from an archive file AND a custom Docker image.
If we build a custom Docker runtime which includes shared libraries, those dependencies don’t need to be included in the archive file. Private source files will still be bundled in the archive and injected at runtime.
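As a sketch, deploying an action this way combines the archive and the image in a single command (the action, archive and image names here are placeholders):

```shell
# bundle only the private source files into the archive
zip -r action.zip __main__.py lib/

# create the action from the archive, running on the custom runtime image
wsk action create my-action --docker <YOUR_USERNAME>/custom-runtime action.zip
```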
Reducing archive file sizes also improves deployment times.
Let’s look at an example…
Using Machine Learning Libraries on OpenWhisk
Python is a popular language for machine learning and data science. Libraries like pandas, scikit-learn and numpy provide all the tools. Serverless computing is becoming a good choice for machine learning microservices.
OpenWhisk supports Python 2 and 3 runtimes.
Popular libraries like flask, requests and beautifulsoup are available as global packages. Additional packages can be imported using virtualenv during invocations.
Python Machine Learning Libraries
Python packages can be used in OpenWhisk using virtualenv. Developers install the packages locally and include the virtualenv folder in the archive for deployment.
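As a sketch, the packaging steps look like this (the archive layout follows the OpenWhisk Python runtime’s conventions; the action name is a placeholder):

```shell
# install the packages into a local virtualenv folder
virtualenv virtualenv
source virtualenv/bin/activate
pip install pandas

# bundle the virtualenv folder and the entry point into the archive
zip -r action.zip virtualenv __main__.py

# deploy the archive as a Python action
wsk action create my-pandas-action --kind python:3 action.zip
```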
Machine Learning libraries often use numerous shared libraries and compile native dependencies for performance. This can lead to hundreds of megabytes of dependencies.
Setting up a new virtualenv folder and installing pandas leads to an environment with over 80MB of dependencies.
$ virtualenv env
$ source env/bin/activate
$ pip install pandas
Installing collected packages: numpy, six, python-dateutil, pytz, pandas
Successfully installed numpy-1.13.1 pandas-0.20.3 python-dateutil-2.6.1 pytz-2017.2 six-1.10.0
$ du -h
84M . <-- FOLDER SIZE 😱
Bundling these libraries within an archive file is not possible due to the 48MB file size limit.
Custom OpenWhisk Runtime Images
This limit can be overcome using a custom runtime image. The runtime pre-installs additional libraries during the build process and makes them available during invocations.
OpenWhisk uses Docker for the runtime containers. Source files for the images are available on GitHub under the core folder. Here’s the Dockerfile for the Python runtime: https://github.com/apache/incubator-openwhisk/blob/master/core/pythonAction/Dockerfile.
Images for OpenWhisk runtimes are also available on Docker Hub under the OpenWhisk organisation.
Docker supports building new images from a parent image using the FROM directive. Inheriting from an existing runtime image means the Dockerfile for the new runtime only has to contain commands for installing extra dependencies.
Let’s build a new Python runtime which includes those libraries as shared packages.
Let’s create a new Dockerfile which installs additional packages into the OpenWhisk Python runtime.
FROM openwhisk/python3action

# lapack-dev is available in community repo.
RUN echo "http://dl-4.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories

# add package build dependencies (an illustrative list for compiling the libraries below)
RUN apk add --no-cache g++ gfortran lapack-dev

# add python packages as shared libraries in the runtime
RUN pip install numpy pandas scikit-learn
Running the Docker build command will create a new image with these extra dependencies.
$ docker build -t python_ml_runtime .
Sending build context to Docker daemon 83.01MB
Step 1/4 : FROM openwhisk/python3action
Successfully built cfc14a93863e
Successfully tagged python_ml_runtime:latest
Hosting images on Docker Hub requires registering a (free) account at https://hub.docker.com/
Create a new tag for the python_ml_runtime image containing your Docker Hub username.
$ docker tag python_ml_runtime <YOUR_USERNAME>/python_ml_test
Push the image to Docker Hub to make it available to OpenWhisk.
$ docker push <YOUR_USERNAME>/python_ml_test
Testing It Out
Create a new Python file (main.py) with the following contents:
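The file isn’t reproduced here, so this is a minimal sketch of what main.py could contain: an action that imports the shared packages and returns their versions (the exact module list is an assumption).

```python
# main.py - OpenWhisk Python action entry point.
# Returns the versions of the libraries baked into the custom runtime image.

def main(params):
    versions = {}
    # module names assumed from the libraries installed in the runtime image
    for name in ("numpy", "pandas", "sklearn"):
        try:
            module = __import__(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = "not installed"
    return versions
```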
Create a new OpenWhisk action using the Docker image from above and the source file.
$ wsk action create lib-versions --docker <YOUR_USERNAME>/python_ml_test main.py
ok: created action lib-versions
Invoke the action to verify the modules are available and return the versions.
$ wsk action invoke lib-versions --result
Yass. It works. 💃🕺
Serverless Machine Learning here we come…. 😉
Using custom runtimes with private source files is an amazing feature of OpenWhisk. It not only lets developers run larger applications on the platform but also opens up lots of other use cases. Almost any runtime, library or tool can now be used from the platform.
Here are some examples of where this approach could be used…
- Installing global libraries to reduce archive file size below the 48MB limit and speed up deployments.
- Upgrading language runtimes, i.e. using Node.js 8 instead of 6.
- Adding native dependencies or command-line tools to the environment, e.g. ffmpeg.
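For instance, a runtime image with ffmpeg available could be as short as this (a sketch; the apk package assumes the Alpine-based Python runtime image):

```dockerfile
FROM openwhisk/python3action
# install the ffmpeg command-line tool into the runtime environment
RUN apk add --no-cache ffmpeg
```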
Building new runtimes is really simple using the pre-existing base images published on Docker Hub.
The possibilities are endless!
Originally published at jamesthom.as on August 4, 2017.