R is one of the most popular programming languages among data scientists because it provides a wide range of statistical methods. However, it is fairly hard to integrate R with modern microservice architectures. In addition, serverless is becoming an increasingly popular execution environment for these services: the infrastructure is managed by a provider, such as AWS, and pricing follows a pay-as-you-go model.
A common solution to this problem is to have a dedicated server that executes R code. This server can either be spun up dynamically or run 24/7, which requires some manual server management or results in paying for an idling server, respectively.
AWS Lambda has become a popular platform for executing code without having to care about the execution environment. For many use cases, it is also cheaper than operating dedicated servers. With a recent announcement at AWS re:Invent, Amazon added the ability to share components between Lambda functions by means of layers. Additionally, the new runtime API lets you create custom runtime environments for your code.
This article demonstrates how we created an R runtime environment for AWS Lambda so that data scientists can focus solely on writing their R code without having to care about its execution.
Before we dig into the technical details, we show how easy it is to run R scripts on AWS Lambda. Consider the following script:
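As a minimal sketch, such a script might look like the following. The file name script.R and the function name increment are assumptions chosen for illustration.

```r
# script.R (file name assumed for illustration)
# A single function that takes a number and returns it plus one.
increment <- function(x) {
  x + 1
}
```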
This function increments x by 1. To run it on AWS Lambda, we create a Lambda function:
We then invoke the function to see the result.
Simple as that.
How it works
The new runtime API allows us to customize the entry point to our Lambda function. AWS provides a quick start guide if you want to familiarize yourself with the new API. Our runtime is shipped as a layer, and the contents of a layer are extracted to /opt/ in the Lambda container.
The entry point to our runtime is a short shell script:
It polls for a new invocation (line 10/11) and passes the event data and the request id to the actual runtime (line 15), which is written in R. R is shipped as part of our runtime layer. We compiled it on the Lambda AMI and packaged it with all dependencies required to run R on Lambda.
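The endpoints involved in this polling loop can be sketched in R as follows. The paths follow the AWS Lambda runtime API (version 2018-06-01), and the host is read from the AWS_LAMBDA_RUNTIME_API environment variable that Lambda sets. The helper names are hypothetical, and the sketch only builds the URLs rather than performing the HTTP calls.

```r
# Base URL of the runtime API; Lambda provides host:port in an environment variable.
runtime_api_base <- function(host = Sys.getenv("AWS_LAMBDA_RUNTIME_API")) {
  paste0("http://", host, "/2018-06-01/runtime")
}

# GET this URL to block until the next invocation arrives.
# The request id is returned in the "Lambda-Runtime-Aws-Request-Id" response header.
next_invocation_url <- function(base) {
  paste0(base, "/invocation/next")
}

# POST the function result to this URL.
response_url <- function(base, request_id) {
  paste0(base, "/invocation/", request_id, "/response")
}

# POST error details to this URL if the invocation fails.
error_url <- function(base, request_id) {
  paste0(base, "/invocation/", request_id, "/error")
}
```

A runtime would call next_invocation_url once per loop iteration and then use the returned request id to build the response or error URL for that invocation.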
The actual runtime invokes the Lambda function. The Lambda handler is used to determine the file name as well as the function to call (lines 10–12). Furthermore, it transforms the event data, which is in JSON format, to an R object using jsonlite (line 14). Each key of the event JSON is thus mapped to a parameter of the R function. The result of the function invocation (line 15) is implicitly transformed to a JSON object. To allow literals as return values, the response JSON always contains a ‘result’ element. The result is then posted to the response handler, which is a local HTTP proxy (line 21). This is all the magic happening in the background.
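The steps above can be sketched roughly as follows. This is not the runtime's actual source. The helper names parse_handler and invoke are hypothetical, the .R file extension is an assumption, and the function is defined inline instead of being loaded from a file so that the example is self-contained.

```r
library(jsonlite)

# Split a handler string such as "script.increment" into a file name and a function name.
parse_handler <- function(handler) {
  parts <- strsplit(handler, ".", fixed = TRUE)[[1]]
  list(
    file = paste0(paste(head(parts, -1), collapse = "."), ".R"),  # extension assumed
    fn = tail(parts, 1)
  )
}

# Map each key of the event JSON to a function parameter, call the function,
# and wrap the return value in a "result" element so that literals are valid JSON.
invoke <- function(fn, event_json) {
  args <- fromJSON(event_json)   # JSON object becomes a named list
  result <- do.call(fn, args)    # named list entries become named arguments
  toJSON(list(result = result), auto_unbox = TRUE)
}

# Stand-in for a function that a runtime would load with source(handler$file).
increment <- function(x) x + 1

handler <- parse_handler("script.increment")
response <- invoke(get(handler$fn), '{"x": 1}')
print(response)  # {"result":2}
```

Wrapping the return value in a named list is what allows a bare number or string to be returned: the response is always a JSON object, regardless of the function's return type.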
R packages required by the function code can either be shipped directly with the code or published in a separate layer. Because layers are simply extracted into /opt/ in the AWS Lambda container, it suffices to place the package binaries in R/library/ inside the layer, and they will be available to the function code. We provide a layer that contains the recommended R packages shipped with the R sources, e.g., MASS and mgcv.
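To check whether a package would be picked up from a layer, a small helper like the one below may be useful. It is a sketch that assumes the layout described above (R/library/ under the layer root, hence /opt/R/library/ on Lambda) and relies on the fact that every installed R package directory contains a DESCRIPTION file. The function names are hypothetical.

```r
# Library directory inside an extracted layer; /opt is where Lambda extracts layers.
layer_library <- function(layer_root = "/opt") {
  file.path(layer_root, "R", "library")
}

# An installed R package is a directory containing a DESCRIPTION file.
package_in_layer <- function(pkg, lib = layer_library()) {
  file.exists(file.path(lib, pkg, "DESCRIPTION"))
}
```

The same helper can be pointed at a local build directory before zipping a layer, for example package_in_layer("Matrix", layer_library("build")), where "build" is a hypothetical staging directory.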
The following script returns the second column of a matrix. It cannot be executed with the R runtime layer alone because the Matrix package is missing.
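A script of this kind might look like the sketch below. The function name second_column and the matrix contents are assumptions chosen for illustration. The only essential part is the dependency on the Matrix package.

```r
library(Matrix)

# Build a 2x3 matrix with the Matrix package and return its second column.
second_column <- function() {
  m <- Matrix(1:6, nrow = 2)  # filled column by column, like base matrix()
  m[, 2]
}
```

Without a layer that provides Matrix, the call to library(Matrix) fails before the function is ever invoked.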
However, you can add a second layer that provides the package to your function:
Invoking the function then yields the result:
We provide all code on GitHub, including the script to compile R from source: https://github.com/bakdata/aws-lambda-r-runtime. Feel free to contribute.
As a starting point, we provide the following layers for your Lambdas:
- eu-central-1: arn:aws:lambda:eu-central-1:131329294410:layer:r-runtime:10
- us-east-1: arn:aws:lambda:us-east-1:131329294410:layer:r-runtime:1
The recommended packages that ship with R 3.5.1: boot, class, cluster, codetools, foreign, KernSmooth, lattice, MASS, Matrix, mgcv, nlme, nnet, rpart, spatial, survival
- eu-central-1: arn:aws:lambda:eu-central-1:131329294410:layer:r-recommended:1
- us-east-1: arn:aws:lambda:us-east-1:131329294410:layer:r-recommended:1
Lambdas can use at most 3 GB of RAM and must finish within 15 minutes. AWS Lambda is therefore not suitable for R scripts that require more resources than that. Furthermore, the zipped code of each layer of a Lambda is limited to 50 MB. Finally, only the /tmp/ directory is writable on AWS Lambda, which must be considered when writing to the local disk.
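The /tmp/ restriction mostly affects functions that create intermediate files. A minimal sketch of a function that respects it is shown below. The names write_result and squares are hypothetical, and the dir parameter exists only so that the target directory can be changed when running outside Lambda.

```r
# Write a data frame as CSV into a writable directory and return the file path.
write_result <- function(df, name = "result.csv", dir = "/tmp") {
  path <- file.path(dir, name)
  write.csv(df, path, row.names = FALSE)
  path
}

# Example function: compute a small table, persist it under /tmp, and report back.
squares <- function(n, dir = "/tmp") {
  df <- data.frame(i = seq_len(n), square = seq_len(n)^2)
  path <- write_result(df, dir = dir)
  list(path = path, rows = nrow(read.csv(path)))
}
```

Writing anywhere else, such as the working directory of the function code, would fail on Lambda because the rest of the file system is read-only.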