Deploying a Serverless R Inference Service Using AWS Lambda, Amazon API Gateway, and the AWS CDK

Step-by-step guide to serverless inference for R models

Sofian Hamiti
The Startup
7 min read · Jan 19, 2021


R is one of the most popular languages used in data science. It is open source and has many packages for statistical analysis, data wrangling, visualization, and machine learning.

Lambda + R + CDK = ❤

After training an R model, you and your ML team might explore ways to deploy it as an inference service. AWS offers many options for this, so you can adapt the deployment scenario to your needs. Among those, adopting a serverless architecture allows you to build a scalable R inference service while freeing your team from infrastructure management.

In this post I will show how you can create a serverless R inference service with AWS Lambda and Amazon API Gateway. For illustrative purposes, our service will serve house price predictions using a random forest model trained on the Boston Housing Dataset. We will use the AWS CDK to deploy the inference stack into an AWS account.

Walkthrough overview

We will deploy the inference service in 3 steps:

  • We will first train a random forest model and upload its binary to an S3 bucket. The model binary will be downloaded from S3 by the Lambda function at inference time.
  • Then we will define a container image for AWS Lambda carrying the custom R runtime, the R inference handler, and code dependencies.
  • Finally, I will show how you can deploy the inference service into your account with the AWS CDK.

Below is the architecture overview for our R inference service:

Architecture overview of the R inference service

Prerequisites

To go through this example, make sure you have the following:

  1. Basic familiarity with AWS Lambda, Amazon API Gateway, and the AWS CDK. Visiting the Lambda tutorials, Getting Started with API Gateway, and the CDK workshop could be a good start if those sound new to you.
  2. An AWS account where the service will be deployed
  3. AWS CDK installed and configured. Make sure to have the credentials and permissions to deploy the stack into your account
  4. Docker to build and push the Lambda container images to ECR
  5. R to train the model and Python to define the CDK stack
  6. The randomForest package in your R environment to train the model. You can run install.packages('randomForest') to install it
  7. This GitHub repository cloned into your environment to follow the steps
git clone https://github.com/SofianHamiti/aws-lambda-r-inference.git

Step 1: Train the model, upload its binary to S3, and create the inference handler

Train a model and upload its binary to S3

The first ingredient you need in your ML inference service is the model itself. You can use the example train.R script from the repo and run it in your R environment.
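A minimal sketch of the script follows; the split ratio, seed, and evaluation metric here are my assumptions, so check train.R in the repo for the full version:

# train.R (sketch)
library(MASS)           # provides the Boston Housing dataset
library(randomForest)

set.seed(42)

# Split the data into train/test sets (75/25 split assumed)
train_idx <- sample(nrow(Boston), 0.75 * nrow(Boston))
train_set <- Boston[train_idx, ]
test_set  <- Boston[-train_idx, ]

# Train a random forest to predict the median house value (medv)
model <- randomForest(medv ~ ., data = train_set)

# Evaluate on the test set with root mean squared error
predictions <- predict(model, newdata = test_set)
rmse <- sqrt(mean((predictions - test_set$medv)^2))
print(paste("Test RMSE:", round(rmse, 2)))

# Save the model binary to the current folder
saveRDS(model, file = "boston_model.rds")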

The script uses the Boston Housing dataset from the MASS package, splits it into train/test sets, trains a random forest model on the train set, and evaluates it on the test set.

In the last line of the script, saveRDS saves the model as a boston_model.rds file in your current folder.

I ran my script on a SageMaker Notebook Instance with the R kernel.

You will need to upload the boston_model.rds file into an S3 bucket of your choice, as shown below:
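One way to do this is with the AWS CLI; the bucket name below is a placeholder to replace with your own:

aws s3 cp boston_model.rds s3://<your-bucket-name>/boston_model.rds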

The model S3 URI will be used in the Lambda function for inference.

Create the inference Lambda handler

The AWS Lambda function handler is the method that will process requests from API Gateway. When invoked, Lambda will run the handler containing our inference logic and return a response with model predictions.

The predict.R script in the repo contains the inference handler.
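Here is a simplified sketch of what it looks like; I am assuming the custom runtime passes the API Gateway event fields to the handler as named arguments, so the exact signature and response format may differ from the repo's version:

# predict.R (sketch)
library(jsonlite)
library(randomForest)

# Download the model binary from S3 into /tmp once per cold start.
# S3_MODEL_URI is set as an environment variable on the Lambda function.
system(paste("aws s3 cp", Sys.getenv("S3_MODEL_URI"), "/tmp/boston_model.rds"))
model <- readRDS("/tmp/boston_model.rds")

handler <- function(body, ...) {
  # Parse the JSON request body sent by API Gateway into a one-row data frame
  input <- as.data.frame(fromJSON(body))

  # Predict the house price with the loaded random forest model
  prediction <- predict(model, newdata = input)

  # Return the predictions; the runtime serializes this into the response body
  list(predictions = as.numeric(prediction))
}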

This script will be copied to the Lambda container image at build time. You can also copy the model binary into the container image, depending on how tightly coupled you need the inference code and model to be.

In our case, an AWS CLI command at the top of predict.R copies the model from S3 into the /tmp folder. The environment variable ${S3_MODEL_URI} contains the S3 URI of the model and will be declared as a Lambda environment variable in step 3. Implementing it this way allows you to swap between model versions without changing the container image itself.

The rest of the code in predict.R reads data from the request body, predicts the house price using the loaded model, and returns a response with the predictions in the body.

The first execution of your Lambda function might take a few seconds to finish, depending on the size of your model and the complexity of your environment. This is called a “cold start”, and you can visit the following blog if you need to reduce it: Predictable start-up times with Provisioned Concurrency

Step 2: Package the code and dependencies into a container image for Lambda

AWS re:Invent 2020 came with exciting news for Lambda!

From AWS re:Invent 2020 — Keynote with Andy Jassy

You can now package and deploy Lambda functions as container images of up to 10 GB in size. Lambda also supports up to 10 GB of memory and 6 vCPU cores so we can deploy larger workloads such as machine learning inference.

AWS provides a set of open-source base images you can use to build the container image for your function code (Python, Node.js, Java, .NET, Go, Ruby). You can also use alternative base images from other container registries. See Creating Lambda container images for more details.

Using a custom Lambda runtime for R

R is currently not in the list of supported runtimes, but Lambda allows you to create a custom runtime for it. For this blog, reusing an existing R runtime published on the internet will do :)

I found interesting example runtimes from Appsilon, Bakdata, and this great blog from David Neuzerling. Just make sure you verify/adjust the code if you plan to deploy it into production. We will reuse David’s implementation here; you can find runtime.R under lambda_image in the code repo.

Your Dockerfile

We can now package our code, runtime, and dependencies in a container. For custom runtimes, we can use a provided base image based on Amazon Linux.

Below is our Dockerfile:
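The version here is a sketch adapted from David’s example; the R version, package list, and system dependencies are my assumptions, so refer to the Dockerfile in the repo for the exact content:

# Dockerfile (sketch)
FROM public.ecr.aws/lambda/provided:al2

ENV R_VERSION=4.0.3

# System packages: compilers/headers to build R packages, plus the AWS CLI used by predict.R
RUN yum -y install wget tar gcc gcc-c++ gcc-gfortran make openssl-devel libcurl-devel awscli

# Install R from the RStudio-built RPM
RUN wget https://cdn.rstudio.com/r/centos-7/pkgs/R-${R_VERSION}-1-1.x86_64.rpm \
    && yum -y install R-${R_VERSION}-1-1.x86_64.rpm \
    && rm R-${R_VERSION}-1-1.x86_64.rpm
ENV PATH="${PATH}:/opt/R/${R_VERSION}/bin/"

# Install the R packages the runtime and handler need
RUN Rscript -e "install.packages(c('httr', 'jsonlite', 'randomForest'), repos = 'https://cloud.r-project.org/')"

# Copy the custom runtime and the inference handler into the Lambda task root
COPY runtime.R predict.R ${LAMBDA_TASK_ROOT}/

# Bootstrap script that starts the custom R runtime
RUN printf '#!/bin/sh\ncd $LAMBDA_TASK_ROOT\nRscript runtime.R' > /var/runtime/bootstrap \
    && chmod +x /var/runtime/bootstrap

# Tell Lambda to invoke the handler function defined in predict.R
CMD [ "predict.handler" ]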

This installs R and the required R libraries, and copies predict.R and the runtime files into the container.

We use CMD [ "predict.handler" ] to tell Lambda to use the handler function in predict.R when invoked.

You do not need to build and push the container image to Amazon ECR yourself; the AWS CDK will do this automatically in step 3.

Testing your function locally

The Lambda Runtime Interface Emulator is a proxy for the Lambda runtime. It allows you to test your Lambda function locally using familiar tools such as cURL and the Docker CLI, to ease your development process. See the Emulator GitHub repository for instructions on how to use it.

The AWS-provided base images contain the runtime interface emulator, so we don’t need to install it in our container.
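For example, something along these lines should work from the repo root; the image name and bucket are placeholders, and the request payload shape depends on your handler:

# Build the image from the lambda_image folder
docker build -t r-inference lambda_image/

# Run it locally, passing the model URI and AWS credentials so predict.R can read from S3
docker run -p 9000:8080 \
    -e S3_MODEL_URI=s3://<your-bucket-name>/boston_model.rds \
    -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -e AWS_DEFAULT_REGION \
    r-inference

# In another terminal, invoke the function through the emulator endpoint
# (replace the body with a JSON string matching what your handler expects)
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"body": "{}"}'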

Step 3: Deploy your inference service with CDK

The AWS CDK is a framework that makes it easy to define cloud infrastructure in code and provision it through AWS CloudFormation. You can follow this Workshop and the Examples if you are new to it. In our example, we use Python to define our infrastructure.

Below is the code defining our inference service infrastructure:
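This is a simplified sketch of app.py using CDK v1 syntax; the construct names, memory/timeout values, and bucket placeholder are my assumptions, and the stack in the repo also defines its own IAM roles:

# app.py (sketch, CDK v1)
from aws_cdk import core
from aws_cdk import aws_iam as iam
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_apigatewayv2 as apigwv2
from aws_cdk import aws_apigatewayv2_integrations as integrations


class RInferenceStack(core.Stack):
    def __init__(self, scope: core.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Build the container image from the lambda_image folder and push it to ECR
        inference_function = _lambda.DockerImageFunction(
            self, "RInferenceFunction",
            code=_lambda.DockerImageCode.from_image_asset("lambda_image"),
            memory_size=1024,
            timeout=core.Duration.seconds(60),
            environment={
                # Point the handler at the model binary you uploaded in step 1
                "S3_MODEL_URI": "s3://<your-bucket-name>/boston_model.rds",
            },
        )

        # Allow the function to download the model from S3
        inference_function.add_to_role_policy(
            iam.PolicyStatement(
                actions=["s3:GetObject"],
                resources=["arn:aws:s3:::<your-bucket-name>/*"],
            )
        )

        # HTTP API (API Gateway V2) that forwards requests to the Lambda function
        apigwv2.HttpApi(
            self, "RInferenceApi",
            default_integration=integrations.LambdaProxyIntegration(
                handler=inference_function
            ),
        )


app = core.App()
RInferenceStack(app, "r-inference-stack")
app.synth()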

This stack will create IAM roles for Lambda and API Gateway. It will also automatically build and push the container image to ECR from assets in the lambda_image folder by using DockerImageCode.from_image_asset.

DockerImageFunction allows us to define the Lambda function. It will use the freshly pushed container image from ECR. This is also where we define the environment variable S3_MODEL_URI.

The last part of app.py creates an API with API Gateway V2. When testing, we will send POST requests with data to this API from a client. It will forward the requests to Lambda and return the response to our client.

You can execute the following commands to install CDK and make sure you have the right dependencies to deploy the service:

npm install -g aws-cdk@1.85.0
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Once this is installed, you can execute the following commands to deploy the inference service into your account:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account | tr -d '"')
AWS_REGION=$(aws configure get region)
cdk bootstrap aws://${ACCOUNT_ID}/${AWS_REGION}
cdk deploy --require-approval never

The first two commands get your account ID and current AWS region using the AWS CLI on your computer.

We then use cdk bootstrap and cdk deploy to build the container image locally, push it to ECR, and deploy the inference service stack. This will take a few minutes to complete.

You can follow the deployment in the CloudFormation console.

Testing your Lambda function

When your stack is created, you can navigate to the API Gateway service console and copy your API URL.

We use the $default stage here just for testing purposes. See Publishing REST APIs for customers to invoke for guidance on publishing an API.

The following is a test data point from the Boston Housing dataset you can add to your request body:
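For illustration, here is the first observation of the dataset without the medv target; just note that the exact field names and payload shape must match what your predict.R handler expects:

{
  "crim": 0.00632, "zn": 18.0, "indus": 2.31, "chas": 0, "nox": 0.538,
  "rm": 6.575, "age": 65.2, "dis": 4.09, "rad": 1, "tax": 296,
  "ptratio": 15.3, "black": 396.9, "lstat": 4.98
}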

You can use tools like Postman to test the inference API from your computer.
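If you prefer the command line, a curl call along these lines should also work; the URL is a placeholder for your own API endpoint, and payload.json is the data point above saved to a file:

curl -X POST \
    -H "Content-Type: application/json" \
    -d @payload.json \
    https://<api-id>.execute-api.<region>.amazonaws.com/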
