Deploying a Multi-Model Inference Service With AWS Lambda, Synchronous Express Workflows, Amazon API Gateway, and CDK

Sofian Hamiti
Feb 16 · 7 min read

I recently published a post explaining the core concepts of deploying an ML model as a serverless inference service using AWS Lambda, Amazon API Gateway, and the AWS CDK.

For some use cases, you and your ML team may need a more complex inference workflow where predictions come from multiple models and are orchestrated as a directed acyclic graph (DAG). On AWS, Step Functions Synchronous Express Workflows let you build that orchestration layer for your real-time inference services.

Photo by Jon Tyson on Unsplash

In this post, I will show how you can create a multi-model serverless inference service with AWS Lambda, Step Functions Synchronous Express Workflows, and Amazon API Gateway. For illustrative purposes, our service will serve house price predictions using 3 models trained on the Boston Housing Dataset with Scikit-Learn. We will use the AWS CDK to deploy the inference stack into an AWS account.

The Lambda tutorials, Getting Started with API Gateway, Create a Serverless Workflow, and the CDK workshop are good starting points if these services are new to you.

Walkthrough overview

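We will deploy the inference service in 3 steps: creating the inference Lambda functions, using a State Machine Express workflow to orchestrate the inference, and deploying the inference service with CDK.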

Below is the architecture overview for our multi-model inference service:

Architecture overview of the multi-model inference service


To go through this example, make sure you have the following:

git clone

Step 1: Creating the inference Lambda Functions

Train the models and upload the binaries to S3

The first ingredients you need in your ML inference service are the models themselves. Here we train a Linear Regression, a Random Forest, and a Support Vector model. For your convenience, I have prepared 3 notebooks in the train_models folder that you can run in your environment to create the models.

Below is the train_random_forest notebook as an example:

Each notebook uses the Boston Housing dataset from scikit-learn, splits it into train/test sets, trains a model with the train set, and saves the binary to your local folder as a .pkl file.
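The notebook steps described above can be sketched roughly as follows. This is a minimal illustration, not the notebook itself: load_boston was removed from recent scikit-learn releases, so the sketch assumes synthetic stand-in data with the same shape as the Boston Housing dataset (506 rows, 13 features). The function name train_and_save, the hyperparameters, and the file name are hypothetical.

```python
import pickle

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def train_and_save(model_path: str = "random_forest.pkl") -> float:
    """Train a Random Forest regressor, pickle it, and return the test R^2."""
    # Assumption: synthetic stand-in data shaped like Boston Housing (506 x 13).
    X, y = make_regression(n_samples=506, n_features=13, noise=10.0, random_state=0)

    # Split into train/test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Train the model on the train set only.
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X_train, y_train)

    # Save the fitted model to the local folder as a .pkl file.
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    return float(model.score(X_test, y_test))
```

Calling train_and_save() writes random_forest.pkl to the current folder. The other two models would follow the same pattern with a different estimator class.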

You will need to upload the .pkl files into an S3 bucket of your choice.

The model S3 URIs will be used in the Lambda functions for inference.
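If you prefer to script the upload rather than use the console, something like the sketch below could work. The helper name upload_models, the models/ key prefix, and the bucket name in the usage note are assumptions for illustration; upload_file is the regular boto3 S3 client call. The client is passed in as an argument so the function can be exercised without AWS access.

```python
from pathlib import Path


def upload_models(s3_client, bucket: str, model_dir: str, prefix: str = "models") -> list:
    """Upload every .pkl file found in model_dir and return the S3 URIs."""
    uris = []
    for path in sorted(Path(model_dir).glob("*.pkl")):
        # Hypothetical key layout: <prefix>/<file name>.
        key = f"{prefix}/{path.name}"
        # boto3 S3 client signature: upload_file(Filename, Bucket, Key).
        s3_client.upload_file(str(path), bucket, key)
        uris.append(f"s3://{bucket}/{key}")
    return uris
```

With boto3 installed and credentials configured, a call could look like upload_models(boto3.client("s3"), "my-model-bucket", "."), where the bucket name is a placeholder for your own.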

Create the inference Lambda handlers

An AWS Lambda function handler is a method that will process requests from the State Machine. When invoked, Lambda will run the handler containing our inference logic and return a response with model predictions.

In our case we have 3 handlers for the 3 models and you can find the code in the lambda_images folder as files.

The following file contains the inference handler for the Random Forest model:

This script will be copied to the Lambda container image at build time. You can also copy the model binary into the container image, depending on how tightly coupled you need the inference code and model to be.

In our case, we use boto3 to copy the model from S3 into the /tmp folder. The BUCKET and KEY variables contain the S3 bucket and key of the model and will be declared in step 3 as Lambda environment variables. Implementing it this way allows you to swap between different model versions without changing the container images themselves.

The rest of the code in reads data from the request event, predicts the House Price using the loaded model and returns a response with predictions.
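To make the flow described above concrete, here is a minimal sketch of what such a handler could look like. It is not the code from the repository: the event shape ({"data": [[...13 features...]]}), the response fields, the MODEL_CACHE name, and the use of pickle for deserialization are all assumptions. Only the BUCKET and KEY environment variables, the /tmp location, and the handler name come from the text.

```python
import json
import os
import pickle

MODEL_PATH = "/tmp/model.pkl"  # /tmp is the writable folder inside Lambda
MODEL_CACHE = {}  # kept across warm invocations of the same container


def load_model():
    """Download the model from S3 on first use, then reuse the cached object."""
    if "model" not in MODEL_CACHE:
        import boto3  # imported lazily so the module loads without AWS access

        # BUCKET and KEY are provided as Lambda environment variables.
        # boto3 S3 client signature: download_file(Bucket, Key, Filename).
        boto3.client("s3").download_file(
            os.environ["BUCKET"], os.environ["KEY"], MODEL_PATH
        )
        with open(MODEL_PATH, "rb") as f:
            MODEL_CACHE["model"] = pickle.load(f)
    return MODEL_CACHE["model"]


def handler(event, context):
    """Assumed event shape: {"data": [[f1, ..., f13], ...]}."""
    # The payload may arrive as a JSON string or as an already-parsed dict.
    body = json.loads(event) if isinstance(event, str) else event
    predictions = load_model().predict(body["data"])
    # Convert to plain floats so the response is JSON serializable.
    return {"model": "random_forest", "predictions": [float(p) for p in predictions]}
```

Caching the model outside the handler means only the cold start pays for the S3 download. Unpickling a scikit-learn model also requires scikit-learn to be installed in the container image.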

The first execution of your Lambda functions might take up to a few seconds to finish, depending on the size of your model and the complexity of your environment. This is called a cold start, and you can visit the following blog if you need to reduce it: Predictable start-up times with Provisioned Concurrency

Package your inference handlers in containers

We can now package our code, runtime, and dependencies in containers. We use an AWS-provided base image for Python, based on Amazon Linux.

You can find the 3 Dockerfiles for our Lambda functions in the lambda_images folder.

Below is the Dockerfile for the Random Forest model:

We use CMD [ "predict.handler" ] to tell Lambda to call the handler function of the predict module when invoked.

You do not need to build and push the container image to Amazon ECR. AWS CDK will do this automatically in step 3.

Deploy your Lambda functions with a CDK nested stack

We will use CDK and nested stacks in step 3 to deploy the components of our service. You can find in the stacks folder. Breaking down the stack will allow you to add new models and update the DAG with minimal effort.

Step 2: Using a State Machine Express workflow to orchestrate the inference

AWS Step Functions Express Workflows use fast, in-memory processing for high-event-rate workloads of up to 100,000 events per second. They can also be invoked in response to HTTP requests via API Gateway, and support synchronous requests. See New Synchronous Express Workflows for AWS Step Functions for more details.

They are ideal for high-volume, event-processing workloads requiring low latency, and allow us to build logic in our real-time inference service.

Orchestrate your inference workflow with State Machines

In our example service we want to send data to the API, and receive a combined list of predictions from the 3 models. We will use the Parallel state from Step Functions to concurrently call the 3 inference Lambda functions before sending back the response to the client.
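As an illustration of the Parallel state, the sketch below builds an Amazon States Language definition with one branch per inference function. It is only an approximation of what the CDK stack would synthesize: the helper name build_definition, the state names, and the ARNs in the usage note are hypothetical. Each branch receives a copy of the state input, and the Parallel state outputs an array with one entry per branch, in branch order, which is what gives the client a combined list of predictions.

```python
import json


def build_definition(function_arns: dict) -> dict:
    """Return an Amazon States Language definition with a single Parallel state.

    function_arns maps a branch/state name to a Lambda function ARN.
    """
    branches = []
    for name, arn in function_arns.items():
        # One single-task branch per model; every branch gets the same input.
        branches.append(
            {
                "StartAt": name,
                "States": {name: {"Type": "Task", "Resource": arn, "End": True}},
            }
        )
    return {
        "Comment": "Scatter-gather over the inference Lambda functions",
        "StartAt": "PredictInParallel",
        "States": {
            "PredictInParallel": {
                "Type": "Parallel",
                "Branches": branches,
                "End": True,
            }
        },
    }


def to_json(definition: dict) -> str:
    """Serialize the definition the way Step Functions expects it (JSON text)."""
    return json.dumps(definition, indent=2)
```

For example, build_definition({"LinearRegression": "arn:aws:lambda:REGION:ACCOUNT_ID:function:linear-regression", ...}) with three placeholder ARNs yields a state machine with three concurrent branches.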

You can find the State Machine nested stack in, and the API Gateway stack in

Below is a visualization of our inference graph. Feel free to adjust the workflow to your needs:

Multi-model prediction workflow with scatter-gather pattern

Step 3: Deploying your inference service with CDK

We will deploy our inference service in the same way as we did in my previous post.

You can find below the main CDK stack to define our inference service infrastructure:

This stack will create the Lambda functions, the State Machine, the API Gateway, and their respective IAM roles.

When testing the API, we will send POST requests with data from a client. API Gateway will forward the requests to the State Machine orchestrating the Lambda functions, and return the response to our client.

You can execute the following commands to install CDK and make sure you have the right dependencies to deploy the service:

npm install -g aws-cdk@1.89.0
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Once this is installed, you can execute the following commands to deploy the inference service into your account:

ACCOUNT_ID=$(aws sts get-caller-identity --query Account | tr -d '"')
AWS_REGION=$(aws configure get region)
cdk bootstrap aws://${ACCOUNT_ID}/${AWS_REGION}
cdk deploy --require-approval never

The first 2 commands will get your account ID and current AWS region using the AWS CLI on your computer.

We then use cdk bootstrap and cdk deploy to build the container images locally, push them to ECR, and deploy the inference service stack. This will take a few minutes to complete.

You can follow the deployment in CloudFormation

Testing your API

When your stack is created, you can navigate to the API Gateway service console and copy your API URL.


We use the $default stage here just for testing purposes. See Publishing REST APIs for customers to invoke for guidance on publishing an API.

The following is a test data point from the Boston Housing dataset you can add to your request body:

You can use a tool like Postman to test the inference API from your computer.
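If you would rather test from a script, a small client built on the Python standard library could look like the sketch below. The request body shape ({"data": [...]}) is an assumption and must match whatever your handlers expect, and the URL in the usage note is a placeholder. The sample row is the first record of the Boston Housing dataset, which is not necessarily the data point used in this post.

```python
import json
import urllib.request

# First record of the Boston Housing dataset (13 features, CRIM through LSTAT).
FEATURES = [0.00632, 18.0, 2.31, 0.0, 0.538, 6.575, 65.2, 4.09, 1.0, 296.0, 15.3, 396.9, 4.98]


def build_request(api_url: str, rows: list) -> urllib.request.Request:
    """Build a JSON POST request; the {"data": rows} body shape is an assumption."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(api_url: str, rows: list, timeout: float = 30.0):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(api_url, rows), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Once the stack is deployed, predict("https://YOUR_API_ID.execute-api.YOUR_REGION.amazonaws.com/", [FEATURES]) would send one data point, with the placeholder URL replaced by the one copied from the API Gateway console.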
