Deploying an ML Model with AWS Lambda (Part 1) — SAM
Deploying an ML model to AWS Lambda offers a scalable, cost-effective way to integrate intelligent data processing into your applications. AWS Lambda, a serverless computing service, allows you to run code in response to events without managing servers. By leveraging Lambda, you can deploy ML models that handle tasks such as image recognition, natural language processing, and predictive analytics in a highly efficient, on-demand manner. This blog post will guide you through the steps to prepare your custom ML model, package it with the necessary dependencies, and deploy it to AWS Lambda, enabling seamless and rapid inference capabilities for your applications.
Before we begin: the tools described below require an AWS user and a configured CLI profile. It is recommended to configure your profile with aws configure sso, as explained here.
AWS Serverless Application Model (SAM)
According to the official documentation,
The AWS Serverless Application Model (AWS SAM) is a toolkit that improves the developer experience of building and running serverless applications on AWS.
We will primarily use the SAM CLI to initialize a new project from one of the templates, and then override some of its key functionality with our own.
Initialize a project with sam init
1. Let's go ahead and initialize a basic machine-learning project. Running sam init will open a template source menu:
Which template source would you like to use?
1 — AWS Quick Start Templates
2 — Custom Template Location
In this tutorial, we will choose option 1.
2. Next, we choose a quick-start application template:
Choose an AWS Quick Start application template
1 — Hello World Example
2 — Data processing
3 — Hello World Example with Powertools for AWS Lambda
4 — Multi-step workflow
5 — Scheduled task
6 — Standalone function
7 — Serverless API
8 — Infrastructure event management
9 — Lambda Response Streaming
10 — Serverless Connector Hello World Example
11 — Multi-step workflow with Connectors
12 — GraphQLApi Hello World Example
13 — Full Stack
14 — Lambda EFS example
15 — DynamoDB Example
16 — Machine Learning
Option 16 provides some basic ML-related templates.
3. We will also use 3 for the latest Python runtime, for which two starter templates are currently available. (PS — python3.9 offers more options, including PyTorch and TensorFlow templates, but it doesn't matter for the sake of our example; the initialized model will be overridden regardless.)
Which runtime would you like to use?
1 — python3.9
2 — python3.8
3 — python3.12
4 — python3.11
5 — python3.10
4. Finalizing the template — choosing either 1 or 2 is fine here; say we go for XGBoost. Next are some features we will pass on, like X-Ray tracing, CloudWatch monitoring, etc. Lastly, choose any project name (default: sam-app).
Select your starter template
1 — Scikit-learn Machine Learning Inference API
2 — XGBoost Machine Learning Inference API
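As a side note, once the choices are known, the whole initialization can be scripted with flags instead of prompts. A rough sketch (the exact --app-template identifier depends on the chosen template and is left as a placeholder here; sam init --help lists the available options):

sam init --name sam-app --runtime python3.12 --package-type Image --no-interactive --app-template <template-id>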
Project Structure
$ cd sam-app/
sam-app$ tree
.
├── app
│ ├── app.py
│ ├── Dockerfile
│ ├── __init__.py
│ ├── model
│ └── requirements.txt
├── events
│ └── event.json
├── __init__.py
├── README.md
├── samconfig.toml
└── template.yaml
The main logic is implemented in app.py, where the model is loaded and the lambda_handler function, which returns the model's prediction as JSON, is defined.
sam-app$ cat app/app.py
import base64
import json
import numpy as np
from io import BytesIO
from PIL import Image
import xgboost as xgb

# Load the model once, at import time, so it is reused across warm invocations
model_file = '/opt/ml/model'
model = xgb.Booster()
model.load_model(model_file)

def lambda_handler(event, context):
    # The request body carries a base64-encoded image
    image_bytes = event['body'].encode('utf-8')
    image = Image.open(BytesIO(base64.b64decode(image_bytes))).convert(mode='L')
    # The model expects a 28x28 grayscale input, flattened to 784 features
    image = image.resize((28, 28))
    x = np.array(image).reshape(1, -1)
    pred = int(np.argmax(model.predict(xgb.DMatrix(x))))
    return {'statusCode': 200, 'body': json.dumps({"label": pred})}
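Before any container is built, the handler can be sanity-checked with a synthetic event. A minimal sketch, assuming it runs from the app directory with the requirements installed and a model file present at /opt/ml/model (otherwise, adjust model_file first; the blank test image and the file name test_local.py are our own choices):

# test_local.py — quick local smoke test for the handler
import base64
from io import BytesIO
from PIL import Image

from app import lambda_handler  # importing app also triggers the model load

# Build a dummy 28x28 grayscale image and base64-encode it,
# mimicking the body an API Gateway request would carry
image = Image.new('L', (28, 28), color=0)
buffer = BytesIO()
image.save(buffer, format='PNG')
event = {'body': base64.b64encode(buffer.getvalue()).decode('utf-8')}

print(lambda_handler(event, context=None))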
In this specific case, XGBoost is used to classify MNIST digits.
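The template ships with a pre-trained model file under app/model. For completeness, here is a rough sketch of how a compatible booster could be produced ourselves (this script is our own illustration, not part of the template; fetch_openml downloads the full MNIST dataset, and the hyperparameters are arbitrary):

# train_model.py — illustrative script for producing a compatible model file
from sklearn.datasets import fetch_openml
import xgboost as xgb

# 70,000 digits with 784 features each (28x28 flattened), matching the handler's input
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
dtrain = xgb.DMatrix(X, label=y.astype(int))

# multi:softprob returns per-class probabilities, which the handler argmaxes
params = {'objective': 'multi:softprob', 'num_class': 10, 'max_depth': 6}
booster = xgb.train(params, dtrain, num_boost_round=20)
# Depending on your XGBoost version, a .json extension may be required here
booster.save_model('app/model')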
Aside from that, a simple Dockerfile pulls the relevant Lambda base image from AWS ECR, copies the app content and model weights, installs the requirements file, and registers lambda_handler as the function's entry point.
sam-app$ cat app/Dockerfile
# Official AWS Lambda base image for Python 3.12
FROM public.ecr.aws/lambda/python:3.12
# Copy the handler code and the model weights into the image
COPY app.py requirements.txt ./
COPY model /opt/ml/model
# Install dependencies next to the handler, then register it as the entry point
RUN python3.12 -m pip install -r requirements.txt -t .
CMD ["app.lambda_handler"]
In addition, we are provided with an event.json file that can be used to test our Lambda with an example event. It contains many fields, like request headers and context info, but most crucially, the request body. Our project also includes a generated template.yaml file, which defines the structure of our Lambda: for example, it determines the allocated timeout and memory size, as well as the inference API endpoint. It also contains the function “key” that is used when invoking the Lambda; in our case, it is InferenceFunction, right under Resources.
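For orientation, the relevant part of template.yaml looks roughly like this (the memory, timeout, and endpoint path below are illustrative; the generated file in your project is the source of truth):

Resources:
  InferenceFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      MemorySize: 5000
      Timeout: 300
      Events:
        Inference:
          Type: Api
          Properties:
            Path: /classify_digit
            Method: post
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./app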
Build and invoke locally
Our project can now be built using sam build. This will create a .aws-sam directory and organize the application's dependencies and files there for deployment. Let's test our Lambda with a local invocation, using the CLI's local subcommand:
sam-app$ sam local invoke InferenceFunction --event events/event.json
Namely, our inference function (essentially, the lambda_handler method) is called with the event.json file as its input event.
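If everything is wired correctly, the invocation should end with a response along these lines (the label value is illustrative; it depends on the image encoded in the event body):

{"statusCode": 200, "body": "{\"label\": 3}"}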
Deploying
We’ll use sam deploy --guided to deploy our Lambda to the cloud. The --guided flag will kick off the deployment process with a series of prompts, like the stack name (the name of the CloudFormation stack we deploy), the AWS region, etc.
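The guided run also offers to save your answers to samconfig.toml. Assuming you accept, subsequent deployments of the same stack no longer need the prompts:

sam-app$ sam deploy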
Next up
In the next post, we'll show how to override the default project with custom behavior: loading models from Hugging Face, modifying the request body, and working with the deployed Lambda.