Deploying an ML Model with AWS Lambda (Part 1) — SAM
Deploying an ML model to AWS Lambda offers a scalable, cost-effective way to integrate intelligent data processing into your applications. AWS Lambda, a serverless computing service, allows you to run code in response to events without managing servers. By leveraging Lambda, you can deploy ML models that handle tasks such as image recognition, natural language processing, and predictive analytics in a highly efficient, on-demand manner. This blog post will guide you through the steps to prepare your custom ML model, package it with the necessary dependencies, and deploy it to AWS Lambda, enabling seamless and rapid inference capabilities for your applications.
Before we begin: the tools described below require an AWS user and a configured CLI profile. It is recommended to configure your profile with aws configure sso, as explained here.
AWS Serverless Application Model (SAM)
According to the official documentation,
The AWS Serverless Application Model (AWS SAM) is a toolkit that improves the developer experience of building and running serverless applications on AWS.
We will primarily use the SAM CLI to initialize a new project from one of the templates, and then override some of its key functionality with our own.
Initialize a project with sam init
1. Let's go ahead and initialize a basic machine-learning project. Running sam init will open a template source menu:
Which template source would you like to use?
1 — AWS Quick Start Templates
2 — Custom Template Location
In this tutorial, we will choose option 1.
2. Next, we choose a quick-start application template:
Choose an AWS Quick Start application template
1 — Hello World Example
2 — Data processing
3 — Hello World Example with Powertools for AWS Lambda
4 — Multi-step workflow
5 — Scheduled task
6 — Standalone function
7 — Serverless API
8 — Infrastructure event management
9 — Lambda Response Streaming
10 — Serverless Connector Hello World Example
11 — Multi-step workflow with Connectors
12 — GraphQLApi Hello World Example
13 — Full Stack
14 — Lambda EFS example
15 — DynamoDB Example
16 — Machine Learning
Option 16 provides some basic ML-related templates.
3. We will also use 3 for the latest Python runtime, for which two starter templates are currently available. (PS — python3.9 offers more options, including PyTorch and TensorFlow templates, but it doesn't matter for the sake of our example; the initialized model will be overridden regardless.)
Which runtime would you like to use?
1 — python3.9
2 — python3.8
3 — python3.12
4 — python3.11
5 — python3.10
4. Finalizing the template — choosing either 1 or 2 is fine here; say we go for XGBoost. Next are some features we will pass on, like X-Ray tracing, CloudWatch monitoring, etc. Lastly, choose any project name (default: sam-app).
Select your starter template
1 — Scikit-learn Machine Learning Inference API
2 — XGBoost Machine Learning Inference API
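As a side note, once the choices are known, the whole initialization can be scripted with flags instead of prompts. A rough sketch (the exact --app-template identifier depends on the chosen template and is left as a placeholder here; sam init --help lists the available options):

sam init --name sam-app --runtime python3.12 --package-type Image --no-interactive --app-template <template-id>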
Project Structure
$ cd sam-app/
sam-app$ tree
.
├── app
│ ├── app.py
│ ├── Dockerfile
│ ├── __init__.py
│ ├── model
│ └── requirements.txt
├── events
│ └── event.json
├── __init__.py
├── README.md
├── samconfig.toml
└── template.yaml
The main logic is implemented in app.py, where the model is loaded and the lambda_handler function, which returns the model's prediction as JSON, is defined.
sam-app$ cat app/app.py
import base64
import json
import numpy as np
from io import BytesIO
from PIL import Image
import xgboost as xgb

# Load the model once, at import time, so it is reused across warm invocations
model_file = '/opt/ml/model'
model = xgb.Booster()
model.load_model(model_file)

def lambda_handler(event, context):
    # The request body carries a base64-encoded image
    image_bytes = event['body'].encode('utf-8')
    image = Image.open(BytesIO(base64.b64decode(image_bytes))).convert(mode='L')
    # The model expects a 28x28 grayscale input, flattened to 784 features
    image = image.resize((28, 28))
    x = np.array(image).reshape(1, -1)
    pred = int(np.argmax(model.predict(xgb.DMatrix(x))))
    return {'statusCode': 200, 'body': json.dumps({"label": pred})}
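Before any container is built, the handler can be sanity-checked with a synthetic event. A minimal sketch, assuming it runs from the app directory with the requirements installed and a model file present at /opt/ml/model (otherwise, adjust model_file first; the blank test image and the file name test_local.py are our own choices):

# test_local.py — quick local smoke test for the handler
import base64
from io import BytesIO
from PIL import Image

from app import lambda_handler  # importing app also triggers the model load

# Build a dummy 28x28 grayscale image and base64-encode it,
# mimicking the body an API Gateway request would carry
image = Image.new('L', (28, 28), color=0)
buffer = BytesIO()
image.save(buffer, format='PNG')
event = {'body': base64.b64encode(buffer.getvalue()).decode('utf-8')}

print(lambda_handler(event, context=None))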
In this specific case, XGBoost is used to classify MNIST digits.
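The template ships with a pre-trained model file under app/model. For completeness, here is a rough sketch of how a compatible booster could be produced ourselves (this script is our own illustration, not part of the template; fetch_openml downloads the full MNIST dataset, and the hyperparameters are arbitrary):

# train_model.py — illustrative script for producing a compatible model file
from sklearn.datasets import fetch_openml
import xgboost as xgb

# 70,000 digits with 784 features each (28x28 flattened), matching the handler's input
X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
dtrain = xgb.DMatrix(X, label=y.astype(int))

# multi:softprob returns per-class probabilities, which the handler argmaxes
params = {'objective': 'multi:softprob', 'num_class': 10, 'max_depth': 6}
booster = xgb.train(params, dtrain, num_boost_round=20)
# Depending on your XGBoost version, a .json extension may be required here
booster.save_model('app/model')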
Aside from that, a simple Dockerfile pulls the relevant Lambda base image from AWS ECR, copies the app content and model weights, installs the requirements file, and registers lambda_handler as the function's entry point.
sam-app$ cat app/Dockerfile
# Official AWS Lambda base image for Python 3.12
FROM public.ecr.aws/lambda/python:3.12
# Copy the handler code and the model weights into the image
COPY app.py requirements.txt ./
COPY model /opt/ml/model
# Install dependencies next to the handler, then register it as the entry point
RUN python3.12 -m pip install -r requirements.txt -t .
CMD ["app.lambda_handler"]
In addition, we are provided with an event.json file that can be used to test our Lambda with an example event. It contains many fields, like request headers and context info, but most crucially, the request body. Our project also includes a generated template.yaml file, which defines the structure of our Lambda: for example, it determines the allocated timeout and memory size, as well as the inference API endpoint. It also contains the function “key” that is used when invoking the Lambda; in our case, it is InferenceFunction, right under Resources.
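For orientation, the relevant part of template.yaml looks roughly like this (the memory, timeout, and endpoint path below are illustrative; the generated file in your project is the source of truth):

Resources:
  InferenceFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      MemorySize: 5000
      Timeout: 300
      Events:
        Inference:
          Type: Api
          Properties:
            Path: /classify_digit
            Method: post
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./app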
Build and invoke locally
Our project can now be built using sam build. This will create a .aws-sam directory and organize the application's dependencies and files there for deployment. Let's test our Lambda with a local invocation, using the CLI's local subcommand:
sam-app$ sam local invoke InferenceFunction --event events/event.json
Namely, our inference function (essentially, the lambda_handler method) is called with the event.json file as its input event.
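If everything is wired correctly, the invocation should end with a response along these lines (the label value is illustrative; it depends on the image encoded in the event body):

{"statusCode": 200, "body": "{\"label\": 3}"}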
Deploying
We’ll use sam deploy --guided to deploy our Lambda to the cloud. The --guided flag will kick off the deployment process with a series of prompts, like the stack name (the name of the CloudFormation stack we deploy), the AWS region, etc.
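The guided run also offers to save your answers to samconfig.toml. Assuming you accept, subsequent deployments of the same stack no longer need the prompts:

sam-app$ sam deploy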
Next up
In the next post, we'll show how to override the default project with custom behavior: loading models from Hugging Face, modifying the request body, and working with the deployed Lambda.