How to deploy a Serverless Machine Learning Microservice with AWS Lambda, AWS API Gateway and scikit-learn

In this tutorial, we deploy a machine learning microservice using AWS Lambda, AWS API Gateway and scikit-learn. All of the code is available in the accompanying Github repository.

Before you begin, make sure you are running Python 2.7 or 3.6, that you have a valid AWS account, and that your AWS credentials file is properly configured.

Step 1: Train a basic model

First, we train a gradient boosted decision tree classifier on the 3-class iris data set, using the scikit-learn tutorial as a guide, and pickle the model as model.pkl.


It doesn’t matter how good this model is for the purposes of this hack; it just needs to make predictions. Here’s my full model training and serialization script.

from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.externals import joblib

# import data
iris = datasets.load_iris()
X =[:, :2]  # we only take the first two features
Y =

# init model
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                 max_depth=2, random_state=0)

# fit model, Y)

# save model
joblib.dump(clf, 'model.pkl')

Step 2: Upload your model to AWS S3

First, we have to upload our model to an S3 bucket:

aws s3 cp model.pkl s3://yourbucketname

Step 3: Creating a Flask API

Let’s create a project directory:

mkdir serverless-machine-learning
cd serverless-machine-learning

We wrap all our code and dependencies into a virtual environment.

pip install virtualenv
virtualenv your_virtual_environment_name
source your_virtual_environment_name/bin/activate

Now, we’re ready to build our API. Create a directory api with a file for the Flask app (we’ll call it and write this into it:

from flask import Flask, request, json
import boto3
import pickle

BUCKET_NAME = 'serverless-machine-learning'
MODEL_FILE_NAME = 'model.pkl'

app = Flask(__name__)
S3 = boto3.client('s3', region_name='eu-central-1')

@app.route('/', methods=['POST'])
def index():
    # Parse request body for model input
    body_dict = request.get_json(silent=True)
    data = body_dict['data']

    # Load model
    model = load_model(MODEL_FILE_NAME)

    # Make prediction
    prediction = model.predict(data).tolist()

    # Respond with prediction result
    result = {'prediction': prediction}
    return json.dumps(result)

if __name__ == '__main__':
    # listen on all IPs''

The code is basically self-explanatory: we create a Flask object, use the route decorator to define our path, and call run() when the script is executed directly (which you can confirm by running python api/ and visiting localhost:5000 in your browser; a GET returns 405 Method Not Allowed, since the route only accepts POST, but that tells you the server is up).
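To see the request/response contract of index() in isolation, here is a minimal sketch of the same logic with Flask and S3 stripped away. DummyModel and handle are hypothetical stand-ins for illustration, not part of the app:

```python
import json

class DummyModel:
    # Stand-in for the pickled scikit-learn classifier
    def predict(self, data):
        return [0 for _ in data]

def handle(body, model):
    # Mirrors index(): pull 'data' out of the JSON body, predict, wrap the result
    body_dict = json.loads(body)
    prediction = model.predict(body_dict['data'])
    return json.dumps({'prediction': prediction})

print(handle('{"data": [[6.2, 3.4], [6.2, 1]]}', DummyModel()))
# → {"prediction": [0, 0]}
```

The contract is simply: a JSON object with a 'data' key holding a list of feature rows in, a JSON object with a 'prediction' key out.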

To load the model from S3 we use the following helper function:

def load_model(key):
    # Load model from S3 bucket
    response = S3.get_object(Bucket=BUCKET_NAME, Key=key)

    # Unpickle model from the response body
    model_str = response['Body'].read()
    model = pickle.loads(model_str)

    return model

Note: The code in the Github repository memoizes load_model, caching the model in memory after it is first pulled from S3. This avoids repeated S3 data transfer and makes subsequent predictions significantly faster.
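The memoization idea can be approximated with the standard library's functools.lru_cache. In this sketch, _fetch_from_s3 is a hypothetical stand-in for S3.get_object(...)['Body'].read(), used only so the caching behavior is observable without AWS:

```python
import functools

download_count = 0

def _fetch_from_s3(key):
    # Hypothetical stand-in for the real S3 download; counts how often it runs
    global download_count
    download_count += 1
    return b'pickled-model-bytes'

@functools.lru_cache(maxsize=4)
def load_model_cached(key):
    # The first call per key downloads; later calls return the cached object
    return _fetch_from_s3(key)

load_model_cached('model.pkl')
load_model_cached('model.pkl')
load_model_cached('model.pkl')
print(download_count)  # → 1
```

On Lambda the cache lives as long as the execution context is kept warm, so only cold starts pay the S3 download.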

Step 4: Configure AWS Lambda & API Gateway

We use a framework called Zappa to create and configure both AWS Lambda and the API Gateway automatically. Think of it as “serverless” web hosting for your Python apps.

That means automatic scaling, zero downtime and zero server maintenance, typically at a fraction of the cost of a conventional deployment.

So let’s start: First, we install the required packages into our virtual environment.

pip install zappa sklearn numpy scipy

Next, we initialize Zappa.

zappa init

Zappa automatically creates a zappa_settings.json configuration file:

{
    "dev": {
        "app_function": "",
        "aws_region": "eu-central-1",
        "profile_name": "default",
        "project_name": "serverless-machine-learning",
        "runtime": "python3.6",
        "slim_handler": true,
        "s3_bucket": "zappa-10z1mxwy2"
    }
}

This defines an environment called ‘dev’ (later, you may want to add ‘staging’ and ‘production’ environments as well), defines the name of the S3 bucket we’ll be deploying to, and points Zappa to a WSGI-compatible function, in this case, our Flask app object.

Setting the configuration parameter slim_handler to true lets Zappa load the application code from Amazon S3 at runtime in case our deployment package exceeds the 50 MB Lambda limit.

Step 5: Testing the API locally

The API can be tested locally like a regular Flask application.

First, run the Flask app as usual:

$ python api/

Second, make a test API call:

$ http POST localhost:5000 < payload.json

# payload.json
{"data": [[6.2, 3.4], [6.2, 1]]}

The response should be

{"prediction": [2, 1]}

Step 6: Deploying to AWS Lambda

Now, we’re ready to deploy to AWS. It’s as simple as:

$ zappa deploy dev

And our serverless machine learning microservice is alive! Zappa prints the API Gateway URL of the deployment; call it just as we called the local endpoint:

$ http POST https://your-api-gateway-url < payload.json

Congratulations, you have finished all required steps to deploy a serverless machine learning microservice. I hope you enjoyed the project.

Github repository:

If you run into issues getting the application working, feel free to DM me.
