ML in Production with SageMaker — Part 1

Neylson Crepalde · MLOps.community · Oct 12, 2020

Hello! In this series we discuss how to get an ML project to production in a fast, scalable, and secure way with AWS SageMaker. Our roadmap is as follows:

  1. Part 1 — SageMaker Processing, Training and Deploy (you are here)
  2. Part 2 — Scalable SageMaker Endpoints and Batch Transform Jobs
  3. Part 3 — Orchestrating ML Pipelines with SageMaker, Lambda and StepFunctions
  4. Part 4 — Orchestrating ML Pipelines with AirFlow and SageMaker

Requirements: An AWS account with permissions to S3 and SageMaker.

Important notes: if you intend to implement the content of this article in a test environment, be aware that you will spend some USD. Also, our intention is not to show a fully replicable project, but to give you hints and code snippets to speed things up (although we do provide a data set for replication!). :) Let's get to it!

In the AWS Console, create a new Notebook instance. We will launch all AWS services from there using the boto3 and sagemaker SDKs.

SageMaker Notebook instances UI. Image by the author

It is also important to choose your working region wisely and set up all your infrastructure within it. AWS charges you when data travels from one region to another, so if your S3 buckets are in the same region where the computation happens, you don't pay for data traffic.

SageMaker is built around the concept of a data repository. All data (inputs and outputs) and model artifacts go to a Data Lake structure set up on S3. SageMaker works very efficiently with S3 buckets: you just tell it where your data is, where your training code is, and where it should place model artifacts and transformed data, and it does the heavy lifting for you.

1. Setup

Here we set up the Python modules we need, the IAM role SageMaker uses to access S3, and the session auth. In this scenario, you already have an S3 bucket with your raw dataset to be processed. Remember that for most SageMaker built-in algorithms, the target column must be the first one in the training and evaluation sets. We also define keys for the processed train, eval and test sets and for the processing.py code.
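A minimal setup sketch is shown below. The bucket and key names are placeholders (assumptions, not the ones from the original notebook):

import boto3  # used further down for direct S3 access
import sagemaker
from sagemaker import get_execution_role

# Session and the IAM role SageMaker uses to access S3
sagemaker_session = sagemaker.Session()
role = get_execution_role()
region = sagemaker_session.boto_region_name

# Placeholder bucket and keys -- adjust to your own setup
bucket = "my-ml-project-bucket"
raw_data_key = "raw/pnad2012.csv"
processing_code_key = "code/processing.py"
train_key = "processed/train_data"
eval_key = "processed/eval_data"
test_key = "processed/test_data"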

If you want to replicate this post, the code below will prepare a subset of the 2012 Brazilian National Household Survey (PNAD) and upload it to S3.
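A hedged stand-in for that step: it assumes you already have the 2012 PNAD subset saved locally as pnad2012.csv (a hypothetical filename) and uploads it to the raw prefix defined above.

import pandas as pd

# Hypothetical local file holding the 2012 PNAD subset
df = pd.read_csv("pnad2012.csv")
print(df.shape)  # quick sanity check before uploading

# Upload to s3://<bucket>/raw/pnad2012.csv, matching raw_data_key above
sagemaker_session.upload_data(path="pnad2012.csv", bucket=bucket, key_prefix="raw")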

2. Processing Job

Now, let's define our processing routine. We will write a processing.py script that SageMaker will call at runtime. Save the file in the same folder where you are running the Jupyter notebook. SageMaker already has an SKLearnProcessor class that lets us implement data processing with the scikit-learn package we are all so familiar with. At runtime, we will also install pandas to give us a hand.

Note: although you can set up a really big machine for data processing on AWS, if you have a really monstrous data set you should implement a PySparkProcessor or a Glue Job instead.

The main idea of this processing script is to take the raw data, run your data cleansing and transformation routine, split it into train, eval and test sets, and save them as CSV files in the proper folders. We will tell SageMaker that these folders contain the processed data and it will ship them straight to S3.
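A sketch of what processing.py could look like. The target column name, split ratios and file names are assumptions for illustration, not the exact choices from the original script.

# processing.py -- runs inside the SKLearnProcessor container
import os
import pandas as pd
from sklearn.model_selection import train_test_split

if __name__ == "__main__":
    # SageMaker copies the ProcessingInput to this folder (see the job definition below)
    df = pd.read_csv("/opt/ml/processing/input/pnad2012.csv")

    # ... your data cleansing and transformation routine goes here ...

    # Target column first, as most built-in algorithms require
    target = "income"  # placeholder column name
    df = df[[target] + [c for c in df.columns if c != target]]

    # Split into train / eval / test (70/15/15 as an example)
    train, rest = train_test_split(df, test_size=0.3, random_state=42)
    eval_set, test = train_test_split(rest, test_size=0.5, random_state=42)

    # Save into the folders declared as ProcessingOutput sources
    for name, data in [("train", train), ("eval", eval_set), ("test", test)]:
        out_dir = f"/opt/ml/processing/{name}"
        os.makedirs(out_dir, exist_ok=True)
        data.to_csv(os.path.join(out_dir, f"{name}.csv"), index=False, header=False)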

Now, back in the Jupyter notebook, we upload the processing script to S3 and define and run a processing job. To do that, we have to declare the inputs (the raw_data key) and the outputs (the train, eval and test folders within the container). SageMaker will automatically copy the content behind each ProcessingInput key into the container destination folder we declare. The same happens, in reverse, with the outputs.
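A sketch of the job definition, reusing the placeholder bucket and keys from the setup step (the framework version and instance type are assumptions):

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

sklearn_processor.run(
    code="processing.py",  # the SDK uploads this script to S3 for us
    inputs=[
        ProcessingInput(
            source=f"s3://{bucket}/{raw_data_key}",
            destination="/opt/ml/processing/input",
        )
    ],
    outputs=[
        ProcessingOutput(source="/opt/ml/processing/train",
                         destination=f"s3://{bucket}/{train_key}"),
        ProcessingOutput(source="/opt/ml/processing/eval",
                         destination=f"s3://{bucket}/{eval_key}"),
        ProcessingOutput(source="/opt/ml/processing/test",
                         destination=f"s3://{bucket}/{test_key}"),
    ],
)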

Very important note: in a production environment, you may run many processing jobs a day. If that is the case, you should “partition” your destination S3 keys with a structure such as /processed/train_data/<year>/<month>/<day>/<hour>/<minute>/<second>/, as this makes querying tools (such as AWS Athena) more efficient. The idea is to make the partitioning scheme very specific so the querying tool can scan less data (and you pay less for it!).
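For illustration only, a hypothetical time-partitioned key could be built like this:

from datetime import datetime

# Illustrative: a time-partitioned variant of the train_data destination key
now = datetime.utcnow()
partitioned_train_key = (
    f"processed/train_data/{now.year}/{now.month:02d}/{now.day:02d}/"
    f"{now.hour:02d}/{now.minute:02d}/{now.second:02d}"
)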

3. Training Job

Once the processing job is done, we can proceed with training the model. As an example, we're going to use the XGBoost built-in algorithm.

First, we get the container image for XGBoost and define the Estimator. Here we choose the instance type (check the available instance sizes), set the output path for model artifacts on S3, declare that we want to use Spot instances, and set the timeout for the job and the timeout for AWS to find a Spot instance (when using Spot instances with Estimators, you have to declare train_max_run and train_max_wait, and the second has to be longer than the first). Then we set some model hyperparameters, define the location and data type of the training inputs, and GO!
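A sketch of that training job, written with the v1-style parameter names referenced above (train_use_spot_instances, train_max_run, train_max_wait; newer versions of the SageMaker Python SDK rename them to use_spot_instances, max_run and max_wait). The instance type, output prefix and hyperparameters are placeholders.

from sagemaker.amazon.amazon_estimator import get_image_uri

# Built-in XGBoost container image for the working region
container = get_image_uri(region, "xgboost", repo_version="1.0-1")

xgb = sagemaker.estimator.Estimator(
    container,
    role,
    train_instance_count=1,
    train_instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/model-artifacts",  # placeholder prefix
    sagemaker_session=sagemaker_session,
    train_use_spot_instances=True,  # Spot capacity at a fraction of the On Demand price
    train_max_run=3600,             # max training time, in seconds
    train_max_wait=7200,            # must be longer than train_max_run
)

# Placeholder hyperparameters for a regression on the PNAD subset
xgb.set_hyperparameters(
    objective="reg:squarederror",
    num_round=100,
    max_depth=5,
    eta=0.2,
)

# Training and validation inputs are the CSVs produced by the processing job
s3_input_train = sagemaker.s3_input(s3_data=f"s3://{bucket}/{train_key}", content_type="csv")
s3_input_eval = sagemaker.s3_input(s3_data=f"s3://{bucket}/{eval_key}", content_type="csv")

xgb.fit({"train": s3_input_train, "validation": s3_input_eval})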

Notice that we are using Spot instances, which cost about 1/4 of regular On Demand instances. Spot instances are great for heavy, non-critical computations. They are basically compute capacity that is not being used in the AWS cloud, so AWS makes it available at a very attractive price. AWS can request a Spot instance back at any time, with a 2-minute warning. If you have critical jobs, it is better to use On Demand instances. If restarting your training after an interruption is not a problem, then have a go!

When executing this code, SageMaker tells us in the training job output how many billable seconds we will pay for and how much money we saved by using Spot instances. You should show this to your Team Leader. She's gonna love it!

4. Deployment

Now that we have a trained model, we are about to start a long and tiring deployment project, right? WRONG! To deploy a trained model as a service in production with SageMaker, do
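something along these lines (the instance type here is an assumption):

# One call deploys the trained model behind a real-time endpoint
xgb_predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
)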

Yes, that is it! Simple as that. You now have a model endpoint listening for prediction requests. We are going to test it by sending some data for prediction.

Here, we are making just one prediction and simulating 10,000 concurrent predictions.
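A hedged sketch of that test. The serializer setup follows the v1-style predictor API used above, and the “concurrent” load is approximated here with a simple sequential loop:

import boto3
import pandas as pd
from sagemaker.predictor import csv_serializer

# Send CSV payloads to the endpoint
xgb_predictor.content_type = "text/csv"
xgb_predictor.serializer = csv_serializer

# Grab the test set produced by the processing job
boto3.client("s3").download_file(bucket, f"{test_key}/test.csv", "test.csv")
test = pd.read_csv("test.csv", header=None)
features = test.iloc[:, 1:].values  # drop the target (first column)

# A single prediction
print(xgb_predictor.predict(features[0]))

# A naive stand-in for 10,000 concurrent requests: a sequential loop
for i in range(10_000):
    xgb_predictor.predict(features[i % len(features)])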

Conclusion and Clean up

Congratulations! You have deployed an ML Model in production!

VERY IMPORTANT INDEED: If you are replicating this post for testing purposes, DO NOT FORGET TO CLEAN UP. Delete your model endpoint with

xgb_predictor.delete_endpoint()

and STOP YOUR NOTEBOOK INSTANCE! If you leave services running while you are not using them, you are basically throwing money in the trash. Don't do that!

In the next post, we are going to talk about scalable endpoints and Batch Transform Jobs for the cases where you don't really need an online model, just batch predictions on a regular schedule.

And here is the whole notebook:

About me

I am a Data Scientist focused on MLOps and solutions architecting for Machine Learning. I am also very passionate about helping data scientists and analytics teams achieve their best performance. One of the coolest projects I have been able to contribute to is Hermione, a tool that helps Data Science teams reduce time to production. Check it out!
