Machine Learning with AWS Sagemaker

Ritesh Sinha
7 min read · Nov 21, 2019


Introduction

Nowadays, at some point, every data-centric application needs to evaluate options for hosting its machine learning models. There appear to be plenty of choices available, but once you start to dig deeper, you quickly realize that this takes a lot more planning and careful consideration than you originally thought.

Motivation behind using SageMaker

I have had my share of struggles with deploying machine learning models. I have deployed models on AzureML and with the Flask framework in Python, and a few years back I even used Openscoring as an option. There was a lot of talk when PMML arrived, and there is a lot of buzz around ONNX as well. In time, some of these will mature and some will not; here, I am looking at options which are relevant and applicable in today's context.

AWS SageMaker

In terms of what I am looking for, SageMaker appears to offer a lot, which makes it a compelling choice. Beyond productionizing models, it addresses various aspects of machine learning lifecycle management, i.e., training, deployment and hosting, plus annotation as well.

Diversity

With SageMaker, one can manage the lifecycle of models trained in almost any framework, including TensorFlow, PyTorch, MXNet, Chainer, scikit-learn, etc.

Offerings

source: https://aws.amazon.com/

At a high level, SageMaker offers four kinds of services: Ground Truth, Notebook, Training and Inference. In this article, we will focus on the Notebook, Training and Inference parts.

Notebook

Think of Notebook as a controller instance which manages training and sets up inference jobs. The remarkable thing here is that it comes with various distributions and the user gets a managed experience. So there is no need to install packages and dependencies; all you have to do is choose the distribution you would like to use, and an instance with the requisite configuration is made available to you. The following are the distributions supported by a Notebook instance.

Distributions supported by Sagemaker (Jupyter)

SageMaker Python SDK

The SageMaker Python SDK provides the following important abstractions, which help in carrying out various machine-learning-related tasks.

  • Estimators: Encapsulate training on SageMaker.
  • Models: Encapsulate built ML models.
  • Predictors: Provide real-time inference and transformation using Python data-types against a SageMaker endpoint.
  • Session: Provides a collection of methods for working with SageMaker resources.

In the next section, we will see how the SageMaker API is used.

Using the SageMaker API for various tasks: Step by Step

The SageMaker API is used for managing machine-learning-related activities. It helps in transferring data, running training jobs, deploying SageMaker endpoints, testing those endpoints and, finally, deleting them as well.

The following lines initialize the various variables which will be used later.
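As an illustration, the initialization might look like the following; the bucket name, prefix and role ARN here are hypothetical placeholders, not values from the original article.

```python
# Hypothetical setup values; replace them with your own account details.
bucket = "my-sagemaker-bucket"   # S3 bucket for datasets and model artifacts
prefix = "scene-classification"  # key prefix that groups this project's files
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # IAM role SageMaker assumes
# (inside a Notebook instance, sagemaker.get_execution_role() returns this role)

# S3 location where the training data will live
s3_train_path = "s3://{}/{}/train".format(bucket, prefix)
```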

Upload Data

We use the sagemaker.Session.upload_data function to upload our datasets to an S3 location. The return value, inputs, identifies that location; we will use it later when we start the training job.
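A sketch of that call is below; the local path and prefix are illustrative, and actually running it requires AWS credentials and the SageMaker SDK, so the call is wrapped in a helper rather than executed at import time.

```python
def upload_dataset(local_path="data", prefix="scene-classification"):
    """Upload a local dataset directory to S3 and return its s3:// URI (sketch)."""
    import sagemaker  # imported inside so the sketch can be read without the SDK installed

    sess = sagemaker.Session()
    # upload_data returns the S3 location of the uploaded files;
    # this is the `inputs` value passed to the estimator's fit() later.
    inputs = sess.upload_data(path=local_path, key_prefix=prefix)
    return inputs
```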

Training a Model

The following command does the training for you, but there is a lot happening behind the scenes. Let us try to understand it.

entry_point: the location of a Python code (.py) file. This is where most of the work is done by the code writer in order to work with SageMaker. This file should contain implementations of the functions described below.

instance_type: this helps you choose between various trade-offs, e.g., training time vs. cost. You can choose a high-end machine or one with a smaller configuration, based on your needs.
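Putting the two parameters together, launching a training job might look like the sketch below. The script name, framework version and instance type are illustrative choices, and the keyword names follow the pre-2.x SageMaker SDK that was current when this article was written.

```python
def launch_training(role, inputs):
    """Sketch of kicking off a SageMaker training job (pre-2.x SDK parameter names)."""
    from sagemaker.pytorch import PyTorch  # any framework estimator works the same way

    sg_estimator = PyTorch(
        entry_point="train.py",              # script implementing _train and the serving functions
        role=role,                           # IAM role the training container assumes
        framework_version="1.2.0",
        train_instance_count=1,
        train_instance_type="ml.p2.xlarge",  # GPU instance; choose smaller to trade time for cost
        hyperparameters={"epochs": 10},
    )
    sg_estimator.fit(inputs)                 # `inputs` is the S3 URI returned by upload_data
    return sg_estimator
```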

_train(args)

The above function does the training of the model: when sg_estimator.fit is invoked, the _train function is called. Here, the programmer writes the code that trains the model and creates a tar file holding the model and supporting information, which is saved to an S3 location. The model artifact location is accessible from sg_estimator.model_data, and this information is used when SageMaker endpoints are created.

The following is an example of the output of a training job. If you look closely, there is a line which indicates the savings achieved; this will be explained later in the article.

Training Job output

Deploying a Model

We looked at the _train function, which is used for training a model. For serving the model, there are four functions which need to be implemented.

model_fn(model_dir): Loads the model from model_dir and returns a reference to the model object.

input_fn(request_body, content_type=JPEG_CONTENT_TYPE): Accepts the body of the request and returns the input feature, such as an image in the case of image analytics.

predict_fn(input_object, model): Takes the input object and the model and returns the prediction. This is also the place where you make any needed customizations.

output_fn(prediction, accept=JSON_CONTENT_TYPE): This is the place where final processing is done before results are returned to the user.
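A minimal sketch of these four serving functions is shown below. It assumes the training job pickled its model to model.pkl and that the response is serialized as JSON; a framework-specific loader and decoder would follow the same shape.

```python
import json
import os
import pickle

JPEG_CONTENT_TYPE = "image/jpeg"
JSON_CONTENT_TYPE = "application/json"


def model_fn(model_dir):
    # Load the serialized model that the training job saved into model_dir.
    # (Assumes a pickled model; a framework-specific loader works the same way.)
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)


def input_fn(request_body, content_type=JPEG_CONTENT_TYPE):
    # Turn the raw request payload into the feature the model expects.
    # For an image endpoint this would decode the JPEG bytes; here we pass them through.
    if content_type == JPEG_CONTENT_TYPE:
        return request_body
    raise ValueError("Unsupported content type: {}".format(content_type))


def predict_fn(input_object, model):
    # Run the model on the decoded input; pre/post-processing customizations go here.
    return model.predict(input_object)


def output_fn(prediction, accept=JSON_CONTENT_TYPE):
    # Final serialization before the response leaves the endpoint.
    return json.dumps({"prediction": prediction})
```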

Once the above functions are implemented, deployment calls can be made using the SageMaker API.
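As a sketch, the deployment call might look like the following; the instance type is an illustrative choice, and sg_estimator is the fitted estimator from the training step.

```python
def deploy_model(sg_estimator):
    """Deploy the trained estimator as a real-time SageMaker endpoint (sketch).

    Returns a predictor object bound to the newly created endpoint.
    """
    sg_predictor = sg_estimator.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large",  # inference often needs a smaller instance than training
    )
    return sg_predictor
```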

Please note that the deployed model can be tested through sg_predictor.

Testing of deployment of model:

Let us test whether our model provides the output we expect.

Loading the image: we are about to test the model with an image containing a golf course.
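A hedged sketch of such a test call is below; the helper name and the JSON response shape are assumptions, and the payload's content type must match what input_fn and output_fn implement on the server side.

```python
import json


def classify_image(sg_predictor, image_path):
    """Send JPEG bytes to the endpoint and decode the JSON response (hypothetical helper)."""
    with open(image_path, "rb") as f:
        payload = f.read()
    # The predictor's content type / serializer must match input_fn on the server side.
    response = sg_predictor.predict(payload)
    return json.loads(response)
```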

Output

The above completes the full life cycle of model training and deployment.

Model Inference from Client

For this, some additional work is required. It is not covered here, but the required step is to put a Lambda function and an API Gateway in front of the SageMaker endpoint to make it available to the external world.

Some great features provided by Sagemaker

  • Spot Training: You have probably noticed that there is a parameter train_use_spot_instances. This helps reduce training costs and can yield savings of up to 90%, but there is a catch: the end user needs to implement checkpointing of the model, in case the spot instance is temporarily reclaimed. In the training experiment illustrated above, a saving of about 75% was realized.
  • Managed Instances: Amazon SageMaker is a fully managed machine learning service. Data scientists and developers can quickly and easily build and train machine learning models without having to worry about configuring and managing servers. Direct deployment of models is also easy, because the SageMaker endpoint can be leveraged as a microservice without writing extra code.
  • A standardized approach across deep learning frameworks: For people from a software engineering background, this is the reusability we always talk about and strive for. SageMaker, in a way, standardizes the process of training and deployment. You implement _train, input_fn, predict_fn, etc., and these functions remain relevant when productionizing machine learning models in other services such as Lambda.
  • Accelerated Inference: This is again a great feature. Not all use cases need real-time inference, so you can control what kind of response you need and choose the appropriate instance.
GPU Acceleration Available for Inference
  • Cost effective: You pay only for the time you use; there is no need to keep an instance up and running for training jobs. The controller instance can be a low-configuration setup, which is sufficient for writing code; for training, as you have seen, you can invoke a high-compute resource just for the duration of the training run. Further cost reduction is possible by using spot instances, as explained above. One more thing: you are not bound to the instance you have selected. If you feel you have selected the wrong instance, you have the flexibility to change it, with no need to copy the code, destroy the instance and create a new one.
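The spot-training settings mentioned above can be sketched as estimator keyword arguments like the following; the values and the checkpoint URI are hypothetical, and the keyword names follow the pre-2.x SageMaker SDK current at the time of writing.

```python
# Hypothetical spot-training settings for a SageMaker Estimator (pre-2.x SDK names).
spot_kwargs = {
    "train_use_spot_instances": True,  # bid for spare capacity instead of on-demand
    "train_max_run": 3600,             # max training time, in seconds
    "train_max_wait": 7200,            # total time to wait, including spot interruptions
    # Checkpoints written here let a reclaimed job resume instead of restarting:
    "checkpoint_s3_uri": "s3://my-sagemaker-bucket/checkpoints",
}
```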

Ground Truth

We have not talked about the "Ground Truth" service, which is also a great offering; that will be covered at a later point. It basically addresses the problem of annotating the data used for supervised learning.

Final Thoughts

We have seen that SageMaker is a very good choice available today for managing the machine learning lifecycle, and the flexibility it offers is impressive. How much time does it take to learn? Well, if you are a machine learning practitioner, less than a week is enough to get you going. A lot of examples are available which can be used as a starting point and modified to suit your needs. Overall, SageMaker is definitely recommended for machine learning practitioners.


Ritesh Sinha

Ritesh Sinha is a Principal Data Scientist with experience in delivering end-to-end AI projects in the manufacturing and healthcare domains.