AWS SageMaker Introduction & Tutorial

Published in

Data Science Student Society @ UC San Diego

5 min read, Apr 6, 2020

Hello, this is Han from the Data Science Student Society at UCSD! Today I am going to introduce AWS SageMaker, a cloud service for running customizable machine learning models, and show how to use it.

Image Credit: https://towardsdatascience.com/why-do-we-need-aws-sagemaker-79bce465f19f

Introduction to AWS SageMaker.

“Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.” Introduction quoted from the website of AWS SageMaker.

The introduction above points to the main advantages of AWS SageMaker. To train a successful machine learning model, you need a sufficient amount of data, powerful computational resources, and a user-friendly script to tune the model parameters. The first two requirements can't be met by a lightweight laptop or an everyday personal computer; AWS SageMaker, however, provides all three at once on its cloud platform, with help from other Amazon web services.

Four nice advantages of AWS SageMaker:

Large storage space for datasets, provided by AWS S3 buckets.
Powerful computational resources, provided by AWS EC2 instances.
End-to-end machine learning model development, even on a Raspberry Pi camera, provided by AWS SageMaker.
Extensible to any type of machine with a network connection, provided by AWS Lambda.

AWS SageMaker feature map: the functionality provided for each stage of machine learning development. Image Credit: https://aws.amazon.com/cn/sagemaker/

How does Amazon SageMaker work

AWS SageMaker provides many useful features. For example, it provides machines for training and a pipeline for tuning model hyperparameters. Here I will mainly focus on deploying a model on AWS SageMaker, since training can happen on any machine you like. I also believe that the most powerful feature of AWS is that it offers cloud services to any device.

Image Credit: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-hosting.html

To deploy a machine learning model on AWS, you need three things: trained model files in an S3 bucket, an AWS machine instance, and a script that loads and invokes the model. When you start a machine learning service online, a virtual machine on AWS loads the script you prepared. The script's model-loading function then reads the model artifacts into memory from the S3 location you specified earlier.

AWS then prepares an endpoint that your device can call. You can think of this endpoint as the communication channel between your device and your virtual machine in the cloud. When you call the endpoint with some data, the data is passed to your virtual machine. After processing it, your model responds through the endpoint, in JSON format, back to your device.
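To make the request and response round trip concrete, here is a minimal sketch of calling an endpoint from a device. It assumes the serving script accepts and returns JSON. The `invoke_endpoint` call and its `EndpointName`, `ContentType`, and `Body` parameters belong to the boto3 `sagemaker-runtime` client; the helper `invoke_model`, the `FakeRuntimeClient` stand-in, and the endpoint name are hypothetical and only exist so the example runs without an AWS account.

```python
import io
import json


def invoke_model(runtime_client, endpoint_name, values):
    """Send one input to an endpoint and decode the JSON reply.

    `runtime_client` is expected to behave like
    boto3.client("sagemaker-runtime"); it is passed in as an argument so the
    function can be tried without AWS credentials.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(values),
    )
    # The reply body is a byte stream; read it and parse it as JSON.
    return json.loads(response["Body"].read())


class FakeRuntimeClient:
    """Hypothetical stand-in for the real client: reports what it received."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        received = json.loads(Body)
        reply = json.dumps({"endpoint": EndpointName, "n_values": len(received)})
        return {"Body": io.BytesIO(reply.encode("utf-8"))}


result = invoke_model(FakeRuntimeClient(), "my-hypothetical-endpoint", [0.0, 0.5, 1.0])
print(result)
```

With a real deployment you would pass `boto3.client("sagemaker-runtime")` and the name of your own endpoint instead of the fake client.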

How to use Amazon Sagemaker

Now, let’s get started.

What you need:

A trained model uploaded to your S3 bucket.

1. For a PyTorch model, save it as a .pth file to follow along with this tutorial most easily. Of course, you can use other formats, but the idea is the same.

torch.save(model.state_dict(), PATH) ## the PATH should end with .pth

2. For a TensorFlow model, your model file should be in tar.gz format. Here is a link regarding how to deploy a TensorFlow model on AWS. Basically, you export the model into multiple files and then compress all of those files into one tar.gz file.

3. Check out other machine learning models.
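As a rough sketch of the packaging and upload steps, the snippet below compresses model files into one tar.gz archive and hands it to an S3 client. The `upload_file(Filename, Bucket, Key)` call is the boto3 S3 client method; the helpers `package_model` and `upload_model`, the `FakeS3Client` stand-in, the bucket name, and the key are all hypothetical, and the "weights" are a placeholder file rather than a real model.

```python
import os
import tarfile
import tempfile


def package_model(file_paths, archive_path):
    """Compress the exported model files into a single tar.gz archive."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in file_paths:
            # Store each file at the archive root, without local directories.
            archive.add(path, arcname=os.path.basename(path))
    return archive_path


def upload_model(s3_client, archive_path, bucket, key):
    """Upload the archive and return its S3 URI.

    `s3_client` is expected to behave like boto3.client("s3").
    """
    s3_client.upload_file(archive_path, bucket, key)
    return "s3://{}/{}".format(bucket, key)


class FakeS3Client:
    """Hypothetical stand-in that records uploads instead of contacting AWS."""

    def __init__(self):
        self.uploads = []

    def upload_file(self, Filename, Bucket, Key):
        self.uploads.append((Filename, Bucket, Key))


# Demonstration with a placeholder file standing in for real model weights.
workdir = tempfile.mkdtemp()
weights_path = os.path.join(workdir, "model.pth")
with open(weights_path, "wb") as f:
    f.write(b"placeholder weights")

archive_path = package_model([weights_path], os.path.join(workdir, "model.tar.gz"))
with tarfile.open(archive_path, "r:gz") as archive:
    archive_members = archive.getnames()

fake_s3 = FakeS3Client()
model_uri = upload_model(fake_s3, archive_path, "my-hypothetical-bucket", "models/model.tar.gz")
print(archive_members, model_uri)
```

The returned URI is the kind of value you would later pass as the model location when deploying.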

A python script with model_fn

I will use the code for a Convolutional Neural Network trained on the MNIST dataset as an example. Note that the model_fn() function is necessary because SageMaker looks for it to load the PyTorch model. Other functions are also important, such as input_fn() for data preprocessing and output_fn() for output processing.

More information here.

import argparse
import json
import logging
import os
import sagemaker_containers
import sys
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
import torch.utils.data.distributed
from torchvision import datasets, transforms

logger = logging.getLogger(__name__)


# Based on https://github.com/pytorch/examples/blob/master/mnist/main.py
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def save_model(model, model_dir):
    logger.info("Saving the model.")
    path = os.path.join(model_dir, 'model.pth')
    # recommended way from http://pytorch.org/docs/master/notes/serialization.html
    torch.save(model.cpu().state_dict(), path)


def model_fn(model_dir):
    # This function serves to load the model files the way you want it to
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Next step is to initialize a model. Because it is wrapped in DataParallel,
    # the saved state_dict must also come from a DataParallel model
    # (its keys are prefixed with "module.").
    model = torch.nn.DataParallel(Net())
    with open(os.path.join(model_dir, 'model.pth'), 'rb') as f:
        # map_location lets weights saved on a GPU load on a CPU-only instance
        model.load_state_dict(torch.load(f, map_location=device))
    return model.to(device)
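The script above defines only model_fn(). As a hedged illustration of the other two hooks, here is a minimal sketch of input_fn() and output_fn() for a JSON-encoded 28x28 MNIST image. To stay runnable without PyTorch it works on plain Python lists, which is an assumption of this sketch: in a real serving script input_fn() would return a tensor, and output_fn() would receive the model output converted with something like `.tolist()`. The validation rules and the reply layout are illustrative choices, not SageMaker requirements.

```python
import json

JSON_CONTENT_TYPE = "application/json"


def input_fn(request_body, request_content_type=JSON_CONTENT_TYPE):
    """Decode the request into a 28x28 nested list of pixel values."""
    if request_content_type != JSON_CONTENT_TYPE:
        raise ValueError("Unsupported content type: " + request_content_type)
    image = json.loads(request_body)
    if len(image) != 28 or any(len(row) != 28 for row in image):
        raise ValueError("Expected a 28x28 image")
    # In a real script this is where the list would become a tensor,
    # e.g. torch.tensor(image).view(1, 1, 28, 28).
    return image


def output_fn(prediction, response_content_type=JSON_CONTENT_TYPE):
    """Turn the ten log-probabilities into a JSON reply with the top digit."""
    if response_content_type != JSON_CONTENT_TYPE:
        raise ValueError("Unsupported content type: " + response_content_type)
    scores = list(prediction)
    # The predicted digit is the index of the largest log-probability.
    digit = max(range(len(scores)), key=lambda i: scores[i])
    return json.dumps({"digit": digit, "log_probabilities": scores})


blank_image = [[0.0] * 28 for _ in range(28)]
decoded = input_fn(json.dumps(blank_image))
reply = output_fn([-3.2, -0.1, -4.0, -5.0, -6.0, -7.0, -8.0, -9.0, -10.0, -11.0])
print(reply)
```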

Create a notebook instance on AWS Sagemaker

This step should be pretty straightforward if you follow the instructions on AWS SageMaker. You need to create an AWS account, create IAM roles, choose a machine instance type, and start a notebook. Note that once you start your notebook, AWS charges you for as long as the notebook instance is running, regardless of whether you are doing any work. So make sure you shut your notebook down when you finish.
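If you tend to forget the shutdown step, a small script can do it for you. The sketch below assumes a client that behaves like `boto3.client("sagemaker")`, whose `list_notebook_instances` and `stop_notebook_instance` calls are used here. The helper `stop_running_notebooks`, the `FakeSageMakerClient` stand-in, and the instance name are hypothetical, and for brevity the sketch only looks at the first page of results.

```python
def stop_running_notebooks(sagemaker_client):
    """Stop every notebook instance that is currently in service.

    Only the first page of the listing is handled in this sketch.
    """
    listing = sagemaker_client.list_notebook_instances(StatusEquals="InService")
    stopped = []
    for instance in listing["NotebookInstances"]:
        name = instance["NotebookInstanceName"]
        sagemaker_client.stop_notebook_instance(NotebookInstanceName=name)
        stopped.append(name)
    return stopped


class FakeSageMakerClient:
    """Hypothetical stand-in so the example runs without an AWS account."""

    def __init__(self, running_names):
        self.running = list(running_names)

    def list_notebook_instances(self, StatusEquals):
        return {"NotebookInstances": [{"NotebookInstanceName": n} for n in self.running]}

    def stop_notebook_instance(self, NotebookInstanceName):
        self.running.remove(NotebookInstanceName)


fake_client = FakeSageMakerClient(["my-hypothetical-notebook"])
stopped_names = stop_running_notebooks(fake_client)
print(stopped_names)
```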

Upload your Python script

Upload your Python script to the directory where your notebook is. In short, click the upload button on the notebook instance.

Deploy your model estimator

import boto3, re
import sagemaker
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorchModel

role = get_execution_role()
sagemaker_session = sagemaker.Session()

# Newer versions of the SageMaker Python SDK may also require framework_version and py_version here.
pytorch_model = PyTorchModel(model_data='s3://path/to/your/trained/model/files',
                             role=role,
                             entry_point='transform_script.py')

# Here you should be able to see an endpoint name in the output. This endpoint name can be used anywhere you like to start communicating with your model.
predictor = pytorch_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)

# `data` is a NumPy array or a Python list.
# `response` is a NumPy array.
# Note that the data can't be more than 5MB to avoid a timeout error.
response = predictor.predict(data)

# Don't forget to delete the endpoint after you have finished!
predictor.delete_endpoint()
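Because of the 5MB limit mentioned in the comments above, it can help to check the size of a request before sending it. The following is a minimal sketch that assumes the data is sent as JSON; the helper names `serialized_size` and `fits_in_one_request` are hypothetical, and the limit is simply the figure quoted in this article.

```python
import json

MAX_PAYLOAD_BYTES = 5 * 1024 * 1024  # the 5MB figure quoted above


def serialized_size(data):
    """Return the number of bytes the data occupies once encoded as JSON."""
    return len(json.dumps(data).encode("utf-8"))


def fits_in_one_request(data, limit=MAX_PAYLOAD_BYTES):
    """Tell whether the JSON-encoded data stays within the size limit."""
    return serialized_size(data) <= limit


# One blank 28x28 image is far below the limit.
small_batch = [[0.0] * 28 for _ in range(28)]
print(serialized_size(small_batch), fits_in_one_request(small_batch))
```

If a batch does not fit, one option is to split it into smaller chunks and call the endpoint once per chunk.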