How to build, train and deploy your own ML algorithms on AWS SageMaker from scratch

Mert Oz
Published in Testinium Tech · 13 min read · Jun 9, 2021

Hi all! Welcome to my Medium post. During the pandemic, we all have to stay home as much as possible, keep our social distance, and so on to combat the virus, and we end up spending most of our time in front of screens. We watch Netflix, pick up a hobby, learn new things. Sometimes it is hard to find something to do at home. In one of those moments, I asked myself: why not write an article about what I learned while running my machine learning algorithms on Amazon SageMaker? 😊 As you can see, I have finished writing it, and it is ready for you to read.

The purpose of this post is to give you an understanding of how you can use SageMaker to build, train and deploy your machine learning models. By the end of this article, you’ll be able to integrate popular Amazon services such as API Gateway, Lambda, ECR, S3, and EC2 with SageMaker to create a machine learning workflow.

You can access everything you need for this article in the repository: https://github.com/osemrt/AWS-SageMaker

The main topics that will be covered in the post are:

  • What Amazon SageMaker is and how it works
  • How to create an ML pipeline
  • How to prepare a model in SageMaker
  • How to use docker containers to train and deploy

SageMaker

Let’s take a quick dive into the SageMaker service to get a sense of what you can do with it. As you probably know, a machine learning workflow, especially in large projects, involves complex steps all the way from data preprocessing to deploying the model. The typical phases include data collection, data pre-processing, building datasets, model training and refinement, evaluation, and deployment to production. Each of these steps brings along its own set of challenges, so these pipelines should be carefully designed and managed. The Amazon SageMaker service removes the heavy lifting from each step of the workflow, making it easier to develop high-quality models.

It gives you complete access, control, and visibility into each step. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production, all in one place. What you have read so far is just the tip of the iceberg. SageMaker also offers features such as the SageMaker Marketplace, where you can buy and sell Amazon SageMaker algorithms, and Automated Machine Learning (AutoML), which provides a black-box solution to machine learning problems by selecting the right feature processing, choosing an algorithm, and tuning the hyperparameters in your pipeline, among many others.

For more information about SageMaker, visit the following link:

Model and Dataset

Since this article is not about machine learning itself, we will use a basic model and dataset. I won’t discuss which model fits the dataset best; instead, we will stick to building, training, and deploying on SageMaker. However, you are encouraged to follow the same steps with your own model and dataset. Throughout this article, logistic regression on the Iris flowers dataset is used for classification. This is one of the best-known multivariate datasets and it is publicly available. It consists of 50 samples from each of three iris species: Iris setosa, Iris virginica, and Iris versicolor. Four features are available for each sample: the length and the width of the sepals and petals, in centimeters. Using these four features, we will be able to get predictions about the species from our model at the end of the post.

Create training script

We need a Python training script where we define the model and its behavior. This file should contain everything needed to train the model. SageMaker downloads the training data into the container from the S3 path you specify when you start a training job. You can also pass any parameters the model needs: hyperparameters are supplied in JSON format when the training job is started, and SageMaker makes them available to the script as a JSON file inside the container. This script runs whenever we submit a new training job to SageMaker. In our project, the training script contains the following code:
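Below is a minimal sketch of what such a training script can look like, assuming a scikit-learn LogisticRegression, a single CSV channel named training, and the /opt/ml layout described in the next section; the actual script in the repository may differ in its details.

# train: a minimal sketch of a SageMaker training script (assumptions: scikit-learn
# LogisticRegression, a single CSV channel named "training", the standard /opt/ml layout).
import json
import os
import pickle
import sys
import traceback

import pandas as pd
from sklearn.linear_model import LogisticRegression

prefix = "/opt/ml/"
input_path = os.path.join(prefix, "input/data")
model_path = os.path.join(prefix, "model")
param_path = os.path.join(prefix, "input/config/hyperparameters.json")
output_path = os.path.join(prefix, "output")
channel_name = "training"
training_path = os.path.join(input_path, channel_name)


def train():
    try:
        # Hyperparameters are passed by SageMaker as a JSON file of strings.
        with open(param_path) as f:
            hyperparameters = json.load(f)
        max_iter = int(hyperparameters.get("max_iter", 1000))

        # Read every CSV file SageMaker downloaded into the training channel.
        files = [os.path.join(training_path, f) for f in os.listdir(training_path)]
        data = pd.concat([pd.read_csv(f) for f in files])

        # Last column is the label (species), the rest are the four iris features.
        X, y = data.iloc[:, :-1], data.iloc[:, -1]

        model = LogisticRegression(max_iter=max_iter)
        model.fit(X, y)

        # Whatever is written under /opt/ml/model is archived to S3 by SageMaker.
        with open(os.path.join(model_path, "model.pkl"), "wb") as f:
            pickle.dump(model, f)
    except Exception as e:
        # Writing to /opt/ml/output/failure marks the training job as failed.
        with open(os.path.join(output_path, "failure"), "w") as f:
            f.write(str(e) + "\n" + traceback.format_exc())
        sys.exit(255)


if __name__ == "__main__":
    train()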

Container Folder Structure

When you run a model training job, SageMaker creates a specific folder structure under the /opt/ml directory inside of your training container:

/opt/ml
├── input
│   ├── config
│   │   ├── hyperparameters.json
│   │   └── resourceConfig.json
│   └── data
│       └── <channel_name>
│           └── <input data>
├── model
│   └── <model files>
└── output
    └── failure

SageMaker first downloads the training data from the Amazon S3 location that you specify into the container. The training script then produces the model artifacts in /opt/ml/model. Whatever you put inside this folder is packaged into a compressed tar archive and pushed to an Amazon S3 location by SageMaker when the training job is done. Some of the important paths are:

  • /opt/ml/input/data/<channel_name>/<input data>, where your training data is stored
  • /opt/ml/model, where the model artifacts are written

For more information, visit:

As mentioned, SageMaker creates the folder structure above inside the training container. For the inference container, we have the following folder structure:

/opt/program 
├── nginx.conf
├── wsgi.py
├── train
├── serve
└── predictor.py

The files in the container are:

  • nginx.conf, the configuration file for the nginx front end
  • wsgi.py, a small wrapper used to run the Flask app
  • train, which contains the logic that runs when the container is started for training
  • serve, which starts the program for hosting the model
  • predictor.py, where the endpoints for the model are defined

You can change these files for your specific system configuration. However, in this post, the only file I would like to walk through is predictor.py. It includes two endpoints: /ping and /invocations.

/ping receives GET requests and returns 200 if the container is up and accepting requests. SageMaker uses this endpoint for health checks.

/invocations is where we receive client requests and return predictions from the trained model. When this endpoint receives data, it converts it into the proper format and gets predictions from the model, using the model artifacts that were uploaded to S3 at the end of training and downloaded under /opt/ml/model by SageMaker.

If you use a more complex model and need to change the model input format, feel free to modify it to fit your needs. The code in predictor.py looks like this:
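Below is a minimal sketch of what predictor.py can look like, assuming Flask and the pickled scikit-learn model produced by the training sketch above; the file in the repository may differ.

# predictor.py: a minimal sketch of the inference endpoints (assumptions: Flask,
# a pickled scikit-learn model named model.pkl under /opt/ml/model, JSON input).
import json
import os
import pickle

import flask

model_path = "/opt/ml/model"

# The model artifacts are downloaded here by SageMaker when the endpoint starts.
with open(os.path.join(model_path, "model.pkl"), "rb") as f:
    model = pickle.load(f)

app = flask.Flask(__name__)


@app.route("/ping", methods=["GET"])
def ping():
    # Health check: return 200 if the model loaded and the container can serve requests.
    status = 200 if model is not None else 404
    return flask.Response(response="\n", status=status, mimetype="application/json")


@app.route("/invocations", methods=["POST"])
def invocations():
    # Expect a JSON body such as {"features": [5.1, 3.5, 1.4, 0.2]}.
    payload = flask.request.get_json(force=True)
    prediction = model.predict([payload["features"]])[0]
    return flask.Response(
        response=json.dumps({"species": str(prediction)}),
        status=200,
        mimetype="application/json",
    )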

Workflow

With SageMaker, you have the option to either use one of the algorithms SageMaker already provides, such as the built-in algorithms or those in the SageMaker Marketplace mentioned earlier, or create your own machine learning algorithm. In this article, we will create our own custom algorithm and then train and deploy it on AWS. First, I will give a broad overview of the system that we will build today and then show you how to build it yourself, step by step.

As you can see, users interact with the system in two ways. To get predictions from their models, they first load the data into an S3 bucket and start a training job on SageMaker from anywhere they want: it could be a Jupyter notebook where you start the job using boto3 (the AWS SDK for Python), or a Java Spring project, as in our case. The next steps are, in summary, creating a model endpoint (step 6) and giving the endpoint name to Lambda. Once Lambda can call the endpoint, users are ready to pass their data through API Gateway and receive inferences.

Before moving forward, clone the project repository using the following command.

git clone https://github.com/osemrt/AWS-SageMaker

Now we are ready to implement the workflow above step by step. The numbers in the titles below indicate where we are in the workflow. I am excited to start; how about you? 🙂 Without further ado, let’s jump into the implementation details.

1. Upload the training data to Amazon S3 Bucket

The first step in the workflow is to upload the training data to an S3 bucket. If you don’t know how to create a bucket, visit the following link:

After creating the bucket, go to the project’s GitHub repository, download the dataset folder, and upload it into the bucket. Your dataset should end up inside the bucket at a path like the following.
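If you prefer the SDK to the console, here is a small boto3 sketch of the upload; the bucket name matches the s3Uri used later in the article, and the file name is only an example.

# A sketch of uploading the dataset with boto3; the file and bucket names are examples
# that match the s3Uri used later in the article (adjust them to your own setup).
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="dataset/iris.csv",         # local file from the repository's dataset folder
    Bucket="amazon-sagemaker-s3bucket",  # your bucket name
    Key="dataset/iris.csv",              # S3 key under the dataset/ prefix
)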

2. Push the docker images to Amazon Elastic Container Registry (Amazon ECR)

SageMaker uses Docker containers for its runtime tasks: containers are used both to train and to deploy machine learning algorithms, and they let us deploy quickly at any scale. In this step, I will show how you can push your training and inference Docker images to ECR (Elastic Container Registry). First, we need an access key and secret key pair to use the AWS Command Line Interface. To create your access keys, go to the following link:

Once you have your access and secret keys, run the aws configure command in a terminal and enter your keys, default region, and so on. In my case, I have the output below.

Now we need two repositories in ECR, one for the training image and one for the inference image. To create the repositories, go to the link below.

We’re done! You should see your repositories as below.
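As an alternative to the console, the two repositories can also be created with boto3; this is only a sketch, and the repository names and region are examples chosen to match the image URIs used later in the article.

# A sketch of creating the two ECR repositories with boto3 instead of the console.
import boto3

ecr = boto3.client("ecr", region_name="us-east-2")
for name in ("training", "inference"):
    ecr.create_repository(repositoryName=name)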

Before pushing our images, we need to tell Docker where to upload them when we execute the docker push command. First, run the aws ecr get-login --no-include-email command to get a temporary authentication command. Your output will look like this:

Copy the command result and paste it into the same terminal. You will be able to see the login succeeded message. Once you have logged in, you are ready to push your Docker images into ECR.

We will go to the training and inference folders under the container folder, then build the images and push them. Each of these folders contains a Dockerfile where we define our base image and which commands should be executed when the container starts running.

Let’s start with the training image. Inside the training folder, run the following command with your repository and tag names. You can give any tag you want when you build images; in my case, I used the latest tag.

docker build --tag <repository_name>:<tag_name> .

Execute the same command in the inference folder.

You can see the created images when you execute the docker image ls command.

Now, it is time to push the inference and training images into ECR. Execute the following command for the training and inference images:

docker push <repository_name>:<tag_name>

Be patient, uploading the images to ECR can take a while. When it is complete, you will see the images in the repositories.

3. Train the model

There are many ways to start a training job in SageMaker: you can start it from a Python script, a Jupyter notebook, a Lambda trigger, and so on. In this post, however, we will do it from a Java Spring project.

Go to the service folder in the GitHub repository to access the project code. Make sure you replace the following configuration in the application.yml file with your own values and install the dependencies in pom.xml. As the names suggest, the keys without comments are self-explanatory; the others are annotated with comments.

amazon:
  accessKey:
  secretKey:
  region:
  sagemaker:
    training:
      s3Uri: s3://amazon-sagemaker-s3bucket/dataset/
      trainingJobName: myTrainingJob
      roleArn: # your SageMaker execution role arn
      s3OutputPath: s3://amazon-sagemaker-s3bucket/model/
      channelName: training
      trainingImage: <account_id>.dkr.ecr.us-east-2.amazonaws.com/training:latest

Once the project is up and running, send a POST request to start the training job.
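If you would rather not run the Spring service, the same training job can also be started directly from Python with boto3, using the values from the configuration above; the instance type, runtime limit, account ID, and role ARN below are placeholders you should replace with your own.

# A sketch of starting the same training job with boto3 instead of the Spring service.
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-east-2")
sagemaker.create_training_job(
    TrainingJobName="myTrainingJob",
    AlgorithmSpecification={
        "TrainingImage": "<account_id>.dkr.ecr.us-east-2.amazonaws.com/training:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="<your SageMaker execution role arn>",
    InputDataConfig=[
        {
            "ChannelName": "training",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://amazon-sagemaker-s3bucket/dataset/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://amazon-sagemaker-s3bucket/model/"},
    ResourceConfig={
        "InstanceType": "ml.m5.large",  # example instance type
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)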

Click Training jobs in the left panel of the SageMaker console to see the status of your training job, and wait for it to complete.

When it is completed, SageMaker automatically compresses the files under /opt/ml/model and uploads the archive to the S3 output path.

4. Create Model

Go to Models under Inference in the left panel and then click the “Create Model” button. Give a model name, specify an IAM role and choose the “Provide model artifacts and inference image location” option.

In this step, we specify the inference container image URI and the S3 path of the model artifacts. When we use them to create a SageMaker model, the files at that path are downloaded into the /opt/ml/model folder in the inference container, and the serve script runs to serve the endpoints defined in predictor.py.

You will see the model listed under its name after it is created.

5. Create Endpoint Configuration

After you’ve created the model, create an endpoint configuration and select the model you just created. The endpoint configuration specifies which model will be served by the endpoint and which AWS instances it will run on; how many instances and which instance types you use is up to you. In this post, I will use the default configuration.

6. Create Endpoint

The final step before serving predictions is creating a SageMaker endpoint; the endpoint you create here is what AWS Lambda will call. Give the endpoint a name, choose the “Use an existing endpoint configuration” option, and then select the endpoint configuration you’ve just created.

The endpoint name will be used in Lambda to send requests, so copy it for now.

7. Use Lambda Function to make predictions

Now that the SageMaker endpoint is ready, it is time to create a Lambda function to invoke it. To create the Lambda function, give it a function name, choose the “Python 3.6” runtime, and click the “Create function” button.

Give the endpoint name you defined in the previous step to your Lambda function as an environment variable.

Then paste the following Python script into your Lambda function and click the “Deploy” button.
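The author’s original script lives in the repository; a minimal sketch of such a handler could look like the following, assuming the endpoint name is exposed through an ENDPOINT_NAME environment variable and the inference container accepts the JSON body shown in the predictor.py sketch.

# A sketch of a Lambda handler that forwards the request body to the SageMaker
# endpoint named in the ENDPOINT_NAME environment variable.
import json
import os

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]


def lambda_handler(event, context):
    # API Gateway proxy integrations wrap the payload in event["body"]; a console
    # test event can pass the JSON object directly.
    body = event.get("body", event)
    if isinstance(body, str):
        body = json.loads(body)

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(body),
    )
    result = json.loads(response["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps(result)}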

Check if the Lambda function works by running a test case like below.

If you did everything right, you will see the prediction for the data you sent. For my feature array [2, 42, 1, 3], the model says the species is setosa.

8. Call the Lambda function from API Gateway

This is the last step in the workflow. Here we will create an API and make it publicly available. To do that, go to the Amazon API Gateway service, create an API, choose the following options, and give the API a name.

Click the Actions button and choose “Create Resource”, give a resource name, and click “Create Resource”.

Under the resource you just created, create a POST method from the same Actions dropdown, choose Lambda Function as the integration, select the region, and type your Lambda function name.

Click OK in the popup that informs you that API Gateway will be given permission to call the Lambda function, then test your API. Pass the feature array in the proper JSON format to get a prediction.

If everything looks okay, let’s deploy the API. Click “Deploy API” in the menu that opens when you click “Actions”, give it a stage name, and that’s it: you have now deployed the API.

Go to the “Stages” section in the left panel, click your API name, and copy the invoke URL to use your API.

Test your API

We have now successfully trained and deployed our own custom model on SageMaker, and you are ready to test your API from anywhere using the invoke URL. When I send a POST request to the invoke URL with my data, I get a prediction back. If you do too, congratulations, you’ve completed all the steps and created your ML pipeline on SageMaker :).
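For example, such a request from Python could look like the sketch below; the invoke URL, resource path, and payload shape are placeholders that you should replace with the values from your own deployment.

# A sketch of calling the deployed API; replace the URL and payload with your own values.
import requests

invoke_url = "https://<api_id>.execute-api.us-east-2.amazonaws.com/<stage_name>/<resource_name>"
payload = {"features": [2, 42, 1, 3]}

response = requests.post(invoke_url, json=payload)
print(response.status_code, response.json())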
