Custom Keras model in Sagemaker

Richard Chen
5 min read · May 29, 2018


As a data scientist who recently started learning about neural networks, my question was how to get something trained and running with the least amount of effort. Specifically, the dilemma was that I already had working code locally. How do I get it up and running without changing too much of the existing code?

That’s when I discovered AWS Sagemaker.

Before I get started, I want to give a big thanks to the following post, as that was how I got started:

What is AWS Sagemaker?

From the AWS SageMaker website:

Amazon SageMaker is a fully managed machine learning service

What that entails is that a data scientist can work within a SageMaker notebook (exactly like a Jupyter Notebook) and handle everything from training the model, to deploying the model, to setting up endpoints so that, say, your app can call SageMaker to make a prediction.
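To make that concrete, the whole workflow condenses to three calls in the SageMaker Python SDK (a sketch only, with placeholder names; the estimator setup, instance types, and data locations are covered step by step below):

estimator.fit('s3://my-bucket/training-data')    # train on managed instances
predictor = estimator.deploy(1, 'ml.m4.xlarge')  # stand up a hosted endpoint
predictor.predict(samples)                       # call the endpoint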

It is a neat service with a plethora of helpful examples: https://github.com/awslabs/amazon-sagemaker-examples


You can do all of that in the Jupyter Notebook on SageMaker if you so choose.

As you will find in the examples, SageMaker has a lot of built-in algorithms that can simplify development even further (https://docs.aws.amazon.com/sagemaker/latest/dg/algos.html).

However, if you already have a Keras model like I did, and want to get it up and running without changing too much of the code, then you'll need to provide a Dockerfile to support your custom deployment.

Step by step: deploying your own code

You can set up your repo structure differently, but I copied the structure from the following repo: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own

In this example, all you'll need to change are the following files (see the directory sketch below):

  • predictor.py
  • train
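Those two files live inside the container directory, which in that repo is laid out roughly like this (a sketch; "sample" stands in for whatever you name your algorithm folder):

container/
    Dockerfile         # defines the image
    build_and_push.sh  # helper script to build and push to ECR
    sample/
        train          # training logic (step 2)
        serve          # launches the inference server
        predictor.py   # prediction logic (step 4)
        nginx.conf     # web-server config used when serving
        wsgi.py        # wraps predictor.py for the server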
1. Building and registering the container

In order to deploy custom code, we'll first have to register the container and send in the instructions as to what this container should look like. This is where you'll need to provide your own Dockerfile.

A Dockerfile is a text document that contains all the commands a user could call on the command line to assemble an image. Using docker build users can create an automated build that executes several command-line instructions in succession.

Basically, it defines the container image in which your code will be built and run, much like a VM.

For reference, here is my Dockerfile: https://github.com/rca241231/sagemaker_example/blob/master/container/Dockerfile

In your Jupyter Notebook, all you need is this section of the code in order to set it up:

!cat container/Dockerfile

%%sh

# The name of our algorithm
algorithm_name=sample_algo

cd container

chmod +x sample/train
chmod +x sample/serve

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.

aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
$(aws ecr get-login --region ${region} --no-include-email)

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

As you can see, we printed the Dockerfile and then ran the shell commands that build the image and push it to ECR.

2. Training and model building

After you successfully set up your container, you'll need a training file that provides the logic for how to train and build your model. Typically, this is the code that you already have locally.

One caveat here: since you will need to save the model and load it somewhere else (i.e., in SageMaker), you'll need to save your model like this:

def build_model():
    ...
    return model

optimized_classifier = build_model()

# best_estimator_ is the scikit-learn wrapper around the Keras model;
# model_path is where SageMaker expects artifacts (e.g. /opt/ml/model)
optimized_classifier.best_estimator_.model.save(
    os.path.join(model_path, 'example_model.h5'))

One thing to note here is that in the guide linked above, the author saved the file as .pkl. However, for models built with Keras, you have to save it as .h5 instead.

For reference, you can see my train file: https://github.com/rca241231/sagemaker_example/blob/master/container/ann/train
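If you're adapting your own code, the train script's skeleton ends up roughly like this (a sketch assuming the standard SageMaker container paths; the train.csv filename and build_model() are placeholders for your own data and Keras logic):

import os
import pandas as pd

# SageMaker mounts these paths inside the container
prefix = '/opt/ml/'
input_path = os.path.join(prefix, 'input/data/training')  # training channel
model_path = os.path.join(prefix, 'model')                # artifacts here get shipped to S3

def train():
    # Load whatever files SageMaker copied in from your S3 data location
    data = pd.read_csv(os.path.join(input_path, 'train.csv'))
    ...
    optimized_classifier = build_model()  # your existing Keras code
    optimized_classifier.best_estimator_.model.save(
        os.path.join(model_path, 'example_model.h5'))

if __name__ == '__main__':
    train()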

Once you are done creating the train file, it is time to instruct SageMaker on how to create and train the model. This is where you'll need to create an estimator:

# sess, role, and prefix are set up earlier in the notebook
# (import sagemaker as sage; sess = sage.Session())
WORK_DIRECTORY = 'data'

data_location = sess.upload_data(WORK_DIRECTORY, key_prefix=prefix)
account = sess.boto_session.client('sts').get_caller_identity()['Account']
region = sess.boto_session.region_name
image = f'{account}.dkr.ecr.{region}.amazonaws.com/{prefix}:latest'

classifier = sage.estimator.Estimator(
    image,            # the ECR image we pushed above
    role,             # IAM execution role
    1,                # train_instance_count
    'ml.c4.2xlarge',  # train_instance_type
    output_path="s3://{}/output".format(sess.default_bucket()),
    sagemaker_session=sess)

classifier.fit(data_location)

The estimator we created is just a high-level interface for SageMaker training. We are using the generic Estimator here because our algorithm is custom built. For more information, you may refer here: http://sagemaker.readthedocs.io/en/latest/index.html

Of note, the call specifies the image to use as the entry point, the output path, and the train_instance_type (here, ml.c4.2xlarge). That last parameter is basically what CPU/GPU hardware you'd like to train your model on. You may refer here for more info: https://aws.amazon.com/sagemaker/pricing/instance-types/

3. Deploy

In order to deploy the model, all you need to do is type this in the Notebook:

from sagemaker.predictor import csv_serializer

predictor = classifier.deploy(1, 'ml.m4.xlarge', serializer=csv_serializer)

By the end of the deployment, the call will return a predictor bound to the new endpoint, which you may call in order to make predictions.
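Outside the notebook (say, from a backend service), you can hit the same endpoint through boto3; here's a minimal sketch, where 'my-endpoint' is a placeholder for the endpoint name the deployment printed out:

import boto3

client = boto3.client('sagemaker-runtime')
response = client.invoke_endpoint(
    EndpointName='my-endpoint',  # placeholder: your deployed endpoint's name
    ContentType='text/csv',
    Body='1.0,2.0,3.0')          # one CSV row of features

print(response['Body'].read().decode('utf-8'))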

4. How to predict

The predictor.py file will contain logic as to how your model will:

  • Load the saved model
  • Make predictions

If you are using Keras, you’ll need to change the way that the predictor function loads the model with:

from keras.models import load_model
...
model = load_model(os.path.join(model_path, 'sample-model.h5'))

This is because, as mentioned above, we cannot use .pkl, and hence we cannot use the pickle library (or dill). Instead, we have to use Keras' own load_model function.

In addition, we'll also have to specify which dataflow graph we'll be using (https://www.tensorflow.org/programmers_guide/graphs).

If you don't, you'll run into the issue described here: https://github.com/tensorflow/tensorflow/issues/14356

Fortunately, this is easily solved as such:

from keras import backend as K

def get_model():
    ...
    return model

def predict(input_data):
    sess = K.get_session()
    with sess.graph.as_default():
        clf = get_model()
        return clf.predict(input_data)

For reference: https://github.com/rca241231/sagemaker_example/blob/master/container/ann/predictor.py
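Putting the pieces together, predictor.py ends up shaped roughly like this (a sketch following the scikit_bring_your_own layout, in which SageMaker's serving stack routes requests to a small Flask app; the /ping and /invocations routes are the contract SageMaker expects, and the .h5 filename is a placeholder):

import os
from io import StringIO

import flask
import pandas as pd
from keras import backend as K
from keras.models import load_model

model_path = '/opt/ml/model'  # where SageMaker unpacks the trained model

class ScoringService(object):
    model = None  # cache the loaded model between requests

    @classmethod
    def get_model(cls):
        if cls.model is None:
            cls.model = load_model(os.path.join(model_path, 'sample-model.h5'))
        return cls.model

    @classmethod
    def predict(cls, input_data):
        sess = K.get_session()
        with sess.graph.as_default():  # the graph fix from above
            clf = cls.get_model()
            return clf.predict(input_data)

app = flask.Flask(__name__)

@app.route('/ping', methods=['GET'])
def ping():
    # SageMaker calls this to check that the container is healthy
    healthy = ScoringService.get_model() is not None
    return flask.Response(response='\n', status=200 if healthy else 404,
                          mimetype='application/json')

@app.route('/invocations', methods=['POST'])
def invocations():
    # predictor.predict() sends the samples as CSV
    data = pd.read_csv(StringIO(flask.request.data.decode('utf-8')), header=None)
    predictions = ScoringService.predict(data.values)
    return flask.Response(response=str(predictions), status=200, mimetype='text/csv')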

Now that we’ve completed the predictor.py file, we’ll need to actually call the API to see if we set everything up correctly.

To do this, you may select some samples from the data file that you trained the model with and pass them in like so in Jupyter:

print(predictor.predict(test_X.values).decode('utf-8'))

And voila! We got everything working! Now you can write your Node.js calls in order to make predictions within your app in real time!

5. Optional — Take down endpoint

If you no longer want to expose your endpoint, simply run the following command in Jupyter:

sess.delete_endpoint(predictor.endpoint)

Hopefully this guide has helped you build an end-to-end process for deploying your ML model and exposing its endpoint.

Let me know of any questions or ways to improve this article as well!

Don’t forget to clap if you liked the guide.

Sample Code: https://github.com/rca241231/sagemaker_example
