Using Chalice to serve SageMaker predictions

Julien Simon
Apr 26, 2018 · 5 min read

Amazon SageMaker makes it easy to train Machine Learning models and to deploy them on HTTP endpoints. However, in most cases you won't expose these endpoints directly. Pre-processing and post-processing steps are likely to be required: authentication, throttling, data transformation and enrichment, logging, etc.

In this post, we will use AWS Chalice to build a web service acting as a front-end for a SageMaker endpoint.

Let’s get started.


Training and deploying our SageMaker model

I’ve already written a couple of posts (here and here) on training and deploying SageMaker models, so I won’t go into these details again. For our purpose, we’ll use the built-in algorithm for image classification to train a model on the Caltech-256 data set, as presented in this tutorial.

Once training and deployment are complete, the endpoint is ready for prediction.

Invoking a SageMaker model

First, we need to figure out what data format the algorithm expects: as explained in the documentation, we need to POST binary data with the application/x-image content type.

We'll do this with the InvokeEndpoint API, which you'll find in your favourite AWS SDK: we simply need to provide an endpoint name and, of course, the body.

AWS CLI users: don't be confused if you don't see InvokeEndpoint in the list of SageMaker APIs: it lives in 'sagemaker-runtime', not in 'sagemaker'.

Here's a minimal example with boto3 (the endpoint name and image file below are placeholders; adjust them to your own setup).
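import boto3

# Placeholders: use your own endpoint name and test image.
ENDPOINT_NAME = 'image-classification-endpoint'
IMAGE_FILE = 'floppy.jpg'

# InvokeEndpoint lives in the 'sagemaker-runtime' client, not 'sagemaker'.
runtime = boto3.client('sagemaker-runtime')

with open(IMAGE_FILE, 'rb') as f:
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='application/x-image',
    Body=payload)

# The body is a byte stream holding a bracketed list of probabilities.
print(response['Body'].read())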

$ python invoke.py
b'[1.4174277566780802e-05, 1.6996695194393396e-05, 0.00011234339035581797, 0.0002156866539735347, 3.110157194896601e-05, 0.00025752029614523053, 2.8299189580138773e-05, 0.00012142073683207855, 9.117752779275179e-05, 4.787529178429395e-05, 3.5472228773869574e-05, 0.00019932845316361636, 5.5317421356448904e-05, 6.963817668292904e-06, 4.422592246555723e-05, 9.264561231248081e-05, 2.4938033675425686e-05, 0.0002587089838925749, 0.00026409601559862494, 1.0121700142917689e-05, 0.0038431661669164896, 2.7548372599994764e-05, 3.41928462148644e-05, 7.225569424917921e-05, 1.1924025784537662e-05, 7.16273789294064e-05, 0.000851281511131674, 3.8102120015537366e-05, 8.411310773226433e-06, [output removed]

OK, this worked. The response contains 257 probabilities, one for each of the Caltech-256 categories (256 object categories plus a catch-all category). As our image represents a floppy disk (category #75), the highest probability should be the one at index 74, since the array is zero-indexed. You can check if you feel like it :)

Now, it’s time to write a front-end web service for this endpoint.

A quick recap on Chalice

Chalice is an AWS Open Source project that lets developers build Python-based web services with minimal fuss. The programming model is extremely close to Flask, so you should feel right at home. Installation is a breeze (pip install chalice) and the CLI is idiot-proof (*I* can use it).

Simple services require very little configuration, if any. Based on the boto3 calls in your code, Chalice generates an IAM policy which will be attached to the Lambda function’s role. You’ll generally want to tweak it a bit, but it’s a nice starting point and a great time saver. You can also bring your own policy (which we’ll do later on).

You can add Python dependencies by listing them in the requirements.txt file. Just remember that Lambda deployment packages can't exceed a zipped size of 50MB, so you should be conservative!

When it comes to deployment (‘chalice deploy’), Chalice automatically creates a Lambda function running your code, as well as an API in API Gateway to trigger it. If you’d rather deploy with SAM, that’s possible too: ‘chalice package’ will build both the deployment package and the SAM template, as shown below.
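For example (the output directory name is arbitrary):

$ chalice package out/
# out/ now contains the zipped deployment package and a SAM template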

Last but not least, you can run your service locally (‘chalice local’), which obviously helps with debugging… and allows you to code on planes ;)

Alright, let’s get to it.

Writing an image prediction service with Chalice

As we just saw, the endpoint returns a raw prediction, which probably contains more information than we need to send back to the client. Thus, our service will only return the top k classes and probabilities, in descending order of probability.

The body of our POST request will contain:

  • a mandatory base64-encoded image,
  • an optional value for ‘k’. If it’s missing, the service will use the default value of 257, i.e. return all categories.

For more flexibility, let’s store the SageMaker endpoint name in an environment variable for the Lambda function.

Here are the steps we need to take:

  • decode the base64-encoded image,
  • read the endpoint name from an environment variable,
  • invoke the endpoint using the InvokeEndpoint API in boto3,
  • read the response,
  • sort categories by descending order of probability,
  • return only the top k categories and probabilities.

Pretty straightforward, I think. Most of the work is actually converting the response data into something we can process. Indeed, the response body is a byte array representing a bracketed list of comma-separated probabilities, which is inconvenient to work with.

In order to turn it into a proper Python list, we can evaluate it as a Python expression with ast.literal_eval(). And voilà: a Python list! Nice trick.

We can then build a NumPy array holding the category indexes in ascending order of probability, reverse it and take the top k elements. The last step is to build the response body.
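Putting it all together, app.py could look something like this (the route and the JSON field names are my own choices; adjust them to taste):

import ast
import base64
import os

import boto3
import numpy as np
from chalice import Chalice

app = Chalice(app_name='predict-service')

runtime = boto3.client('sagemaker-runtime')

@app.route('/predict', methods=['POST'])
def predict():
    body = app.current_request.json_body
    # Decode the base64-encoded image and read the optional 'k' value.
    image = base64.b64decode(body['data'])
    k = int(body.get('k', 257))
    # Read the endpoint name from an environment variable.
    endpoint_name = os.environ['ENDPOINT_NAME']
    # Invoke the SageMaker endpoint with the binary image.
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType='application/x-image',
        Body=image)
    # The body is a byte array holding a bracketed list of
    # comma-separated probabilities: turn it into a Python list.
    probs = ast.literal_eval(response['Body'].read().decode('utf-8'))
    # Sort category indexes by probability, reverse, keep the top k.
    indexes = np.argsort(probs)[::-1][:k]
    return {'predictions': [
        {'category': int(i), 'probability': float(probs[i])}
        for i in indexes]}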

Phew. That was a little more Python plumbing than I originally expected, but it’s actually a good example of post-processing raw predictions.

Configuring the service

Configuration is pretty simple and is stored in .chalice/config.json:

  • an environment variable to store the endpoint name,
  • a custom IAM policy, allowing the Lambda function to call the InvokeEndpoint API. Once again, we could let Chalice generate it for us, but it’s good to know how to customize your policy :)
Configuration file
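A minimal version could look like this (the app name and endpoint name are placeholders):

{
  "version": "2.0",
  "app_name": "predict-service",
  "stages": {
    "dev": {
      "api_gateway_stage": "api",
      "autogen_policy": false,
      "iam_policy_file": "policy.json",
      "environment_variables": {
        "ENDPOINT_NAME": "image-classification-endpoint"
      }
    }
  }
}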
IAM policy
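And a matching policy, which also needs the CloudWatch Logs permissions that an autogenerated policy would normally include:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["sagemaker:InvokeEndpoint"],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}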

Using ‘*’ as the resource selector for InvokeEndpoint is OK here. However, in production, I would strongly recommend using the actual endpoint ARN instead.

Running the service locally

Let’s test the service locally by running ‘chalice local’ and then invoking it with curl.
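For example, with the JSON fields from the hypothetical app.py above (with GNU base64, add -w 0 to keep the encoded image on one line):

$ chalice local &
$ curl -X POST -H 'Content-Type: application/json' \
    -d "{\"data\": \"$(base64 floppy.jpg)\", \"k\": 3}" \
    http://127.0.0.1:8000/predict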

Nice. Our service seems to work :)

Deploying the service

Now it’s time to deploy on AWS by running ‘chalice deploy’. Let’s run the same test.
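Same curl call as before, this time pointed at the URL printed by ‘chalice deploy’ (the one below is a placeholder):

$ curl -X POST -H 'Content-Type: application/json' \
    -d "{\"data\": \"$(base64 floppy.jpg)\", \"k\": 3}" \
    https://abcd1234.execute-api.eu-west-1.amazonaws.com/api/predict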

All good. That’s it, then: we successfully built a serverless microservice invoking a SageMaker endpoint (code and test script on GitHub).

This is pretty fun, so why not write another service?

Bonus: an image resizer service with Chalice

Computer vision models require input images to have the same size as the training images. I believe the SageMaker built-in algorithm handles image resizing for us, but it’s definitely something we’d have to take care of if we worked with our own custom model.

Here’s how you could do this with the OpenCV library (code and test script on GitHub).
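As a sketch, assuming the same base64-in/base64-out convention as before (the route and field names are hypothetical):

import base64

import cv2
import numpy as np
from chalice import Chalice

app = Chalice(app_name='resize-service')

@app.route('/resize', methods=['POST'])
def resize():
    body = app.current_request.json_body
    # Decode the base64-encoded image into an OpenCV image.
    data = base64.b64decode(body['data'])
    image = cv2.imdecode(np.frombuffer(data, dtype=np.uint8), cv2.IMREAD_COLOR)
    # Resize to the requested dimensions (width, height).
    resized = cv2.resize(image, (int(body['width']), int(body['height'])))
    # Re-encode to JPEG and send it back, base64-encoded.
    _, buffer = cv2.imencode('.jpg', resized)
    return {'data': base64.b64encode(buffer.tobytes()).decode('utf-8')}

Don’t forget to list the OpenCV dependency in requirements.txt, keeping the 50MB Lambda limit in mind: opencv-python-headless is a lighter alternative to opencv-python.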


That’s it for today. As you can see, it’s quite simple to invoke SageMaker endpoints. If you want to go all the way and build a web service, then Chalice is certainly an option you should consider. I can’t think of an easier way to do it!

Thanks for reading. Happy to get your feedback and answer your questions here or on Twitter.


Bay Area Thrash Metal, 1988. A bit of trivia: Paul Bostaph on drums… now in Slayer \m/

