Deploying PyTorch Models to Serverless Environments

Judy T Raj
Published in The Startup
6 min read · Jun 9, 2019


Google Cloud Functions vs Azure Functions vs AWS Lambda

In this article, let us look at some of the easiest ways of deploying your deep learning models to production. I’m using an iris classifier built in PyTorch. I won’t be discussing the actual building or training of the model; we will jump straight to the deployment part. Here’s minimal code for building, training and saving a PyTorch model to classify iris flowers.

Once we have the model trained and the state dict saved, let us upload it to the cloud, somewhere it can be accessed by our various deployments. For this example, let us upload it to an AWS S3 bucket. A Google Cloud Storage bucket and Azure Blob Storage are both equally viable alternatives.

Here’s a link which explains how to create a bucket in AWS S3, if you’re unfamiliar with the procedure. You can upload objects into the bucket by clicking on the Upload button.

I’ve created an S3 bucket named my-iris-model and uploaded the classifier state dict in there, as shown in the figure below.

Turn off the Block all public access setting under the Permissions tab, so we can make the state dict object downloadable without much ado.

Now select the state dict object we uploaded, click on the Actions menu and make it public. Copy the URL from the info tab that pops up on the right. Here’s the URL to my state dict; you’ll get something similar. This is the URL we’ll be using in our microservices later to access the model.
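To check that the object really is publicly downloadable, you can fetch it from Python. Here’s a minimal sketch, assuming the URL you copied points at the raw state dict file. It uses urllib from the standard library rather than the requests module used in the deployments, and the function name download_bytes is just an illustrative choice, not something from the original code.

```python
import urllib.request


def download_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the object at `url` and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Usage (hypothetical): pass the object URL copied from the S3 console.
# state_dict_bytes = download_bytes(model_url)
# print(f"Downloaded {len(state_dict_bytes)} bytes")
```

The returned bytes can be wrapped in io.BytesIO and handed to torch.load, which accepts file-like objects as well as paths.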

State dicts are more robust to changes in environment than whole saved models, and hence saving the state dict is the recommended way to persist PyTorch models.

Google Cloud Functions

Now that we have the model ready to make predictions and accessible on the cloud, let us set up the deployments, starting with Google Cloud.

Google Cloud Functions is a serverless execution environment. It supports Python and Node.js and can be triggered by either HTTP requests or various cloud events. It is very easy and user friendly to set up. All you need to do is write your code in the little UI box and specify the libraries to be installed, and Google will do the rest.

Log on to the GCP console. You can use your Gmail account to sign up if you’re new to GCP. You’ll need to create a Google Cloud project and set up a billing account before you can start. Google gives you $300 worth of free credit to start with, so you can get started right away. Here’s a link to the docs that’ll help you set up a project in GCP.

Click on the hamburger button in the top left corner of the console home screen, scroll down to the Compute options and click on the Cloud Functions option.

Click on the Create function button and fill in the form that appears to create a cloud function.

I’m giving the function the name iris_classifier. Cloud functions are deployed on Linux VMs. I’ve chosen 512MB of memory to be allocated. This is more than enough to house all the Python packages we need for our deployment. We will use the UI to write our code, so I chose the inline editor option under source code. We’ll be sending HTTP requests to invoke the function, so the trigger type is HTTP. The URL shown on screen is the URL we’ll need to call to get predictions from the model once it is deployed.

We need to fill in the function’s source file and the requirements.txt file. The function will need to download the torch package for Linux as well as the requests module to download the state dict. We’re also going to need numpy to extract and convert the data we send in with our HTTP request to get predictions.

Click on requirements.txt and paste the requirements in.

A note on the Torch installation for Linux: a plain pip install torch installs a version of torch that causes problems on the Cloud Functions VM.

In our source file, all we have to do is define a torch model object of our class Net (the classifier), download the state dict from the bucket and initialise our model object with the state dict. Then we can use the request object’s get_json method to extract the parameters passed in via the POST request, convert them to a torch tensor, pass the tensor to the model and return the prediction.

Here is the code for the function.
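As a rough illustration of the request-handling flow described above (not the original listing), here is a minimal sketch of an HTTP handler. It assumes the request body has the form {"x": [...]} with four features, and it takes the model call as a plain function so the snippet runs without torch installed. The names make_handler, predict_class and NUM_FEATURES are hypothetical, as is the exact wording of the error messages.

```python
from typing import Callable, Sequence

# Assumption: four measurements per flower (sepal and petal length and width).
NUM_FEATURES = 4


def make_handler(predict_class: Callable[[Sequence[float]], int]):
    """Build an HTTP handler around a function mapping features to a class index."""

    def iris_classifier(request):
        # The handler receives a Flask-style request object; get_json parses
        # the JSON body and, with silent=True, returns None if parsing fails.
        payload = request.get_json(silent=True)
        features = payload.get("x") if isinstance(payload, dict) else None
        if not isinstance(features, list) or len(features) != NUM_FEATURES:
            return 'Expected a JSON body like {"x": [5.1, 3.5, 1.4, 0.2]}', 400
        try:
            values = [float(v) for v in features]
        except (TypeError, ValueError):
            return "All four features must be numbers", 400
        # Returning a plain string sends it back with a 200 status.
        return f"The iris belongs to class {predict_class(values)}"

    return iris_classifier
```

In the deployed function, predict_class would wrap the PyTorch model: build a tensor from the values, run it through the model and take the index of the largest output. Keeping that step separate also makes the parsing logic easy to test locally with a fake request object.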

Write the name of our function from the source file in the Function to execute field and click on Create to deploy the cloud function. It might take a minute to deploy and voilà, our model is now up in the cloud, ready to make predictions.

Copy the URL from under the Trigger tab to invoke the function. Here’s a sample curl request and the returned answer.

curl -XPOST -H 'Content-Type: application/json' -d @test.json

Here are the contents of the test.json file:

{"x": [5.1, 3.5, 1.4, 0.2]}

And the curl request returned the message, “The iris belongs to class -0”
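If you’d rather call the function from Python than from curl, the same POST request can be sketched with the standard library. This is only an illustration: build_prediction_request and predict are hypothetical helper names, and the URL you pass in is the trigger URL of your own function.

```python
import json
import urllib.request
from typing import Sequence


def build_prediction_request(url: str, features: Sequence[float]) -> urllib.request.Request:
    """Build a POST request carrying the same JSON body as test.json."""
    body = json.dumps({"x": list(features)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url: str, features: Sequence[float], timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    request = build_prediction_request(url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage (hypothetical): predict(function_url, [5.1, 3.5, 1.4, 0.2])
```

Splitting the request construction from the network call makes it possible to inspect the payload and headers without actually hitting the endpoint.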

Azure Functions

Azure Functions is the Microsoft equivalent of Google Cloud Functions and AWS Lambda. It is a serverless compute service that enables you to run code on demand without having to explicitly provision or manage infrastructure. You’ll need a Microsoft account for this part. Sign in to the Azure portal with your Azure account. Azure offers $200 worth of credit to begin with.

We need to set up a Function App before we can deploy a function. Click on the plus icon at the top left corner that says “Create a resource”. Then select Function App from Compute.

Fill in the form that appears with the name of your app, the resource group and the storage account. Choose the location, the subscription and the hosting plan as well. Here’s a sample.

It’ll take a moment for the deployment to finish. Once the function app is successfully deployed, select Go to resource in the deployment succeeded message to view your new function app.

This is where you can create your function to host the model. Azure Functions is more complicated than the Google Cloud equivalent, but it supports Java/Maven in addition to Python 3.6.

Here’s a link that walks you through the process of writing a Python function in the Function App. Once the function is deployed, you can use the API it exposes to send requests via curl and get predictions.

AWS Lambda

AWS Lambda is another popular serverless execution environment, from Amazon, that allows you to deploy functions that can be triggered by either HTTP requests or cloud events.

Unfortunately, AWS Lambda limits your deployment package to a maximum size of 50MB, unlike the Google equivalent, which lets you choose the memory size from five different options: 128MB, 256MB, 512MB, 1GB or 2GB. I tried Zappa, a library that makes it easy to build and deploy Python WSGI applications on AWS Lambda + API Gateway, as well as the Lambda option to load packaged code from Amazon S3 without much performance loss, which lets you have a package of up to 100MB. Even so, my model and its required libraries packaged together proved to be too big for AWS Lambda.

But if your model can be packaged into something under 100MB, here’s a detailed tutorial I found on Medium that will walk you through the process of deploying your deep learning model to AWS Lambda.

Of the three, I’d say Google Cloud Functions is the most user friendly, though the others do have their merits.
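Before going down that road, it helps to know how big your packaged model and libraries actually are. Here’s a small sketch that adds up the file sizes under a directory and compares the total against a limit you pass in. The helper names are hypothetical, and the check measures uncompressed size, so treat it as a rough upper bound on the zipped package rather than an exact answer.

```python
import os

MB = 1024 * 1024


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under `root`, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_within(root: str, limit_mb: float) -> bool:
    """Return True if everything under `root` totals at most `limit_mb` megabytes."""
    return directory_size_bytes(root) <= limit_mb * MB


# Usage (hypothetical path): fits_within("build/package", 50)
```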



Judy T Raj

Google-certified Tensorflow Developer | Google Cloud Architect | Author | Software Engineer