Easing model development and deployment

Btara Truhandarien
SFU Professional Computer Science
9 min read · Mar 8, 2019
Courtesy of this post by Jeremy Jordan

Authored by Btara Truhandarien, Bhuvan Chopra, Grace Kingsly, Mohammad Ullah

Machine learning in production is not just about research and creating a one-off working prototype: it involves maintenance, supporting subsystems, and continuous improvement. It is a complex task, as Google itself admits in a paper. In this post we will look at how to ease that difficulty, particularly in the area of model development and deployment. If you are an engineer, this post will help you better prepare for the growing importance of AI-powered systems; as a data scientist, you will be able to better communicate your ideas and strategies to fellow data scientists and engineers.

The interoperability of machine learning frameworks

Here is an unsurprising statement: none of the popular machine learning frameworks are (directly) interoperable with one another. Interoperability is the "ability of a system to work with or use the parts or equipment of another system" (Merriam-Webster). Just as the choice of machine learning framework affects how you and your team build a model, it also affects how the encompassing system can make use of that model in production. Your organization may decide to settle on a single machine learning framework for every machine learning task going forward. This runs the risk of tying the overall system to that framework, introducing framework coupling. While not identical to software coupling, it remains true that coupling of systems is a bad thing. You run the risk of building supporting systems such as consumer-facing applications, data pipelines and deployment strategies on the assumption that your machine learning framework will always be TensorFlow/PyTorch/Caffe2 or what have you. Migrating from one framework to another will be painful. Moreover, since frameworks have differing strengths and weaknesses, your organization will likely want to take advantage of multiple frameworks. Managing, deploying and integrating these different sets of technologies is important, but also challenging.
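
To make the coupling concern concrete, here is a minimal sketch (our own illustration, not a prescribed pattern) of hiding the framework behind a small predictor interface; the Predictor and TorchPredictor names are hypothetical:

# A minimal sketch of keeping framework imports out of the rest of the system
from abc import ABC, abstractmethod
import numpy as np

class Predictor(ABC):
    """Framework-agnostic prediction interface used by the surrounding system."""

    @abstractmethod
    def predict(self, batch: np.ndarray) -> np.ndarray:
        ...

class TorchPredictor(Predictor):
    """One concrete adapter; swapping frameworks only touches this class."""

    def __init__(self, model):
        import torch  # the framework import stays local to the adapter
        self._torch = torch
        self._model = model.eval()

    def predict(self, batch: np.ndarray) -> np.ndarray:
        with self._torch.no_grad():
            return self._model(self._torch.from_numpy(batch)).numpy()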

Standardization of model formats

JSON, JavaScript Object Notation, is a very common data format often used to exchange data, with great support in multiple programming languages (despite the JavaScript part of its name). Machine learning frameworks can be made more interoperable if there is a standardized format for defining machine learning models, and indeed there is: enter ONNX. Like JSON, ONNX is a format that is independent of any machine learning framework. It defines a set of data types and operators that are available to any model. Frameworks that support the ONNX format support this operator set and these data types. Therefore, you can be reasonably confident that a model built and run in one framework will work in another framework via the ONNX format. With ONNX you can do the following:

# Creating the model in PyTorch (note: you need the future package, which you can install with pip install future)
import torch
from torchvision.models import alexnet
# input to alexnet is a 224 x 224 RGB image; 224 is the minimum image dimension according to the PyTorch documentation
dummy_input = torch.randn(1, 3, 224, 224)
model = alexnet(pretrained=True)
input_names = ['alexnet_input']
# write the alexnet model to disk in ONNX format
torch.onnx.export(model, dummy_input, 'alexnet.onnx', input_names=input_names, verbose=True)
# verbose output to get a sense of what the ONNX format looks like
import onnx
import caffe2.python.onnx.backend

# Some random input
import numpy as np
img = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Load the ONNX model that was saved
model = onnx.load('alexnet.onnx')
# Run the ONNX model with Caffe2
outputs = caffe2.python.onnx.backend.run_model(model, [img])
print(outputs)
# potential output
Outputs(_0=array([[-0.05282867, -1.2265869 , -1.0806911 , ..., -1.1361005 ,
-0.8091734 , 1.084933 ],
[-0.2020778 , -1.5212066 , -1.9587349 , ..., -1.1595379 ,
-1.3152201 , 1.3661635 ],
[-0.0987098 , -0.91584986, -1.4721913 , ..., -1.2918088 ,
-0.7281339 , 1.1157584 ],
...,
[-0.09950841, -1.6349491 , -1.0889211 , ..., -1.2242676 ,
-0.74637014, 1.0592045 ],
[ 0.19841197, -1.443419 , -1.9653301 , ..., -1.1936607 ,
-0.7369804 , 1.0796065 ],
[ 0.12863973, -1.4388589 , -1.919874 , ..., -1.3095584 ,
-1.0590329 , 1.1892282 ]], dtype=float32))
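
If you want to inspect what ended up in the exported file without re-running the verbose export, the onnx package can print the graph for you (a small aside from us, not part of the original walkthrough):

# Inspect the exported model's graph as a human-readable string
import onnx

model = onnx.load('alexnet.onnx')
onnx.checker.check_model(model)                   # sanity-check the model structure
print(onnx.helper.printable_graph(model.graph))   # operators and tensors in the graph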

By using ONNX we can share ideas and results faster between teams and researchers, minimizing the amount of re-implementation in different framework dialects.

In fact, we can even run the ONNX model without requiring another framework by using ONNX Runtime:

# running the ONNX model without Caffe2
import numpy
import onnxruntime as rt
sess = rt.InferenceSession("alexnet.onnx")  # load the model into an inference session
input_name = sess.get_inputs()[0].name  # get the input name (it would be 'alexnet_input' from the export above)
X = numpy.random.random((1, 3, 224, 224)).astype(numpy.float32)
pred_onnx = sess.run(None, {input_name: X})
print(pred_onnx)

Standardization of protocol during inference

Having a standard format for models is all well and good, but if our main goal is to deliver these models to production it is not enough. Now consider how you would deploy your models (potentially built in different frameworks) to production. Taking off from the example above, one straightforward approach would be to take the loaded model and wrap it in some serving mechanism, e.g. an API or RPC service. Indeed, it has been demonstrated here and here that the idea works (we are not saying it is production-grade material, but it works). This leaves us with the task of designing the protocol for speaking to the APIs.
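
As a rough illustration of that idea (our own sketch, not taken from either of the linked posts), wrapping the ONNX Runtime session in a small Flask endpoint might look like this; the route name and JSON shape are arbitrary choices:

# A minimal sketch of serving the ONNX model behind an HTTP API with Flask
import numpy as np
import onnxruntime as rt
from flask import Flask, request, jsonify

app = Flask(__name__)
sess = rt.InferenceSession("alexnet.onnx")
input_name = sess.get_inputs()[0].name

@app.route("/predict", methods=["POST"])
def predict():
    # expects JSON like {"data": [[...]]} with shape (N, 3, 224, 224)
    batch = np.array(request.json["data"], dtype=np.float32)
    outputs = sess.run(None, {input_name: batch})
    return jsonify(prediction=outputs[0].tolist())

if __name__ == "__main__":
    app.run(port=5000)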

Depending on the runtime frameworks we choose, it may be necessary to support different data format protocols. If we can transform our models into ONNX models, then we can use ONNX Runtime's expected input format as our protocol message. But what if we can't, perhaps because we have another model in TensorFlow (which does not have official ONNX support yet, though there is unofficial support to and from ONNX)? We would then have to implement multiple protocols.

Or perhaps not. Introducing GraphPipe, a technology that helps you standardize the protocol to speak to your model.

Serving with GraphPipe

By standardizing the protocol with GraphPipe we avoid having to write framework-specific protocol code in our clients, keeping them simple. The benefit of a common protocol also extends to human communication: data scientists and data engineers can now speak a standardized GraphPipe dialect instead of framework-specific dialects/protocols. But enough talk, let's see it in action.

First make sure you have the following installed:

  • Docker (pick the one for your platform)
  • Conda (we used Anaconda instead of miniconda)

Create a new conda environment and install necessary dependencies

conda create --name graphpipe-trial
conda activate graphpipe-trial
conda install -c pytorch pytorch torchvision
pip install graphpipe onnx onnxruntime matplotlib numpy==1.16

numpy 1.16 is pinned to resolve some dependency conflicts that caused requests to GraphPipe to fail.

Let’s use a different model this time: DenseNet. Start with a similar setup as before

import torch
from torchvision.models import densenet
dummy_input = torch.randn(1, 3, 224, 224)
input_names = ['densenet_input']
model = densenet.densenet121(pretrained=True)
torch.onnx.export(model, dummy_input, 'densenet.onnx', input_names=input_names, verbose=True)

Now download the following two images

wget -O market1.jpg https://bit.ly/2SQW6co
wget -O market2.jpg https://bit.ly/2C9hmUY

Both are images of a food market, which in the ImageNet classification is class #582.

We also need to create a densenet.value_inputs.json file for the model (similar to the one linked above). Here is its content:

{"densenet_input": [1, [1, 3, 224, 224]]}

Now run the following docker command

docker run -it --rm \
-p 9000:9000 -v "$PWD:/models/" \
sleepsonthefloor/graphpipe-onnx:cpu \
--model=/models/densenet.onnx \
--value-inputs="/models/densenet.value_inputs.json" \
--listen=0.0.0.0:9000

The above command translates to: “please run the GraphPipe ONNX server with the DenseNet ONNX model and this value-inputs file, and open port 9000 on the host machine”. In other words, we can now make requests to our GraphPipe ONNX server! So let’s do that

from graphpipe import remote
from PIL import Image
import numpy as np
from torchvision import transforms
# PyTorch documentation tells us that the input
# needs to be rescaled
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
transformers = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean, std)
])
img1 = Image.open('market1.jpg')
img1 = transformers(img1)
data1 = np.expand_dims(img1.numpy(), axis=0)
pred1 = remote.execute('http://127.0.0.1:9000', data1)
print("Market 1 prediction is", np.argmax(pred1, axis=1))
img2 = Image.open('market2.jpg')
img2 = transformers(img2)
data2 = np.expand_dims(img2.numpy(), axis=0)
pred2 = remote.execute('http://127.0.0.1:9000', data2)
print("Market 2 prediction is", np.argmax(pred2, axis=1))

Running the above code in another terminal session should give us

Market 1 prediction is [865]
Market 2 prediction is [582]

and also a trace of request calls in the terminal that ran the docker command. Success!


There we have it, a callable predictive model in a couple of minutes. As a side note, market 1 was not classified correctly. If you look at the picture, it is a bit hard to tell exactly what it shows; it could pass for a toy store, with the balloons in the background. So maybe you want to try other models. Thankfully there are pre-built ONNX models available here to try.
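
To see what the predicted indices actually mean, you can map them to ImageNet labels. The short sketch below is our own addition; it uses the class-index file commonly distributed with Keras, and the URL and file layout are assumptions on our part:

# Map predicted ImageNet class indices to human-readable labels
import json
import urllib.request

# assumed location of the Keras imagenet class-index file
LABELS_URL = "https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json"
with urllib.request.urlopen(LABELS_URL) as f:
    class_index = json.load(f)  # e.g. {"0": ["n01440764", "tench"], ...}

for idx in [865, 582]:  # the predictions we got above
    print(idx, class_index[str(idx)][1])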

For completeness’ sake, let’s try the same pre-trained model, but with TensorFlow. First install TensorFlow

conda install -c conda-forge tensorflow
pip install onnx-tf # an unofficial support for ONNX with TF

Now transform our densenet ONNX model to a TF model

import onnx
from onnx_tf.backend import prepare
onnx_model = onnx.load('densenet.onnx')
tf_rep = prepare(onnx_model)
tf_rep.export_graph('tf_path')

And… wait, we got an error

NotImplementedError: BatchNormalization version 9 is not implemented.

It’s great, but not perfect

As you can see, not every model can be exported and imported back and forth between different frameworks. Perhaps if TensorFlow officially supported ONNX we might have succeeded in running TF in GraphPipe (feel free to try, though you would need to create a .pb file, which you can do by following the steps here for Keras). Nevertheless, the point stands that we will not always be able to transform models across frameworks using ONNX. Remember that ONNX attempts to provide a standardized set of operators and data types, which is hard to achieve in the current fractured machine learning framework ecosystem. Right now ONNX is still focusing on “the capabilities needed for inferencing (scoring)”.
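
For reference, here is a rough sketch (our own, assuming TensorFlow 1.x-era APIs, and not something we tested in these experiments) of freezing a Keras model into the .pb file that graphpipe-tf expects; the file names are arbitrary:

# A rough sketch of freezing a Keras DenseNet into a .pb file for graphpipe-tf
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121

model = DenseNet121(weights='imagenet')
sess = tf.keras.backend.get_session()
output_names = [out.op.name for out in model.outputs]
# fold variables into constants so the graph can be served standalone
frozen_graph = tf.graph_util.convert_variables_to_constants(
    sess, sess.graph.as_graph_def(), output_names)
tf.train.write_graph(frozen_graph, '.', 'densenet_tf.pb', as_text=False)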

We also did not find a way to do transfer learning in a different machine learning framework using ONNX. This is a highly desirable ability that would improve agility in R&D, but sadly there does not seem to be a way (tell us if you know one!). Lastly, the ONNX and GraphPipe documentation is quite sparse right now, so more complex work will be harder to do.

Final words

We have played around with ONNX and GraphPipe a bit. We saw how effortless it was to start with a pre-trained network built on top of PyTorch and run it as a callable service. We also saw that we can convert a PyTorch model to a Caffe2 model with relative ease. In our attempt to run a TensorFlow model on GraphPipe by transforming our ONNX model, however, we found that we may stumble upon incompatibility issues.

Overall ONNX is a great step towards standardization and interoperability, reducing the friction between various frameworks, and easing the difficulty of deploying models in one framework by taking advantage of the capabilities in other frameworks. GraphPipe takes it a step further by standardizing the protocols of your deployed machine learning models, further decreasing the cost of having to create plumbing code to handle customized protocols for different models created (or deployed) in different frameworks.

Both ONNX and GraphPipe are fairly young in contrast to current popular machine learning frameworks. There is definitely a lot of room for growth and improvement and standardization will surely continue to be beneficial, so tune in and don’t miss out!

If you liked the post and found it helpful or insightful, we would appreciate some claps. Thank you for your time.
