ONNX: the long and collaborative road to machine learning portability

Jakob Klepp
Published in Moonvision
Apr 17, 2019

Not so long ago, once you had trained a model in a deep learning framework, you were locked into using that same framework for inference as well. Considering that the inference environment often drastically differs from the training environment, this is likely not what you want. You feel the pain when you use cloud platforms with GPU clusters for training and want to deploy the resulting models on low-powered industrial PCs, on smartphones, or in the web. For those targets, using the same framework for inference as for training might not be the best option. Thanks to ONNX (Open Neural Network Exchange), you no longer have to.

Though ONNX has only been around for a little more than a year, it is already supported by most of the widely used deep learning tools and frameworks, made possible by a community that needed a standard for better collaboration.

Motivation

If you ask yourself why you should even care about the ability to use different ONNX backends, think about integration! Being flexible with the backend is key to integrating deep learning solutions into existing systems! The fact that ONNX models are supported by so many backends opens up a wide range of new deployment scenarios that were previously not feasible.

Suddenly sharing models is an option, especially when the model is integrated by a different entity than the one that created it. And this is not just startup thinking: big companies are collectively supporting the standard initially developed by Facebook and Microsoft. With ONNX Runtime, an ONNX backend developed by Microsoft, it is now possible to use most of your existing models not only from C++ or Python but also in .NET applications. [1]

NVIDIA’s platform for high-performance deep learning inference, TensorRT, uses ONNX to support a wide range of deep learning frameworks. [2] TensorRT applies specific optimisations to allow for higher throughput and lower latency on Tesla GPUs in data centers, as well as on the Jetson embedded platforms for edge deployments.

Qualcomm’s Snapdragon NPE (Neural Processing Engine) SDK adds support for neural network evaluation to mobile devices. [3] It can be used to implement inference directly in Android apps without having to rely on cloud services, and it is optimized to perform well on Snapdragon chipsets. While only the Caffe, Caffe2 and TensorFlow model formats are directly supported by NPE, a large number of deep learning frameworks are indirectly supported via the ONNX format.

How to use it

ONNX models are created from your existing deep learning models. The exact process varies from framework to framework, but most of the popular deep learning frameworks either have a way to directly export models in the ONNX format, or there are converters that translate framework-specific models to ONNX.
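For example, here is a minimal sketch of exporting a PyTorch model to ONNX by tracing; the torchvision ResNet and the input shape are just placeholders for your own network.

```python
import torch
import torchvision

# any trained model works here; ResNet-18 is only an illustration
model = torchvision.models.resnet18(pretrained=True)
model.eval()

# tracing needs an example input; the values can be random, only the shape matters
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,                  # the framework-specific model
    dummy_input,            # example input used to trace the graph
    "resnet18.onnx",        # output file (Protobuf)
    opset_version=9,        # pin the ONNX operation set
    input_names=["input"],
    output_names=["output"],
)
```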

Let’s dive into the details of the ONNX format. An ONNX file is a Protobuf-encoded tensor graph built from ONNX operations. The ONNX operations are versioned as operation sets, or opsets for short. [4] Each runtime has to specify which opsets it supports and then support those opsets completely. This improves compatibility across implementations.
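To see which opset an exported model targets, you can inspect the file with the onnx Python package; a small sketch, assuming the "resnet18.onnx" file from the export example above:

```python
import onnx

model = onnx.load("resnet18.onnx")
onnx.checker.check_model(model)  # validate the graph against the ONNX spec

for opset in model.opset_import:
    # an empty domain string denotes the default ONNX operator set
    print(opset.domain or "ai.onnx", opset.version)
```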

The operations include most of the typical deep learning primitives: linear operations, convolutions and activation functions. The framework-specific model is mapped to the ONNX format by executing the model with input data (often just random values) and tracing the execution. The operations executed are mapped to ONNX operations, and so the entire model graph is mapped into the ONNX format. The ONNX model is then saved as an .onnx Protobuf file, which can be read and executed by a wide and growing range of ONNX runtimes.
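Running such a file is then a matter of handing it to one of those runtimes. A minimal sketch with ONNX Runtime, reusing the hypothetical file and input shape from above:

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("resnet18.onnx")
input_name = session.get_inputs()[0].name

# ONNX Runtime consumes plain numpy arrays
batch = np.random.randn(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: batch})
print(outputs[0].shape)
```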

How we use it — Initial Success

Back in January we were already successfully using PyTorch, ONNX and ONNX Runtime together in a production environment, but we were still moving on tough terrain. PyTorch was introducing many breaking changes and getting rid of plenty of legacy code.


In order to find the most feasible combination of libraries, we developed tests to measure the numerical and runtime differences for several important models. For ONNX Runtime, the final logits deviated quite strongly from the PyTorch output. Surprisingly, this translated to rather low deviations in the qualitative output (bounding boxes or pixel-wise segmentations). We still have to figure out whether those numerical differences are largely linear, such that the final argmax and softmax operations we do in numpy don’t affect the final outcome.
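A simplified sketch of the kind of numerical-difference test we mean; the model, file name, input shape and tolerance are illustrative, not our production setup:

```python
import numpy as np
import onnxruntime as ort
import torch

def compare_outputs(model, onnx_path, input_shape=(1, 3, 224, 224), atol=1e-4):
    model.eval()
    x = torch.randn(*input_shape)

    # reference output from the original PyTorch model
    with torch.no_grad():
        torch_out = model(x).numpy()

    # output from the exported model via ONNX Runtime
    session = ort.InferenceSession(onnx_path)
    input_name = session.get_inputs()[0].name
    ort_out = session.run(None, {input_name: x.numpy()})[0]

    max_abs_diff = np.abs(torch_out - ort_out).max()
    print(f"max abs difference: {max_abs_diff:.6f}")

    # the qualitative result often matters more than the raw logits:
    # check whether both backends agree on the argmax
    agrees = (torch_out.argmax(axis=1) == ort_out.argmax(axis=1)).all()
    print("argmax agrees:", bool(agrees))
    return max_abs_diff <= atol
```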

For a glimpse into our tests, we listed the runtimes and qualitative differences for a segmentation model between PyTorch GPU, PyTorch CPU, Caffe2 CPU and ONNX Runtime CPU in their aforementioned versions (8 hyper-threads and a GTX 1080 Ti, respectively).

Caffe2 native on CPU, no bilinear Upsampling, Detection time: 2.45s.
Caffe2 on CPU, no bilinear Upsampling, Detection time: 2.57s.
ONNX Runtime on CPU, with bilinear Upsampling, Detection time: 0.761s.
PyTorch on GPU, with bilinear Upsampling, Detection time: 0.038s.

How we will rapidly use it to our advantage

All this is well and good, but there are still a couple of issues hindering us from using ONNX in our Docker edge deployment cycle. Opsets are changing fast; while this means a lot of improvements, it also means that what worked a month ago might not work any more. Among the many ONNX backends, few support the current opset version 9, let alone the upcoming version 10. And even when the new opset versions are supported, it takes a while until they make it into official releases of the various inference frameworks. This means that a lot of time is spent trying to compile, mix and match master branches of various deep learning tools, libraries and their dependencies, which is basically my job description here at MoonVision.

In particular we had issues with the Upsample operator. Based on our investigation and involvement, we think that the specification for the Upsample operator was perhaps unclear and was misinterpreted by various parties, which led to a number of conflicts. Since it seems that we aren’t the only ones having issues with the Upsample operator, we compiled an easy-to-follow version history as well as a collection of relevant issues.
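For context, a tiny illustrative sketch (not our production model) of the kind of layer that gets mapped to the ONNX Upsample operator; exporting a bilinear upsample is exactly where the mode and align_corners ambiguities tracked below tend to surface:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUpsampler(nn.Module):
    def forward(self, x):
        # bilinear upsampling by a factor of two
        return F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)

# with opset 9 the exported graph contains an Upsample node
torch.onnx.export(TinyUpsampler(), torch.randn(1, 3, 64, 64),
                  "upsample.onnx", opset_version=9)
```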

Upsample Changelog

Add an upsample2d operator as experimental

PR created on: 2017-11-02
PR merged on: 2017-11-15
Opset: 1
URL: github.com/onnx/onnx/pull/180

Change ResizeNearest op to Upsample op

PR created on: 2017-11-18
PR merged on: 2017-11-27
Opset: 1-6
URL: github.com/onnx/onnx/pull/290

Promote Upsample op

PR created on: 2018-05-01
PR merged on: 2018-05-10
Opset: 7-8
URL: github.com/onnx/onnx/pull/861

Change upsample operator to allow dynamic 'scales'

PR created on: 2018-10-02
PR merged on: 2018-10-26
Opset: 9
URL: github.com/onnx/onnx/pull/1467

Support down sampling for Upsample with scales < 1

PR created on: 2019-01-25
PR merged on: 2019-03-20
Opset: 10
URL: github.com/onnx/onnx/pull/1773

Related Issues

Add linear mode case for upsample operator

Issue created on:
Used opset:
URL: github.com/onnx/onnx/pull/1295#discussion_r228605032

Upsample ONNX op uses wrong mode

Issue created on:
Used opset:
URL: github.com/pytorch/pytorch/issues/12647

Modify the ONNX Upsample mode “linear” -> “bilinear”

Issue created on:
Used opset:
URL: github.com/pytorch/pytorch/pull/13382

Upsample node does not support align_corners attribute

Issue created on:
Used opset:
URL: github.com/onnx/onnx/issues/1612

Upsample problem

Issue created on:
Used opset:
URL: github.com/onnx/onnx/issues/1807

Error converting upsample operator from pytorch to tensorflow

Issue created on:
Used opset:
URL: github.com/onnx/onnx/issues/1827

Upsample only checks its “scales” input from float_data field

Issue created on:
Used opset:
URL: github.com/onnx/onnx/issues/1852

PyTorch export of Upsample

Issue created on:
Used opset:
URL: github.com/onnx/onnx/issues/1918

Further Reading

[1]: Learning Machine Learning with .NET, PyTorch and the ONNX Runtime, Microsoft Developer, https://www.youtube.com/watch?v=lamnwHvjEV0

[2]: NVIDIA TensorRT — Integrated with All Major Frameworks, NVIDIA Developer, https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html#onnx_mnist_sample

[3]: Run your ONNX AI Models Faster on Snapdragon, Qualcomm Developer Network, https://developer.qualcomm.com/blog/run-your-onnx-ai-models-faster-snapdragon

[4]: ONNX versioning, onnx, https://github.com/onnx/onnx/blob/master/docs/Versioning.md

This article will be updated as we make and share new findings along that long and (still) lonesome road. Leave us a comment if you have similar struggles, or share your experience with the new format.
