ONNX: Bridging the Gap between Different Machine Learning Frameworks

Abir Hassini
Jan 14, 2024

Introduction

We often talk about deploying a machine learning model as the hardest problem in data science. Organizations that want to leverage AI at scale must overcome a number of challenges around model training and model inferencing. Today there are a plethora of tools and frameworks that accelerate model training, but inferencing remains a tough nut to crack because of the variety of environments that models need to run in. If we look at the multitude of different ways we can build a neural network model today, we would be right to feel overwhelmed. Models can be built in R, Python, ML.NET…, with frameworks such as XGBoost, scikit-learn, TensorFlow, PyTorch, Theano, CNTK, or others, which only adds to the problem. Deploying to each of those environments means packaging the model in the format that its framework requires, which typically forces us to standardize on a single environment or to keep multiple environments running side by side.

What was once a difficult part of model deployment is now being made simpler and more uniform. One approach is “machine learning runtimes”.

Machine Learning Runtimes: Maximizing Efficiency and Performance in Model Execution

A runtime is an environment that can execute models built in multiple languages. We can build a model using the language that best fits our problem statement and then save it in the runtime’s format. We have seen runtimes before: “MLeap”, for example, is a runtime that works with Apache Spark, scikit-learn, and TensorFlow. MLeap has had a good reception across the industry and is widely used. The problem with MLeap is that it does not support other deep learning frameworks, and this is where ONNX and ONNX Runtime can help.

What is ONNX?

ONNX is an acronym that stands for Open Neural Network Exchange, an open format built to represent machine learning models and facilitate interoperability between deep learning frameworks. It provides a definition of an extensible computation graph model, built-in operators, and standard data types, as well as a common file format, enabling AI developers to use models across different frameworks, tools, runtimes, and compilers. The specificity of ONNX even allows the stored operations to be automatically compiled to lower-level languages for embedding on various devices.

ONNX Technical Design

Conceptually, the ONNX format is simple enough: an .onnx file defines a computation dataflow graph, structured as a list of nodes that form a directed acyclic graph. Each edge of that graph represents a tensor with a specific type that is “moving” from one node to the next. Nodes have one or more inputs and one or more outputs, and each node is a call to an operator.

The graph also has metadata to help document its purpose, author, etc. Operators are implemented externally to the graph, but the set of built-in operators are portable across frameworks. Every framework supporting ONNX will provide implementations of these operators on the applicable data types.
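To make this concrete, here is a minimal sketch that builds and validates a one-node ONNX graph with the official onnx Python package (the tensor names and shapes are illustrative):

import onnx
from onnx import helper, TensorProto

# Declare the graph's input and output tensors (name, element type, shape)
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, 3])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, 3])

# A node is a call to an operator; the edges are the tensors "X" and "Y"
relu_node = helper.make_node("Relu", inputs=["X"], outputs=["Y"])

# Assemble the directed acyclic graph and wrap it in a model
graph = helper.make_graph([relu_node], "tiny-graph", [X], [Y])
model = helper.make_model(graph)

onnx.checker.check_model(model)  # raises an error if the graph is invalid
print(helper.printable_graph(model.graph))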

ONNX key benefits

1- Interoperability

With ONNX, we can develop a model in our preferred framework without worrying about downstream inferencing implications. ONNX enables us to pair the framework we prefer with the inference engine of our choice; the figure below shows a collection of frameworks dedicated to building & training a model.

2- Hardware Access

ONNX makes it easier to access hardware optimizations: we can use ONNX-compatible runtimes and libraries designed to maximize performance across hardware. The figure below shows the different runtimes designed to deploy an ONNX model and accelerate inferencing.

Getting an ONNX model

1. ONNX Model Zoo

The ONNX Model Zoo is a collection of pre-trained models in ONNX format, contributed by the community for deep learning researchers and practitioners, covering a variety of platforms and use cases.

2. Hugging Face

We can find ONNX models directly on the Hugging Face Model Hub, which has a dedicated ONNX model library. In addition, Hugging Face provides ONNX support for a variety of other models not listed in the ONNX model library.
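For example, a single ONNX file can be pulled from the Hub with the huggingface_hub client (the repository id and file name below are illustrative placeholders, not a specific recommendation):

from huggingface_hub import hf_hub_download

# Download one ONNX file from a model repository on the Hub
onnx_path = hf_hub_download(
    repo_id="some-org/some-onnx-model",  # illustrative repo id
    filename="model.onnx",               # illustrative file name
)
print(onnx_path)  # local path to the cached file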

3. Convert an existing model built with another framework

Converting deep learning models from another framework to ONNX is quite straightforward. Since Hugging Face launched “Optimum”, an extension of Transformers and Diffusers, you can easily convert pretrained Transformers models to ONNX.
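As a minimal sketch of the Optimum route (assuming the optimum[onnxruntime] package is installed; the model id is illustrative), a Transformers checkpoint can be exported to ONNX in a few lines:

from optimum.onnxruntime import ORTModelForSequenceClassification

# Load a pretrained Transformers checkpoint and export it to ONNX on the fly
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",  # illustrative model id
    export=True,
)
ort_model.save_pretrained("distilbert_onnx/")  # writes model.onnx plus its config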

In the case of PyTorch, the model conversion process requires the following:

  • The model is in inference mode. This is because some operations such as batch normalization and dropout behave differently during inference and training.
  • Dummy input in the shape the model would expect.
  • The input and output names we would like to use for the exported model.

Here, I’ll provide an example of how to export a ResNet50 model from PyTorch to ONNX format.

1. First, let’s install the required packages:
pip install torch torchvision onnx

2. Then, we import the necessary modules and download the pretrained ResNet50 model from PyTorch’s torchvision library:

import torch
from torchvision import models

# Download the pretrained ResNet50 model
resnet50 = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

3. After that, we put the model in evaluation mode so that layers such as dropout and batch normalization switch to their inference behavior.

resnet50.eval()

4. Create some dummy input data to pass through the model so its computational graph can be exported to ONNX format. Make sure the input has the dimensions the model expects. For ResNet50, that is [batch_size, channels, image_size, image_size]: the batch size, the number of image channels, and the image’s height and width. For ImageNet models, for example, channels is 3 and image_size is 224.

import numpy as np

# Generate random image data as input
image_data = np.random.randn(1, 3, 224, 224).astype("float32")
image_tensor = torch.from_numpy(image_data)

Here, we generate a single random RGB image with a size of 224x224 pixels. Note that you can replace this dummy data with a real image if needed, as in the sketch below.
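A minimal sketch of that with Pillow (the file name cat.jpg is illustrative):

from PIL import Image
import numpy as np
import torch

# Load and resize a real RGB image, then convert HWC [0, 255] to NCHW [0, 1]
img = Image.open("cat.jpg").convert("RGB").resize((224, 224))
image_data = np.asarray(img, dtype="float32") / 255.0
image_tensor = torch.from_numpy(image_data).permute(2, 0, 1).unsqueeze(0)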

5. Define the mean and standard deviation values used for normalizing the input data:

mean_values = [0.485, 0.456, 0.406]
stddev_values = [0.229, 0.224, 0.225]

These values correspond to those typically used for ImageNet datasets.

6. Normalize the input data before passing it to the model:

# Reshape mean/std to (1, 3, 1, 1) so they broadcast across the NCHW input
mean = torch.tensor(mean_values).view(1, 3, 1, 1)
std = torch.tensor(stddev_values).view(1, 3, 1, 1)
normalized_image_tensor = (image_tensor - mean) / std

7. To convert the PyTorch model to ONNX format, we’ll use the torch.onnx.export function.

import torch.onnx

# Specify the export parameters
torch.onnx.export(
    resnet50,                        # model being run
    normalized_image_tensor,         # model input (or a tuple for multiple inputs)
    "resnet50.onnx",                 # where to save the model (can be a file or file-like object)
    export_params=True,              # store the trained parameter weights inside the model file
    opset_version=12,                # the ONNX opset version to export the model to
    do_constant_folding=True,        # whether to execute constant folding for optimization
    input_names=["input"],           # the model's input names
    output_names=["output"],         # the model's output names
    dynamic_axes={                   # variable-length axes
        "input": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
)

This exports the ResNet50 model to an ONNX format file named resnet50.onnx.
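Optionally, we can also run ONNX’s own structural checker on the exported file before handing it to a runtime:

import onnx

# Load the exported file and validate the graph structure
onnx_model = onnx.load("resnet50.onnx")
onnx.checker.check_model(onnx_model)  # raises an error if the model is invalid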

8. Verify the generated ONNX model using ONNX Runtime. Before that, install the ONNX Runtime package with pip:

pip install onnxruntime

import onnxruntime as rt

# Create an inference session for the exported model
ort_session = rt.InferenceSession("resnet50.onnx")

# Look up the model's input and output names
input_name = ort_session.get_inputs()[0].name
label_name = ort_session.get_outputs()[0].name

# Run the ONNX model on the input data
outputs = ort_session.run([label_name], {input_name: normalized_image_tensor.numpy()})
print(outputs[0].shape)  # (1, 1000) class scores for the ImageNet labels
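As an extra check, we can compare the ONNX Runtime output with the original PyTorch model’s output on the same input (a minimal sketch; the tolerances below are typical choices, not requirements):

import numpy as np

# Run the original PyTorch model on the same input
with torch.no_grad():
    torch_out = resnet50(normalized_image_tensor)

# The two outputs should agree up to small numerical differences
np.testing.assert_allclose(torch_out.numpy(), outputs[0], rtol=1e-3, atol=1e-5)
print("ONNX Runtime and PyTorch outputs match")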

ONNX visualization tools

There are plenty of tools that can visualize an ONNX model, such as “Netdrawer”, “VisualDL”, “Zetane”, and “Netron”, giving a better understanding of the model through its computational graph.

Netron is a viewer for neural network, deep learning, and machine learning models; it supports ONNX, TensorFlow Lite, Caffe, Keras, Darknet, and more.
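After a pip install netron, Netron can be launched straight from Python (the file name assumes the export above):

import netron

# Start a local Netron server and open the model's graph in the browser
netron.start("resnet50.onnx")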

Conclusion

To wrap up, ONNX has emerged as a powerful tool in the field of AI. It serves as a standardized format for model representation and interoperability, enabling collaboration and deployment across different machine learning frameworks. With ONNX, developers and data scientists can easily convert and deploy models, eliminating the need for time-consuming and error-prone manual conversions. This flexibility empowers organizations to leverage the best features and capabilities of various deep learning frameworks, allowing for efficient model deployment and inference. ONNX bridges the gap between different frameworks and simplifies the deployment process.
