Accelerate and Productionize ML Model Inferencing Using Open-Source Tools

ODSC - Open Data Science
5 min read · Mar 25, 2020

You’ve finally got that perfect trained model for your data set. Now what? To run and deploy it to production, a host of issues lies ahead: inference latency, environment management, framework compatibility, security, deployment targets…there’s a lot to consider! In this tutorial, we’ll look at solutions for these common challenges using ONNX and related tooling.

ONNX (Open Neural Network eXchange), an open-source graduate project under the Linux Foundation’s LF AI, defines a standard format for machine learning models that enables AI developers to use their frameworks and tools of choice to train, infer, and deploy on a variety of hardware targets. Models trained with PyTorch, TensorFlow, scikit-learn, and others can be converted to the ONNX format and can then benefit from acceleration on cloud and edge devices via supported hardware optimizations.

One mainstream way to run inference on ONNX models is the open-source, high-performance ONNX Runtime inference engine. For complex DNNs, ONNX Runtime can provide significant gains in performance, as demonstrated by this 17x inference acceleration of a BERT model used by Microsoft Bing. For traditional ML, ONNX Runtime can provide a more secure and straightforward deployment story that minimizes the security vulnerabilities exposed by .pkl files and messy versioning (ONNX Runtime is fully backward compatible with older versions of ONNX models). With APIs for C, C++, C#, Python, and Java, ONNX Runtime removes the need to have a Python environment for inferencing. For systems looking to experiment with or support models from different frameworks, ONNX models with ONNX Runtime provide a single integration point without the need to maintain custom code or multiple runtimes.

Related reading: Interoperable AI: High-Performance Inferencing of ML and DNN Models Using Open-Source Tools

Tutorial Overview

Let’s put this into action. In the examples below, we’ll demonstrate how to get started with ONNX by training an image classification model in PyTorch and a classification model in scikit-learn, converting them to the ONNX format, and running inference on the converted models using ONNX Runtime.

1. Image classification using PyTorch:

- Train a PyTorch model (resnet18) for a sports image classification problem.

- Export the trained model to ONNX for high-performance inferencing.

2. Classification task using scikit-learn:

- Train a scikit-learn model on a dataset with 5 classes.

- Convert the trained model to ONNX format using skl2onnx.

Image Classification with PyTorch

Training a PyTorch model

The task at hand is to classify sports images into 8 classes. We partition the data into train, test, and validation sets after performing a series of transforms. These involve cropping the images to 224 × 224 samples, randomly flipping them horizontally, converting them to tensors, and normalizing them.

from torchvision import datasets, transforms

data_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.47637546, 0.485785, 0.4522678],
                         [0.24692202, 0.24377407, 0.2667196])
])

# image_folder points to the root directory of the sports images
data = datasets.ImageFolder(root=image_folder, transform=data_transforms)

We pick resnet18 from the list of pre-trained models provided by PyTorch and replace its final layer so that it aligns with our problem.

from torch import nn
from torchvision import models

# model_name is 'resnet18' in this example
model = models.__dict__[model_name](pretrained=True)

# Alter the final layer to match our number of classes.
# nn.Linear applies a linear transformation to the incoming data: y = x A^T + b
final_layer_input = model.fc.in_features
model.fc = nn.Linear(final_layer_input, num_classes)

We train the model on the training set, also scoring on the validation set at each epoch. After training is complete, we run a prediction on the test set.
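For illustration, here is a minimal sketch of such a training loop; the loader names (train_loader, val_loader), epoch count, optimizer, and loss choice are our assumptions rather than the notebook’s exact code:

import torch

# Assumed loss and optimizer for a multi-class classification fine-tune
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(num_epochs):
    model.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()

    # Score on the validation set at each epoch
    model.eval()
    correct = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
    print(f"Epoch {epoch}: validation accuracy "
          f"{correct / len(val_loader.dataset):.3f}")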

Conversion to ONNX

Since PyTorch natively supports exporting models in the ONNX format, the trained model can be exported as shown below:

import torch

# Dummy input matching the model's expected shape: (batch, channels, height, width)
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "sports_classification.onnx")
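Optionally, the exported file can be sanity-checked with the onnx package (a separate install) before inferencing; this is the standard check from the onnx documentation:

import onnx

# Load the exported model and verify that the graph is well-formed
onnx_model = onnx.load("sports_classification.onnx")
onnx.checker.check_model(onnx_model)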

Inferencing with ONNX Runtime

Once the model is in the interoperable ONNX format, we can run inference on it using onnxruntime:

from onnxruntime import InferenceSession

ort_session = InferenceSession('sports_classification.onnx')
result = ort_session.run(None, {'input.1': test_sample.numpy()})
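Note that 'input.1' is simply the name PyTorch assigned to the graph input during export. Rather than hard-coding it, the session can be queried for it, as in this small sketch:

# Look up the model's expected input name instead of hard-coding 'input.1'
input_name = ort_session.get_inputs()[0].name
result = ort_session.run(None, {input_name: test_sample.numpy()})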

As shown in the table below, ONNX Runtime outperforms PyTorch in terms of prediction latency for various batch sizes. There are additional performance tuning techniques that can further improve latency based on hardware choices.

| Batch size | ONNX Runtime prediction time* | PyTorch prediction time* | % improvement |
|---|---|---|---|
| 1 | 60.2 ms ± 6.53 ms | 108 ms ± 1.75 ms | 79% |
| 2 | 129 ms ± 1.97 ms | 175 ms ± 1.55 ms | 36% |
| 4 | 259 ms ± 3.23 ms | 318 ms ± 5.18 ms | 23% |
| 8 | 510 ms ± 10.8 ms | 537 ms ± 5.77 ms | 5% |
| 16 | 989 ms ± 7.84 ms | 1.07 s ± 7.08 ms | 8% |
| 32 | 1.98 s ± 26.3 ms | 2.03 s ± 19.7 ms | 3% |
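As one example of such tuning, ONNX Runtime exposes session-level options such as the graph optimization level and thread counts. A minimal sketch (the thread count here is an assumption to tune per machine):

import onnxruntime as ort

# Configure session-level performance options before creating the session
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.intra_op_num_threads = 4  # assumption: tune to your CPU core count

tuned_session = ort.InferenceSession('sports_classification.onnx', sess_options)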

The full detailed Jupyter notebook can be found here.

Classification task with scikit-learn

Next up, let’s look at a classical ML model trained using scikit-learn.

Training a Scikit-Learn model

We use make_classification() to generate a dataset with 10,000 samples, 10 features, and 5 classes. We standardize the data using StandardScaler() and then train an MLPClassifier model.
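A minimal sketch of that data preparation step (the n_informative setting, split ratio, and random seeds here are our assumptions):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate a synthetic dataset: 10,000 samples, 10 features, 5 classes
X, y = make_classification(n_samples=10000, n_features=10, n_classes=5,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

We then assemble the scaler and classifier into a pipeline: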

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

model = Pipeline([('scaler', StandardScaler()),
                  ('predictor', MLPClassifier(random_state=42))])
model.fit(X_train, y_train)

Conversion to ONNX

To convert the model to the ONNX format, we use convert_sklearn() from the skl2onnx library. convert_sklearn() expects the scikit-learn model, a model name, and the input type as parameters, as shown below:

from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

model_onnx = convert_sklearn(model, 'classification model',
                             [('input', FloatTensorType([None, X_test.shape[1]]))])

Once the model is converted, we use save_model() from onnxmltools.utils to save the converted model.

from onnxmltools.utils import save_model

save_model(model_onnx, 'onnx_model.onnx')

Inferencing with ONNX Runtime

Similar to the previous example, for inferencing we use InferenceSession() from the onnxruntime library.

import numpy as np

sess = InferenceSession('onnx_model.onnx')
# The input must be float32 to match the FloatTensorType declared at conversion
res = sess.run(None, input_feed={'input': X_test.astype(np.float32)})
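As a quick sanity check, the ONNX Runtime predictions can be compared against the original pipeline. This sketch assumes the default skl2onnx converter output layout, whose first output holds the predicted labels:

# Compare converted-model predictions against the scikit-learn pipeline
onnx_labels = res[0]
sk_labels = model.predict(X_test)
agreement = np.mean(onnx_labels == sk_labels)
print(f"Label agreement: {agreement:.4f}")  # expect ~1.0, up to float32 rounding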

In this example, we also see notable performance improvements using ONNX Runtime.

| Batch size | ONNX Runtime prediction time (ms)* | scikit-learn prediction time (ms)* | % improvement |
|---|---|---|---|
| 1 | 0.053951 | 0.329427 | 511% |
| 2 | 0.037674 | 0.267644 | 610% |
| 4 | 0.039859 | 0.270791 | 579% |
| 8 | 0.04392 | 0.275846 | 528% |
| 16 | 0.055212 | 0.280654 | 408% |
| 32 | 0.086143 | 0.32052 | 272% |
| 64 | 0.113393 | 0.376533 | 232% |
| 128 | 0.257239 | 0.481376 | 87% |
| 256 | 0.495993 | 0.748602 | 51% |
| 512 | 0.938491 | 1.144875 | 22% |
| 1024 | 1.537558 | 1.90596 | 24% |
| 2048 | 2.501111 | 3.45471 | 38% |

The full detailed Jupyter notebook can be found here.

Conclusion

These examples are just the tip of the iceberg for the applications and value of ONNX. For further reading, check out the ONNX Tutorials and ONNX Runtime Tutorials for more samples. If you have any questions, please join the ONNX and ONNX Runtime communities on GitHub for active discussions.

Finally, we invite you to join us at ODSC East in Boston for our hands-on workshop to learn more about ONNX, ONNX Runtime, and how you can use these open-source tools to accelerate and deploy state-of-the-art ML and DNN models in production.

*Note: All benchmarking experiments were run on an Azure Notebook VM (Standard_D3_v2). Python’s timeit was used to measure prediction time, averaged over 100 executions.
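For reference, a minimal sketch of that measurement (X_batch is a hypothetical float32 input batch):

import timeit

n_runs = 100
# Average wall-clock time per prediction over 100 executions, in milliseconds
elapsed = timeit.timeit(lambda: sess.run(None, {'input': X_batch}),
                        number=n_runs)
print(f"Mean prediction time: {elapsed / n_runs * 1000:.3f} ms")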

