Week-4: Converting and Running Machine Learning Models with ONNX

Mohammad Zeynali
4 min read · Sep 4, 2024


As machine learning models become more sophisticated and widespread in their applications, ensuring they can run efficiently across different platforms and environments has become a priority. The Open Neural Network Exchange (ONNX) format has emerged as a powerful solution for this challenge. In this blog post, we’ll explore how to convert a machine learning model to ONNX and perform inference using ONNX Runtime.

🔙 Previous: Week-3: How to Use DVC for Machine Learning Model Management

🔜 Next:

What is ONNX?

ONNX is an open-source format for representing deep learning models. It allows models to be trained in one framework (like PyTorch or TensorFlow) and then exported to run in another environment with different performance characteristics or hardware capabilities. ONNX is especially useful for deploying models in production because it is optimized for both cloud and edge devices.
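
To make this concrete, an exported .onnx file is self-describing: you can load it and inspect its operator set and expected inputs. A minimal sketch using the onnx package (the file path is illustrative):

import onnx

# Load a serialized ONNX model from disk (path is illustrative)
model = onnx.load("models/model.onnx")

# The opset version determines which operators a runtime must support
print("opset:", model.opset_import[0].version)

# Graph inputs carry names plus type/shape information
print("inputs:", [inp.name for inp in model.graph.input])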

Converting a Model to ONNX

The first step in leveraging ONNX is converting your trained model into the ONNX format. The script convert_model_to_onnx.py demonstrates how to do this with a model trained using PyTorch.

Here’s a breakdown of what this script does:

  1. Load Necessary Libraries: The script imports the essential libraries: torch for PyTorch functionality, hydra for managing configurations, and logging for tracking the conversion process.
  2. Load the Pre-trained Model: Using Hydra, the script loads a pre-trained model (ColaModel) specified by a configuration file. It retrieves the model's checkpoint from a directory and initializes the model.
  3. Prepare Input Data: The script then prepares a sample input batch using a DataModule class. This data module handles tokenization and batching, ensuring the input is in the format the model expects (a condensed sketch of steps 1-3 follows this list).
  4. Convert the Model to ONNX: The core of the script is the torch.onnx.export function, which traces the model's operations with the provided input sample and writes the graph in ONNX format. The input and output names are declared, and the batch dimension is marked as a dynamic axis so the exported model accepts variable batch sizes.
  5. Save the ONNX Model: Finally, the ONNX model is saved to a specified directory, ready for deployment.
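
For context, steps 1-3 might look roughly like the following. This is a condensed sketch: the import paths and checkpoint path are assumptions, ColaModel is assumed to be a PyTorch Lightning module, and the Hydra configuration plumbing is omitted.

import torch
from model import ColaModel  # hypothetical import path
from data import DataModule  # hypothetical import path

# Restore the trained model from its checkpoint (path is illustrative)
cola_model = ColaModel.load_from_checkpoint("models/best-checkpoint.ckpt")
cola_model.eval()  # tracing should happen in inference mode

# Use the data module to produce one correctly tokenized sample batch
data_model = DataModule()
data_model.prepare_data()
data_model.setup()
input_batch = next(iter(data_model.train_dataloader()))
input_sample = {
    "input_ids": input_batch["input_ids"][0].unsqueeze(0),
    "attention_mask": input_batch["attention_mask"][0].unsqueeze(0),
}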

Here is a snippet from convert_model_to_onnx.py to illustrate the conversion process:

torch.onnx.export(
    cola_model,  # model being run
    (
        input_sample["input_ids"],
        input_sample["attention_mask"],
    ),  # model input (or a tuple for multiple inputs)
    f"{root_dir}/models/model.onnx",  # where to save the model
    export_params=True,
    opset_version=10,
    input_names=["input_ids", "attention_mask"],
    output_names=["output"],
    dynamic_axes={
        "input_ids": {0: "batch_size"},
        "attention_mask": {0: "batch_size"},
        "output": {0: "batch_size"},
    },
)
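
Before relying on the exported file, it is common to validate its structure and confirm the ONNX graph reproduces the PyTorch model's outputs. A short sketch, assuming the variables from the snippet above are still in scope and that cola_model returns a tensor of logits (this previews the runtime introduced in the next section):

import onnx
import onnxruntime as ort
import numpy as np
import torch

onnx_path = f"{root_dir}/models/model.onnx"
onnx.checker.check_model(onnx.load(onnx_path))  # raises if the graph is malformed

# Run the original model and the ONNX model on the same sample
with torch.no_grad():
    torch_out = cola_model(input_sample["input_ids"], input_sample["attention_mask"])

ort_session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
ort_out = ort_session.run(
    None,
    {
        "input_ids": input_sample["input_ids"].numpy(),
        "attention_mask": input_sample["attention_mask"].numpy(),
    },
)[0]

# Small numerical differences are expected; large ones signal an export problem
np.testing.assert_allclose(torch_out.numpy(), ort_out, rtol=1e-3, atol=1e-5)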

Running Inference with ONNX Runtime

Once the model is converted to ONNX, the next step is to perform inference using ONNX Runtime, a high-performance engine for executing ONNX models.
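
Which hardware the session runs on is controlled by execution providers, which ONNX Runtime tries in priority order. A minimal sketch (which providers are available depends on the installed onnxruntime build; the model path is illustrative):

import onnxruntime as ort

# Ask the installed build which execution providers it supports,
# e.g. ['CUDAExecutionProvider', 'CPUExecutionProvider'] on a GPU build
available = ort.get_available_providers()

ort_session = ort.InferenceSession("models/model.onnx", providers=available)
print(ort_session.get_providers())  # providers actually in use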

The script inference_onnx.py illustrates how to use ONNX Runtime for model inference:

  1. Import Libraries: The script imports onnxruntime for running the model, numpy for handling input data, and a few utility modules for data processing and timing.
  2. Initialize the ONNX Runtime Session: The ColaONNXPredictor class initializes an ONNX Runtime inference session with the provided model path. This session loads the ONNX model and prepares it for inference.
  3. Preprocess Input Data: As in training, the input text is tokenized and formatted for the ONNX model; NumPy is used to add the batch dimension the model expects (a preprocessing sketch follows this list).
  4. Run Inference: The ONNX model is run on the preprocessed inputs via the ort_session.run method, and softmax is applied to the raw outputs to produce a probability for each class.
  5. Output Predictions: The results are formatted into a human-readable form, displaying the predicted labels and their associated scores.
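
For step 3, preprocessing typically mirrors the tokenization used during training. A sketch assuming a Hugging Face tokenizer (the checkpoint name and maximum sequence length are placeholders; use whatever the model was trained with):

import numpy as np
from transformers import AutoTokenizer

# Placeholder checkpoint; must match the tokenizer used at training time
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def preprocess(text: str) -> dict:
    # Pad/truncate to a fixed length so shapes match what the graph expects
    tokens = tokenizer(text, padding="max_length", max_length=128, truncation=True)
    return {
        "input_ids": np.array(tokens["input_ids"], dtype=np.int64),
        "attention_mask": np.array(tokens["attention_mask"], dtype=np.int64),
    }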

Here’s a snippet from inference_onnx.py showing the inference process:

ort_inputs = {
    "input_ids": np.expand_dims(processed["input_ids"], axis=0),
    "attention_mask": np.expand_dims(processed["attention_mask"], axis=0),
}
ort_outs = self.ort_session.run(None, ort_inputs)
scores = softmax(ort_outs[0])[0]  # softmax from scipy.special or a local utility
predictions = [
    {"label": label, "score": score}
    for score, label in zip(scores, self.labels)
]
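
End to end, using the predictor might look like this (the predict method name and the input sentence are assumptions based on the snippet above):

predictor = ColaONNXPredictor("models/model.onnx")
result = predictor.predict("The boy is sitting on a bench.")  # hypothetical method name
print(result)  # a list of {"label": ..., "score": ...} dicts, one per class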


Conclusion

Converting models to ONNX and running them with ONNX Runtime provides a robust way to ensure your machine learning models are portable, optimized, and ready for deployment across various platforms. The scripts provided, convert_model_to_onnx.py and inference_onnx.py, offer a practical example of how to convert a PyTorch model to ONNX and run inference using ONNX Runtime, demonstrating the versatility and power of the ONNX ecosystem.

By adopting ONNX, developers can future-proof their models, ensuring they are compatible with the latest hardware accelerations and deployment environments. Whether you’re working in a research environment or deploying at scale, ONNX is a valuable tool in your machine learning toolkit.
