Importing PyTorch and TensorFlow Models Into OpenVINO

Published in OpenVINO™ toolkit · Nov 30, 2023

Author: Ryan Loney, Product Manager, OpenVINO™ Toolkit

Executive Summary

It is simple to import PyTorch and TensorFlow models into OpenVINO with only a few lines of code. Models that have been developed in PyTorch or TensorFlow can easily be integrated into an OpenVINO inference pipeline for best-in-class performance on a wide variety of devices. Developers no longer have to manage model exports or conversion scripts: instead, models can be loaded and deployed from their native format.

The examples below show how TensorFlow and PyTorch models can be easily loaded in OpenVINO. The models are loaded, converted to OpenVINO format, and compiled for inference in just a few lines of code.

Table 1: Basic Methods for Importing TensorFlow or PyTorch Models in OpenVINO
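As a quick preview of those basic methods, the minimal sketch below shows a TensorFlow file being compiled directly and a PyTorch object being converted first (the file name and example input shape are illustrative assumptions):

import openvino as ov
import torch
from torchvision.models import resnet50

core = ov.Core()

# TensorFlow: compile a saved model file directly (path is a placeholder)
compiled_tf_model = core.compile_model("model.pb")

# PyTorch: convert an in-memory model object, then compile it
torch_model = resnet50(weights="DEFAULT").eval()
ov_model = ov.convert_model(torch_model, example_input=torch.rand(1, 3, 224, 224))
compiled_torch_model = core.compile_model(ov_model)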

While the above examples provide simple options to import models into OpenVINO, there are other options that provide more customization and flexibility. The rest of this Solution Brief describes the various options, explains when to use each one, and provides code snippets showing how to use them.

Working With Models in OpenVINO

Model States in OpenVINO

A model in OpenVINO can be in one of three states: saved on disk, loaded but not compiled (ov.Model), or loaded and compiled (ov.CompiledModel).

  • Saved on disk
    - As the name suggests, a model in this state consists of one or more files that fully represent the neural network. Because OpenVINO supports other frameworks in addition to its own proprietary format, how a model is stored can vary. For example:
      ✧ OpenVINO IR: a pair of .xml and .bin files
      ✧ ONNX: a .onnx file
      ✧ TensorFlow: a directory with a .pb file and two subfolders, or just a .pb file
      ✧ TensorFlow Lite: a .tflite file
      ✧ PaddlePaddle: a .pdmodel file

  • Loaded but not compiled
    - In this state, a model object (ov.Model) is created in memory either by
    parsing a file or converting an existing framework object. Inference cannot be done with this object yet as it is not attached to any specific device, but it allows customization such as reshaping its input, applying quantization, or even adding preprocessing steps.
  • Loaded and compiled
    - This state is achieved when one or more devices are specified for a model object to run on (ov.CompiledModel), allowing device optimizations to be made and enabling inference.
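To make the three states concrete, the minimal sketch below walks through them with the Python API, using the functions introduced in the next section (the file path and device name are illustrative assumptions):

import openvino as ov

core = ov.Core()

# Saved on disk -> loaded but not compiled: read a model file into an ov.Model
ov_model = core.read_model("model.xml")  # placeholder path

# Loaded but not compiled -> loaded and compiled: attach the model to a device
compiled_model = core.compile_model(ov_model, "CPU")  # device name is an example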

Functions for Reading, Converting, and Saving Models in OpenVINO

OpenVINO provides several functions to work with models:

  • read_model
    - Creates an ov.Model from a file.
    - File formats supported: OpenVINO IR, ONNX, PaddlePaddle, TensorFlow, and TensorFlow Lite. PyTorch files are not directly supported.
    - OpenVINO files are read directly while other formats are converted automatically.
  • compile_model
    - Creates an ov.CompiledModel from a file or ov.Model object.
    - File formats supported: OpenVINO IR, ONNX, PaddlePaddle, TensorFlow, and TensorFlow Lite. PyTorch files are not directly supported.
    - OpenVINO files are read directly while other formats are converted automatically.
  • convert_model
    - Creates an ov.Model from a file or Python memory object.
    - File formats supported: ONNX, PaddlePaddle, TensorFlow, and TensorFlow Lite.
    - Framework objects supported: PaddlePaddle, TensorFlow, and PyTorch.
    - This method is only available in the Python API.
  • save_model
    - Saves an ov.Model to OpenVINO IR format.
    - Compresses weights to FP16 by default.
    - This method is only available in the Python API.

The code snippets in this solution brief show examples of how to use these functions. See the ‘Where to Learn More’ section for more information on each function.

Although this guide focuses on the different ways to get TensorFlow and PyTorch models running in OpenVINO, using the framework files or Python objects directly each time may not be the best option performance-wise. A better approach is to import the model into OpenVINO once, customize it as needed, and then save it to OpenVINO IR with save_model. The saved model can then be read as needed with read_model, avoiding the extra conversions. See the ‘Further Improvements’ section for other reasons to use OpenVINO IR.

Also note that even though files from frameworks such as TensorFlow can be used directly, that does not mean OpenVINO uses those frameworks behind the scenes; files and objects are always converted to a format OpenVINO understands, i.e. OpenVINO IR.

TensorFlow Import Options

OpenVINO’s direct support of TensorFlow allows developers to use their models in an OpenVINO inference pipeline without changes. However, as multiple ways of doing this exist, it may not be clear which is the best approach for a given situation. The following diagram aims to simplify this decision given a certain context, although some additional considerations should be taken into account depending on the use case. See ‘Other Considerations’ for more details.

Method 1. Convert using ov.convert_model function (Python only)

As seen above, if your starting point is a Python object in memory, for example, a tf.keras.Model or tf.Module, a direct way to get the model in OpenVINO is to use ov.convert_model. This method produces an ov.Model (one of the three states) that can later be reshaped, saved to OpenVINO IR, or compiled to do inference. In code it may look as follows:

import openvino as ov
import tensorflow as tf

# 1a. Convert model created with TF code
model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
ov_model = ov.convert_model(model)

# 1b. Convert model from file
ov_model = ov.convert_model("model.pb")


# 2. Compile model from memory
core = ov.Core()
compiled_model = core.compile_model(ov_model)
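Once compiled, the model can be called directly to run inference. A minimal usage sketch (the input shape assumes the Keras ResNet50 example above; replace the random array with real data):

import numpy as np

# Dummy NHWC input matching ResNet50's expected shape
input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)
results = compiled_model(input_data)  # returns a dict-like object of output tensors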

Method 2. Convert from file using ov.compile_model function

If you are starting from a file, first decide whether the model can be used as is or whether it needs to be customized, for example by applying quantization or reshaping its inputs.

If the model does not need to be customized, ov.Core.compile_model should be used, which reads, converts (if needed), and compiles the model, leaving it ready for inference all in one go. The code should look like this:

import openvino as ov

# 1. Compile model from file
core = ov.Core()
compiled_model = core.compile_model("model.pb")

Method 3. Convert from file using ov.read_model function

If the model does need to be customized, ov.read_model can be used as it just returns an ov.Model ready to be quantized or have its inputs reshaped. (Note: This method also works with the OpenVINO C++ API, so it is useful for developers working in a C++ environment.)

import openvino as ov

# 1. Convert model from file
core = ov.Core()
ov_model = ov.read_model("model.pb")

# 2. Compile model from memory
compiled_model = core.compile_model(ov_model)
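For instance, a common customization at this point is fixing a dynamic input to a static shape before compiling. A minimal sketch (the NHWC shape is an illustrative assumption for a single-input image model):

# Reshape the model's input to a static batch-1 NHWC shape, then compile
ov_model.reshape([1, 224, 224, 3])
compiled_model = core.compile_model(ov_model)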

Method 4. Convert from a file using OpenVINO Model Converter (ovc CLI)

If the input reshaping is known in advance and/or the model has multiple outputs but only some of them are required, OpenVINO provides two equivalent ways of doing this while converting the model. One of them is the CLI command ovc, while the other is the previously mentioned ov.convert_model (discussed in Method 1).

The ovc tool is similar to ov.convert_model, except it works using the command line rather than a Python environment. It will convert the model to OpenVINO IR format, apply any configurations you specify, and save the converted model to disk. It is useful if you are not working with your model in Python (e.g. if you are developing in a C++ environment) or if you prefer using the command line rather than a Python script.

The code below shows how to convert a model with ovc and then load it for inference:

# 1. Convert model from file (ovc is a command-line tool; run this in a shell)
ovc model.pb

# The remaining steps run in Python
import openvino as ov

# 2. Load model from file
core = ov.Core()
ov_model = core.read_model("model.xml")

# 3. Compile model from memory
compiled_model = core.compile_model(ov_model)
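The equivalent conversion-time options are also available from Python through ov.convert_model; for example, an input shape can be set while converting via its input parameter. A minimal sketch (the shape value is illustrative):

import openvino as ov

# Convert the file and set a static input shape in one step, then save to IR
ov_model = ov.convert_model("model.pb", input=[1, 224, 224, 3])
ov.save_model(ov_model, "model.xml")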

Supported Model Formats

With its different functions, OpenVINO supports loading models from files and Python objects, allowing several TensorFlow formats from both 1.X and 2.X to be used.

In TensorFlow 2.X, models are typically saved in the SavedModel format, which contains checkpoints and training information for the model. Two other formats are also supported: the older Keras H5 (.h5) and the newer Keras v3 (.keras). In contrast, TensorFlow 1.X models are usually exported as frozen graphs, although non-frozen formats such as SavedModel and MetaGraph are also used.

Thus, OpenVINO support for TensorFlow models is as follows:

  • Files
    - SavedModel — <SAVED_MODEL_DIRECTORY> or <INPUT_MODEL>.pb
    - Checkpoint — <INFERENCE_GRAPH>.pb or <INFERENCE_GRAPH>.pbtxt
    - MetaGraph — <INPUT_META_GRAPH>.meta
  • Python objects
    - tf.keras.Model
    - tf.keras.layers.Layer
    - tf.Module
    - tf.function
    - tf.compat.v1.Graph
    - tf.compat.v1.GraphDef
    - tf.compat.v1.Session
    - tf.train.Checkpoint

Note that neither the TensorFlow 2.X Keras H5 format nor the Keras v3 format is directly supported, but instructions are still provided for the former.
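For Keras H5, the usual approach is to load the model with TensorFlow first and convert the resulting in-memory object. A minimal sketch of that workaround (the file name is a placeholder):

import tensorflow as tf
import openvino as ov

# Load the H5 file with Keras, then hand the in-memory model to OpenVINO
model = tf.keras.models.load_model("model.h5")
ov_model = ov.convert_model(model)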

OpenVINO also supports TensorFlow Lite files; see ‘ONNX, PaddlePaddle, and TensorFlow Lite Import Options’ for more information.

PyTorch Import Options

OpenVINO’s direct support of PyTorch allows developers to use their models in an OpenVINO inference pipeline without changes. OpenVINO provides multiple ways of using PyTorch, so it may not be clear which is the best approach for a given situation. The following diagram aims to simplify this decision given a certain context, although some additional considerations should be taken into account depending on the use case. See ‘Other Considerations’ for more details.

PyTorch models can be imported into OpenVINO directly from a Python object, although saved PyTorch files can be used as well. To use a saved PyTorch file, it needs to be loaded in PyTorch first to convert it to a Python object.

Once the model is loaded as a PyTorch Python object, you can decide whether to start using the OpenVINO framework and its features directly or to remain within the PyTorch framework while leveraging OpenVINO’s optimizations.

Method 1. Convert using the ov.convert_model function

If OpenVINO is preferred, ov.convert_model is the method to use. It produces an ov.Model (one of the three states) that can later be reshaped, saved to OpenVINO IR, or compiled to do inference. In code it may look as follows:

import openvino as ov
import torch
from torchvision.models import resnet50

# 1a. Convert model created with PyTorch code
model = resnet50(weights="DEFAULT")
model.eval()
ov_model = ov.convert_model(model, example_input=torch.rand(1, 3, 224, 224))

# 1b. Convert model loaded from PyTorch file
model = torch.load("model.pt")
model.eval()
ov_model = ov.convert_model(model)

# 2. Compile model from memory
core = ov.Core()
compiled_model = core.compile_model(ov_model)

Note that the need to set example_input depends on the model used. However, it is recommended to always set it if available as it usually leads to a better quality model. For more details, check out the docs.

Method 2. Use OpenVINO backend in PyTorch

If PyTorch syntax is preferred, then starting with PyTorch 2.0 and OpenVINO 2023.1, a PyTorch model can be optimized with OpenVINO by specifying it as the backend in torch.compile.

import openvino.torch
import torch
from torchvision.models import resnet50

# 1a. Compile model created with PyTorch code
model = resnet50(weights="DEFAULT")
model.eval()
compiled_model = torch.compile(model, backend="openvino")

# 1b. Compile model loaded from PyTorch file
model = torch.load("model.pt")
model.eval()
compiled_model = torch.compile(model, backend="openvino")
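The compiled model is used like a regular PyTorch module, and the actual OpenVINO compilation happens lazily on the first call. A minimal usage sketch (the input shape assumes the ResNet50 example above):

# First call triggers the OpenVINO compilation; subsequent calls reuse it
dummy_input = torch.rand(1, 3, 224, 224)
with torch.no_grad():
    output = compiled_model(dummy_input)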

Method 3. Export the model to ONNX and use one of OpenVINO’s methods

If neither of these two methods converts the model successfully, there is a third method that was once the main way of using PyTorch in OpenVINO but is now mainly considered a fallback. This method consists of exporting a PyTorch model to ONNX and then loading it with the different methods available in OpenVINO. See ‘ONNX, PaddlePaddle, and TensorFlow Lite Import Options’ for more details.

import torch
import openvino as ov
from torchvision.models import resnet50

# 1. Export PyTorch model to ONNX
model = resnet50(weights="DEFAULT")
model.eval()

dummy_input = torch.randn(1,3,224,224)
torch.onnx.export(model, dummy_input, "model.onnx")

# 2. Use an OpenVINO method to read and compile it, for example, compile_model
core = ov.Core()
compiled_model = core.compile_model("model.onnx")

Supported Model Formats

As PyTorch does not have a save format that contains everything needed to reproduce the model without using torch, OpenVINO only supports loading Python objects directly. The support is as follows:

  • Python objects
    - torch.nn.Module
    - torch.jit.ScriptModule
    - torch.jit.ScriptFunction

ONNX, PaddlePaddle, and TensorFlow Lite Import Options

TensorFlow and PyTorch are not the only frameworks supported by OpenVINO; it also supports ONNX, PaddlePaddle, and TensorFlow Lite. The purpose of this section is to briefly mention how they can be imported into OpenVINO.

ONNX, PaddlePaddle, and TensorFlow Lite files have the same support as TensorFlow files, i.e. all the file-based methods described in ‘TensorFlow Import Options’ work for them. PaddlePaddle is the only one of the three that also supports Python objects.

The complete support for all frameworks is as follows:

  • ONNX
    - Files: <input_model>.onnx
  • PaddlePaddle
    - Files: <input_model>.pdmodel
    - Python objects:
      ✧ paddle.hapi.model.Model
      ✧ paddle.fluid.dygraph.layers.Layer
      ✧ paddle.fluid.executor.Executor
  • TensorFlow Lite
    - Files: <input_model>.tflite
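As with TensorFlow files, these formats can be passed straight to compile_model or read_model. A minimal sketch (the file names are placeholders):

import openvino as ov

core = ov.Core()
# Any of the supported file formats can be passed directly
compiled_onnx = core.compile_model("model.onnx")
compiled_tflite = core.compile_model("model.tflite")
compiled_paddle = core.compile_model("model.pdmodel")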

Further Improvements

As seen through the solution brief, there are several ways of getting a framework model into OpenVINO. However, having to convert the model each time impacts performance. Thus, for most use cases it is usually better to convert the model once and then use OpenVINO’s own format, OpenVINO IR, directly. Some of the reasons to use OpenVINO IR are listed below.

Saving to IR to improve first inference latency

When first inference latency matters, rather than convert the framework model each time it is loaded, which may take some time depending on its size, it is better to do it once, save the model as an OpenVINO IR with save_model and then load it with read_model as needed. This should improve the time it takes the model to make the first inference as it avoids the conversion step.

Saving to IR in FP16 to save space

Another reason to save in OpenVINO IR is to save storage space, even more so if FP16 is used, as it may cut the size by about 50%. This is especially useful for large models like Llama2-7B.

Saving to IR to avoid large dependencies in inference code

One more consideration is that to convert Python objects the original framework is required in the environment. Frameworks such as TensorFlow and PyTorch tend to be large dependencies (multiple gigabytes), and not all inference environments have enough space to hold them. Converting models to OpenVINO IR allows them to be used in an environment where OpenVINO is the only dependency, so much less disk space is needed. Another benefit is that loading and compiling with OpenVINO directly usually takes less runtime memory than loading the model in the source framework and then converting and compiling it.

An example showing how to take advantage of OpenVINO IR is shown below:

# Run once
import openvino as ov
import tensorflow as tf

# 1. Convert model created with TF code
model = tf.keras.applications.resnet50.ResNet50(weights="imagenet")
ov_model = ov.convert_model(model)

# 2. Save model as OpenVINO IR
ov.save_model(ov_model, "model.xml", compress_to_fp16=True)  # enabled by default

# Repeat as needed
import openvino as ov

# 3. Load model from file
core = ov.Core()
ov_model = core.read_model("model.xml")

# 4. Compile model from memory
compiled_model = core.compile_model(ov_model)

Save a model in OpenVINO IR once, and use it many times!

Where to Learn More

To learn more about how models can be imported into OpenVINO, visit the documentation page on the OpenVINO website. Also take a look at the PyTorch and TensorFlow sections for framework-specific details.

OpenVINO also provides example notebooks for both frameworks showing how to load a model and run inference. The notebooks can be downloaded and run on a development machine where OpenVINO has been installed. Visit the notebooks at these links: PyTorch, TensorFlow.

To learn more about the OpenVINO toolkit and how to use it to build optimized deep-learning applications, visit the Get Started page. OpenVINO also provides several example notebooks showing how to use it for basic applications like object detection and speech recognition on the Tutorials page.

Notices & Disclaimers

Intel technologies may require enabled hardware, software, or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
