Have you trained a model in the cloud? Use OpenVINO™ to maximize performance!


Authors: Pradeep Sakhamoori, Ravi Panchumarthy.

1. Introduction:

In recent years, deep learning has become a powerful tool for a wide range of applications, including image and speech recognition, natural language processing, and more. As the capabilities of deep learning models continue to advance, there is an increasing need to optimize these models for deployment on edge devices, such as cameras, drones, and robots.

Optimizing deep learning models for deployment using tools like OpenVINO™ is important because it can improve the performance of the models and make them more efficient on different hardware platforms. When a deep learning model is optimized for deployment, the computational resources on target hardware platforms are used efficiently, resulting in optimal performance. This can be particularly important when deploying deep learning models on edge devices, where the computational resources are limited, and the performance of the model is critical.

There are several cloud-based platforms that offer a range of services for training deep learning models. These services typically include access to compute power for training the models, storage services for storing data and models, and end-to-end platforms for building, training, and deploying machine learning models. Examples of such platforms include Amazon SageMaker, Google Cloud AI Platform, Microsoft Azure Machine Learning, IBM Watson Studio, etc.

Once you have chosen a cloud platform and trained a model, OpenVINO™ can be used to optimize the model for deployment on a variety of hardware platforms. In this blog, we will give an overview of how to use OpenVINO to optimize the inference performance of a trained deep learning model. By the end of this blog, you’ll have a solid understanding of how to use OpenVINO to optimize and deploy your deep learning models.

2. Why use OpenVINO™? Support for AI frameworks and Hardware:

Figure 1: OpenVINO™ supported Deep Learning frameworks and hardware.

As shown in Fig. 1, OpenVINO™ is a toolkit that helps developers run deep learning algorithms efficiently on Intel hardware. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. OpenVINO™ supports multiple deep learning frameworks, including TensorFlow, PyTorch, ONNX, MXNet, and Caffe, among others.

OpenVINO™ is optimized to run deep learning inference on a variety of Intel hardware, including CPUs, GPUs, VPUs and FPGAs. By using OpenVINO, developers can take advantage of the performance and power efficiency of these Intel hardware platforms to run their deep learning models faster and more efficiently.
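For instance, you can ask the OpenVINO™ runtime which Intel devices it can see on a given machine. The short Python sketch below assumes the openvino package (2022.x API) is installed; the device names reported will vary by host.

```python
# Minimal sketch: list the inference devices OpenVINO can see on this machine.
from openvino.runtime import Core

core = Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU'] depending on the host
```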

OpenVINO™’s “write once, deploy anywhere” philosophy enables developers to write their deep learning inference code once using OpenVINO, and then deploy it on a variety of different hardware platforms without changing their code. This lets developers optimize their models for a specific hardware platform without writing custom code for each platform, making it easy to deploy deep learning models on a wide range of devices, from edge devices with low-power CPUs to high-performance servers with powerful GPUs. The sketch below makes this concrete.
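Here is a hedged Python sketch of the “write once, deploy anywhere” idea: the inference code stays the same and only the device string changes. The file name model.xml is a placeholder for your converted IR model.

```python
# Illustrative sketch: the same code targets different Intel hardware
# simply by changing the device string passed to compile_model().
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder IR file

for device in ["CPU", "GPU", "AUTO"]:  # AUTO lets OpenVINO pick a device
    compiled = core.compile_model(model, device_name=device)
    print(f"Compiled for {device}")
```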

3. Optimizing and Deploying OpenVINO™ model:

Optimizing a deep learning model for a specific hardware platform using OpenVINO involves a few steps. First, use the OpenVINO™ Model Optimizer to convert the model to the Intermediate Representation (IR) format, a common format used by OpenVINO to represent deep learning models.
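As an example, a TensorFlow SavedModel can be converted to IR either with the mo command-line tool or from Python. The sketch below is illustrative only; it assumes the openvino-dev package (2022.x) is installed and that a local saved_model/ directory holds the trained model.

```python
# Hedged sketch of the Model Optimizer conversion step.
# CLI equivalent: mo --saved_model_dir saved_model --output_dir ir/
from openvino.tools.mo import convert_model
from openvino.runtime import serialize

ov_model = convert_model(saved_model_dir="saved_model")  # returns an in-memory OpenVINO model
serialize(ov_model, "model.xml")                         # writes model.xml + model.bin (the IR)
```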

Figure 2: OpenVINO™ model conversion and inference workflow.

Once the model is in the IR format, it can be optimized for the specific hardware platform using the OpenVINO™ inference engine. This process involves tuning the model for the characteristics of the target hardware, such as the number of available CPU cores or the amount of memory on the device.
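In the Python API, this tuning is typically expressed as compile-time properties. The following is a hedged sketch assuming the 2022.x runtime; the property values shown (throughput hint, stream count) are examples to adapt to your target hardware, not recommendations.

```python
# Hedged sketch: hardware-specific tuning via compile-time properties.
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder IR file

# Ask the runtime to optimize scheduling for throughput on this CPU,
# and set the number of parallel inference streams explicitly.
compiled = core.compile_model(
    model,
    device_name="CPU",
    config={"PERFORMANCE_HINT": "THROUGHPUT", "NUM_STREAMS": "4"},
)
```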

Finally, the optimized model can be deployed to the desired hardware platform. This can be done by integrating the model with a user application using the OpenVINO™ inference engine. The application can then use the model to perform deep learning inference on input data, taking advantage of the performance and power efficiency of the target hardware platform. Alternatively, one can use OpenVINO™ model server to serve inference requests.
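A minimal integration might look like the Python sketch below, assuming the 2022.x runtime, a placeholder model.xml IR, and a dummy image-shaped input in place of real application data.

```python
# Hedged end-to-end sketch: load the IR, compile for a target device,
# and run synchronous inference on one input.
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                 # placeholder IR file
compiled = core.compile_model(model, device_name="CPU")

input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy image batch
result = compiled([input_data])[compiled.output(0)]             # run inference
print(result.shape)
```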

To provide an example, we considered a scenario in which a user is running a TensorFlow training pipeline on AWS Cloud with Amazon SageMaker. After the desired model training targets have been achieved and the model is ready for deployment, OpenVINO™ can be used to optimize the trained model for the best possible inference performance (throughput/latency) on Intel-powered devices, such as edge devices, cloud instances, IoT devices, or any other Intel-powered platform. The process for using OpenVINO would remain the same on Azure, GCP, or other cloud service providers.

In Fig. 3, the left block illustrates the training environment and pipeline, along with the OpenVINO™ Model Optimizer task that optimizes the trained model and pushes it to an AWS S3 bucket. On the right, the OpenVINO™ model inference workflow runs inside an Ubuntu-based Docker container on an Intel-powered device and generates the inference output. Note: the OpenVINO™ model optimization block, shown on AWS SageMaker, can also be performed on the inference side of the workflow.

Figure 3: Sample deployment scenario with AWS SageMaker and OpenVINO™

4. Resources and Further Reading:

1. Overview: Intel® Distribution of OpenVINO™ Toolkit

2. Getting Started with OpenVINO™ Toolkit

3. A collection of ready-to-run Jupyter notebooks

4. OpenVINO™ Code Samples

5. OpenVINO™ Model Optimizer

6. OpenVINO™ toolkit APIs

7. OpenVINO™ Model Zoo

8. OpenVINO™ Amazon Machine Image (AMI) on AWS Cloud/SageMaker

9. OpenVINO™ Model Server

Notices & Disclaimers

Intel technologies may require enabled hardware, software or service activation.

No product or component can be absolutely secure.

Your costs and results may vary.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
