What’s new in OpenVINO™ toolkit 2022.1 for AI developers

Adrian Boguszewski
Published in OpenVINO-toolkit · Jul 29, 2022

Authors: Qiu Dan and Adrian Boguszewski

Introduction

Developers familiar with OpenVINO™ may know that it is released approximately once a quarter, and the version number usually reflects the scope of changes. Take 2021.3 as an example: 2021 is the major version, reflecting significant changes such as API and platform updates, while 3 is the minor version, which implies relatively small changes such as bug fixes and enhancements. At the beginning of 2022, OpenVINO™ released the new 2022.1 version. With the remarkable evolution from 2021 to 2022, this release brings exciting improvements in performance, ease of use, and new model support for Natural Language Processing (NLP) and audio. The features and changes most relevant to development work include:

  • Simplified installation: Streamlined installation packages and runtime libraries
  • Out of the box: Added a series of features, including Auto-Device Plugin, Performance Hints, and Model Optimizer parameter simplification to help developers get started quickly
  • Dynamic shape support: Implemented on the CPU in this release
  • PaddlePaddle integration: Official support for Baidu PaddlePaddle
  • API improvements: A new OpenVINO™ Runtime (the former Inference Engine) introduced with API 2.0

Let’s look at these features and changes one by one.

1. Simplified installation package — more concise deployment

During installation of previous OpenVINO™ toolkit versions, the installation script would automatically download several third-party libraries, such as OpenCV and DL Streamer. Once installation completed, users received a collection of components such as Model Optimizer, Inference Engine Runtime, OpenCV, and DL Streamer. Although that installation method reduced the configuration workload of the development environment, it also brought some challenges:

  • The installation directory does not correspond one-to-one with the OpenVINO™ open-source repository directory.
  • After installation, there are too many tools and add-ons, for example, Open Model Zoo demos, DL Streamer samples, OpenCV tools, etc.
  • A lot of diverse OpenVINO™ libraries (inference_engine, ngraph, transformations, lp_transformations, frontend_common, etc.) had to be integrated with the application as well.

Therefore, in the new 2022.1 version, the OpenVINO™ team has made significant improvements to address these challenges, including the following updates:

Table 1. Summary of changes in the OpenVINO™ installation package

The change from Inference Engine Runtime to OpenVINO™ Runtime is mainly a renaming: Inference Engine became OpenVINO™ Runtime as part of the API 2.0 strategy for better alignment with other frameworks.

Along with the simplified installation package, installation time is shorter and the occupied disk space is significantly reduced. Note that modules such as DL Streamer and OpenCV are no longer included in the installation package by default and must be installed separately.

In addition, OpenVINO™ 2022.1 creates one standard OpenVINO™ runtime library, merging the inference_engine, ngraph, transformations, lp_transformations, and frontend_common libraries into it to reduce the number of OpenVINO™ dependencies for customer applications.

2. Out of the box — a more flexible and intelligent way of programming

Throughout the development of OpenVINO™, improving ease of use and achieving an out-of-the-box user experience has always been a fundamental direction, and in 2022.1 this focus is even more prominent. Three main features help developers program quickly and simply: Model Optimizer parameter simplification, performance hints, and the enhanced Auto-Device Plugin. The details are as follows:

2.1 Simplification of Model Optimizer conversion parameters

Model Optimizer (MO) conversion is the first step of the OpenVINO™ working pipeline, at which developers need to specify a few conversion parameters; this is relatively difficult for beginners. According to statistics from the OpenVINO™ GitHub issue list, over 60% of reported problems are MO-related³. Therefore, OpenVINO™ has started an out-of-the-box effort for MO to improve the user experience and make it easier for developers to get started. As part of this effort, OpenVINO™ 2022.1 allows model conversion even when options such as --disable_nhwc_to_nchw or --input_shape are omitted for applicable models. In addition, some parameters for TensorFlow model conversion are also simplified.

2.2 Performance hints

Performance hints are designed to give developers friendlier programming guidance for setting and getting performance-related parameters. People are usually sensitive to application-level performance indicators, such as latency and throughput, which are easy to understand and often used as the optimization goal. By contrast, most people are unfamiliar with hardware-related configuration parameters, such as the number of CPU cores or the number of parallel processing streams, which are relatively abstract and challenging to understand. Letting developers focus on accessible latency/throughput targets instead of spending too much time studying hardware concepts is friendlier, especially for beginners. That is why the performance hints feature was introduced.

With performance hints, you don't need to care about hardware-related configuration parameters. You only need to set a performance target of latency or throughput, and OpenVINO™ automatically sets a series of optimization parameters to meet that target.
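Below is a minimal sketch of how this looks with the 2022.1 Python API, assuming a placeholder model path and the 2022.1 string-style configuration keys:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder path

# Ask for a throughput-oriented configuration; OpenVINO picks the number
# of streams, threads, and other low-level parameters by itself.
compiled_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})

# Alternatively, optimize for latency:
# compiled_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "LATENCY"})
```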

For more details, see the documentation of the benchmark tool in the OpenVINO™ open-source repository, particularly the description of the newly added -hint parameter:

Figure 1. 2022.1 hint usage description

The -hint parameter is the concrete application of performance hints in the benchmark tool. In OpenVINO™ 2022.1, performance hints support CPU and GPU devices, as well as the Auto-Device Plugin. This feature is still in development, and you will find more capabilities in future releases.

2.3 Enhanced Auto-Device Plugin

Auto-Device was first introduced in OpenVINO™ 2021.4. It is a unique “virtual” or “proxy” device in the OpenVINO™ toolkit that intelligently chooses the best device for inference. For example, if the device name is specified as AUTO:CPU,GPU, then the CPU and GPU (GPU in this article refers to Intel graphics, including iGPU and dGPU⁴, the same below) are added to the available list of Auto-Device. When executing ie.load_network(model,"AUTO:CPU,GPU"), the Auto-Device Plugin internally recognizes and selects devices from that list depending on the device capabilities and the characteristics of the model, for example, its precision. Developers can even use the device name AUTO directly: when the model is loaded without specifying a particular device, the Auto-Device Plugin intelligently matches the hardware platform to the model.

In OpenVINO™ 2022.1, the Auto-Device plugin introduces a few new features. First, AUTO is now the default device name for loading models onto accelerators: if the developer does not specify any device at the model loading stage, AUTO is used automatically. The Auto-Device Plugin also adds the following main features:
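Before looking at the new features, here is a minimal sketch of basic Auto-Device usage with the 2022.1 Python API; the model path is a placeholder:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder path

# Let AUTO pick the best available device:
compiled_model = core.compile_model(model, "AUTO")

# Or restrict the candidate list, here to GPU and CPU:
compiled_model = core.compile_model(model, "AUTO:GPU,CPU")

# Omitting the device name entirely also defaults to AUTO in 2022.1:
compiled_model = core.compile_model(model)
```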

Feature 1: First Inference Latency Optimization

We define the stage from application startup to the completion of the first inference as “First Inference”, and the time spent in this stage as “First Inference Latency”. First Inference Latency, also known as application startup time, is the sum of the initialization, model loading, and first inference times.

Developers targeting GPU and VPU⁵ may have experienced startup delays because the “compile and load network to device” step can perform several time-consuming device-specific optimizations and network compilations. Especially for GPU, the operation ie.load_network(model,"GPU") is much more expensive than on CPU, resulting in a longer delay from program initialization to completion of the first inference.

In OpenVINO™ 2022.1, the Auto-Device plugin adopts a series of strategies to optimize the “First Inference Latency” on GPU and VPU. Take GPU as an example: when developers cannot obtain a satisfactory “First Inference Latency” through the GPU plugin, they can change the device name to “AUTO” to improve it significantly. The Auto-Device approach is simple and friendly; a lot of optimization work that previously had to be done by hand is now done internally by the plugin.

The “First Inference Latency” optimization requires the participation of the CPU: the CPU plugin runs the first model loading and inference for a faster startup while the model is loaded to the GPU or VPU plugin in the background. Once that loading completes, inference automatically switches to the GPU or VPU. Therefore, this optimization suits scenarios where the application startup time is crucial and the CPU has idle computing power.
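A rough way to observe this effect, assuming a machine with an Intel GPU and placeholder model and input data, is to time the compile-plus-first-inference path for both device names:

```python
import time
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")                        # placeholder path
dummy_input = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input

for device in ("GPU", "AUTO"):
    start = time.perf_counter()
    compiled = core.compile_model(model, device)  # compile and load to device
    request = compiled.create_infer_request()
    request.infer({0: dummy_input})               # first inference
    print(f"{device}: first inference latency = {time.perf_counter() - start:.2f} s")
```

With AUTO, the measured startup time should be noticeably shorter because the CPU serves the early inferences while the GPU compiles in the background.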

Feature 2: Full support for performance hints

The performance hints feature, as described above, is fully supported on the Auto-Device plugin.

Feature 3: Dynamic shape and auto-batching

Dynamic shape simply means that the model can receive dynamic inputs and infer with different input sizes; it is further described in Section 3 of this article, “Dynamic shape support”. The Auto-Device plugin supports this feature, as does the CPU plugin; support on GPU will be provided in a future release.

Auto-batching is the ability to determine the batch size for inference automatically. For developers targeting GPU, especially dGPU, choosing an appropriate batch size to fully utilize the GPU is a critical topic. If the batch size is too small, the GPU's performance cannot be fully released; if it is too large and memory is insufficient, exceptions such as crashes will occur. Besides the risk of crashes, different devices often have different optimal batch sizes, so a more flexible approach than a fixed value is required. That is the background against which auto-batching was designed. Auto-batching takes its first step in 2022.1, with a limited set of features exposed to developers; for example, developers can enable or disable it through an Auto-Device Plugin configuration, as sketched below. More features will be exposed in the future, so stay tuned.
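A hedged sketch of toggling this via configuration follows; the model path is a placeholder, and the "ALLOW_AUTO_BATCHING" string key is an assumption based on the allow_auto_batching hint added in 2022.1:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder path

# A throughput hint lets the plugin pick a batch size automatically;
# "ALLOW_AUTO_BATCHING" (assumed key) enables or disables the feature.
compiled_model = core.compile_model(
    model,
    "AUTO",
    {"PERFORMANCE_HINT": "THROUGHPUT", "ALLOW_AUTO_BATCHING": "YES"},
)
```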

To sum up, improving ease of use is a crucial direction in the development of OpenVINO™, and it will continue to help developers get started quickly with installation, parameters, and examples.

3. Dynamic shape support — more comprehensive and extensive model support

This feature is probably the most anticipated of the new features in OpenVINO™ 2022.1. Dynamic shape means that some or all tensor dimensions may be unknown before inference. Typically, if the shape of a pre-trained model contains -1 or ?, we call it a dynamically shaped model. At the inference stage, the model's computational graph adjusts dynamically to the actual input size to predict the result. Mainstream deep learning frameworks, such as TensorFlow and PyTorch, already support this function.

Dynamic shapes are supported from version 2022.1, with phased development and implementation. Currently, this feature is available with the CPU plugin, and it will gradually be extended to other plugins. Support for this feature is divided into two parts: Model Optimizer changes and Runtime changes. See the following analysis.

3.1 Changes in Model Optimizer

In previous versions, the OpenVINO™ Runtime did not support undefined shape dimensions like -1 or ?, so the user had to fix the model shape with the MO conversion parameter --input_shape. In the new 2022.1 version, the shape is no longer forced to be static: developers can omit --input_shape to keep the -1 or ? from the original model, or specify --input_shape [1..10,224,224] to set dimension bounds in advance. Here [1..10,224,224] means the first dimension, usually the batch size, will be between 1 and 10. Developers can also observe that the IR file version has changed from version 10 (2021) to version 11 (2022).

It should be noted that if the parameters related to dynamic shape are enabled in Model Optimizer, the converted model can only be run on the CPU because the dynamic shape is currently supported only in the CPU plugin.

3.2 Changes in Runtime

The concept of “partial shape” is introduced to express dynamic shapes in the new Runtime API. An unfixed shape such as [-1,10,224,224] is represented by a class called PartialShape. For details on partial shapes, please refer to the instructions.
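A minimal sketch of working with partial shapes through the 2022.1 Python API; the model path is a placeholder:

```python
from openvino.runtime import Core, Dimension, PartialShape

core = Core()
model = core.read_model("model.xml")  # placeholder path

# Make the first dimension fully dynamic:
model.reshape(PartialShape([-1, 10, 224, 224]))

# Or bound the batch size between 1 and 10, mirroring --input_shape [1..10,...]:
model.reshape(PartialShape([Dimension(1, 10), Dimension(10), Dimension(224), Dimension(224)]))

# Dynamic shapes currently run on the CPU plugin only:
compiled_model = core.compile_model(model, "CPU")
```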

In addition, old IR with version 10 or lower does not work with dynamic shapes. OpenVINO™ Runtime can still read and infer such IR, but the dynamic capabilities will not be available. Therefore, to use dynamic shapes, you must use the new MO and Runtime API, and the converted IR version must be 11.

4. PaddlePaddle Model Support — a more inclusive ecosystem

OpenVINO™ 2022.1 announces official support for PaddlePaddle models. Previously, PaddlePaddle models had to be converted to ONNX format and go through the ONNX workflow. In 2022.1, ONNX is no longer required as an intermediate stage: OpenVINO™ supports PaddlePaddle directly through two paths:

Path 1: Model Optimizer reads the PaddlePaddle model and converts it into an IR file. Then OpenVINO™ runtime reads the IR file for inference.

Path 2: Without the conversion. OpenVINO™ Runtime can directly read the PaddlePaddle model and do inference.
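A sketch of Path 2 with the 2022.1 Python API; the .pdmodel file name is a placeholder:

```python
from openvino.runtime import Core

core = Core()

# Path 2: read the PaddlePaddle model directly; no MO conversion, no ONNX step.
model = core.read_model("inference.pdmodel")  # placeholder path
compiled_model = core.compile_model(model, "CPU")
```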

These paths are the same as in the case of an ONNX model. They also demonstrate the development philosophy of OpenVINO™: strengthen cooperation with other deep learning frameworks, create a more inclusive and diverse ecosystem, and make it easier for developers to bring in their own models and applications.

In addition to easy integration with PaddlePaddle, OpenVINO™ also focuses on supporting diverse PaddlePaddle models. In version 2022.1, supported models include visual detection and recognition, OCR, and NLP, among others. The range of supported models and hardware platforms will be further expanded in the future.

5. API improvements — smoother application integration experience

The previous OpenVINO™ API had a series of problems. For example, OpenVINO™ had its own naming rules for tensors, so a tensor called output1 in the native framework could be called aaa/bbb/argmax1 or something similar in OpenVINO™, which was inconsistent with the native framework. Another example: the C++ GetBlob() function returned a pointer to a blob that could not be used until it was cast to the MemoryBlob data type, which was very unreasonable. You can find the code in object_detection_sample_ssd/main.cpp.

These are only a part of the problems. To solve these problems, support the dynamic shape feature, and make it easier for developers to migrate applications to OpenVINO™, 2022.1 has carried out API improvements, mainly including:

  • Introduce the new tensor API to replace the old blob API (sample code: old main.cpp, new main.cpp).
  • Introduce the OpenVINO™ Runtime API to replace the old Inference Engine API (sample code: old hello_reshape_ssd.py, new hello_reshape_ssd.py).
  • Introduce a new preprocess module as part of model processing. The developer only needs to set a few parameters, and OpenVINO™ handles the data type and format conversion internally (sample code: old hello_reshape_ssd.py, new hello_reshape_ssd.py; see also the sketch after this list).
  • Introduce a new extension API (see this).
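As a hedged illustration of the tensor API and preprocess points above, the following sketch combines both with the 2022.1 Python API; the model path, input data, and layouts are placeholder assumptions:

```python
import numpy as np
from openvino.preprocess import PrePostProcessor
from openvino.runtime import Core, Layout, Type

core = Core()
model = core.read_model("model.xml")  # placeholder path

# New preprocess module: declare what the input tensor looks like and let
# OpenVINO embed the data type and layout conversions into the model.
ppp = PrePostProcessor(model)
ppp.input().tensor().set_element_type(Type.u8).set_layout(Layout("NHWC"))  # assumed input format
ppp.input().model().set_layout(Layout("NCHW"))                             # assumed model layout
ppp.output().tensor().set_element_type(Type.f32)
model = ppp.build()

compiled_model = core.compile_model(model, "CPU")
output_layer = compiled_model.output(0)

# New tensor-style inference: pass a numpy array and index the result by
# output port, instead of fetching and casting blobs.
image = np.zeros((1, 224, 224, 3), dtype=np.uint8)  # placeholder input
result = compiled_model([image])[output_layer]
```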

From a developer's perspective, API changes bring extra effort to upgrade an application, which is never welcome. But as mentioned earlier, these changes resolve the inconsistencies with other deep learning frameworks so that applications can be moved to OpenVINO™ smoothly. In the long run, developers benefit from shorter integration time and better compatibility.

Although the new API was introduced, 2022.1 is still compatible with the old API. The old API will coexist with the new one for approximately one year, giving developers a sufficient buffer to migrate.

Summary

In addition to the main features above, a lot of other ease-of-use support was launched. For example, the notebooks were refreshed with the new API, so you can study it in a user-friendly way. The documentation was also published simultaneously with the release, with dedicated pages for essential features like dynamic shapes.

We encourage you to try the new OpenVINO™ 2022.1 release. Please visit the documentation for more information.

Notes:

¹ MO means Model Optimizer. POT stands for Post-training Optimization Tool, a low-precision quantization tool for models supported by the OpenVINO™ toolkit. DLWB is the abbreviation of Deep Learning Workbench, a one-stop graphical platform for model conversion, evaluation, quantization, and tuning. OMZ is the abbreviation of Open Model Zoo, which holds a collection of models officially supported by OpenVINO™ and corresponding demos with these models.

² oneVPL stands for Intel® oneAPI Video Processing Library, a set of programming interfaces for video codec processing.

³ According to statistics of GitHub issues on December 20th, 2021, there were a total of 1067 issues, among which 711 were marked as MO-related, so the MO ratio was calculated as 711/1067 = 66.6%.

⁴ iGPU stands for Intel® integrated GPU; dGPU stands for Intel® discrete GPU.

⁵ VPU refers to Intel® Movidius™ Vision Processing Units
