OpenVINO: The Inference Engine

Feb 24, 2020

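If the terms IR and Model Optimizer are unfamiliar to you, read the article on those topics first.

In AI terminology, inference (「推論」) means that a trained neural network applies what it learned during training to make predictions about new data.

OpenVINO's Inference Engine runs inference on the trained neural network produced by the Model Optimizer. The Model Optimizer optimizes the model itself, while the Inference Engine applies hardware-based optimizations to the inference step.

The Inference Engine interfaces with the user application through a C++ or Python API library. See the reference documentation.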

Supported devices:

See the reference documentation.

Intel hardware: CPUs, GPUs, FPGAs, and VPUs. Supported formats: FP32, FP16, U8, U16, I8, and I16.
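To make the format names more concrete, here is a minimal sketch that uses only Python's standard `struct` module, not the OpenVINO API. It shows how many bytes one value occupies in each format and how much detail FP16 loses compared with FP32. The mapping from format names to `struct` codes, and the helper names, are my own assumptions for illustration.

```python
import struct

# Assumed mapping from the format names above to struct format codes.
FORMATS = {
    "FP32": "f",  # 32-bit float
    "FP16": "e",  # 16-bit (half precision) float
    "U8": "B",    # unsigned 8-bit integer
    "U16": "H",   # unsigned 16-bit integer
    "I8": "b",    # signed 8-bit integer
    "I16": "h",   # signed 16-bit integer
}


def bytes_per_value(name):
    """Return the storage size in bytes of one value in the given format."""
    return struct.calcsize("<" + FORMATS[name])


def round_trip(value, name):
    """Store a value in the given format and read it back."""
    code = "<" + FORMATS[name]
    return struct.unpack(code, struct.pack(code, value))[0]


sizes = {name: bytes_per_value(name) for name in FORMATS}
fp16_value = round_trip(0.1, "FP16")
fp32_value = round_trip(0.1, "FP32")

print(sizes)
print("0.1 stored as FP16:", fp16_value)
print("0.1 stored as FP32:", fp32_value)
```

The smaller formats halve or quarter the memory per value, at the cost of precision or range. That is the trade-off to weigh when choosing a format for a given device.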

Loading a Model into the Inference Engine

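To load a model (IR), the Inference Engine (IE for short) most commonly uses two classes:
(1) IECore: the class for working with the Inference Engine. See the documentation.

(2) IENetwork: holds the trained model's network, which is then loaded into IECore.

Initializing an IECore requires no arguments; initializing an IENetwork requires the model and its weights (the .xml and .bin files of the IR).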
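Since an IENetwork needs both IR files, it can help to confirm they exist before handing them over. The sketch below assumes the .bin file sits next to the .xml file with the same base name; `ir_file_pair` is a hypothetical helper, not part of OpenVINO, and the demonstration uses empty placeholder files rather than a real model.

```python
import os
import tempfile


def ir_file_pair(model_xml):
    """Return (xml_path, bin_path) for an IR, checking that both files exist."""
    # Assumption: the weights file shares the XML file's base name.
    model_bin = os.path.splitext(model_xml)[0] + ".bin"
    missing = [p for p in (model_xml, model_bin) if not os.path.isfile(p)]
    if missing:
        raise FileNotFoundError("Missing IR file(s): {}".format(missing))
    return model_xml, model_bin


# Demonstration with empty placeholder files (not a real model).
with tempfile.TemporaryDirectory() as tmp:
    xml_path = os.path.join(tmp, "model.xml")
    for path in (xml_path, os.path.join(tmp, "model.bin")):
        open(path, "w").close()
    pair = ir_file_pair(xml_path)
    pair_names = tuple(os.path.basename(p) for p in pair)

print(pair_names)
```

Failing early with a clear message is usually easier to debug than an error raised from inside the model loader.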

Example:

Suppose we have written a Python script, feed_network.py, that uses the model /home/workspace/models/human-pose-estimation-0001.xml.

feed_network.py

### Import libraries
import os
from openvino.inference_engine import IENetwork, IECore

Load IR model

def load_to_IE(model_xml, cpu_extension=None):
    ### Load the Inference Engine API
    plugin = IECore()

    ### Load IR files into their related class
    model_bin = os.path.splitext(model_xml)[0] + ".bin"
    net = IENetwork(model=model_xml, weights=model_bin)

    ### Add a CPU extension, if one was passed in.
    if cpu_extension:
        plugin.add_extension(cpu_extension, "CPU")

    ### Get the supported layers of the network
    supported_layers = plugin.query_network(network=net, device_name="CPU")

    ### Check for any unsupported layers, and let the user
    ### know if anything is missing. Exit the program, if so.
    unsupported_layers = [l for l in net.layers.keys() if l not in supported_layers]
    if len(unsupported_layers) != 0:
        print("Unsupported layers found: {}".format(unsupported_layers))
        print("Check whether extensions are available to add to IECore.")
        exit(1)

    ### Load the network into the Inference Engine
    plugin.load_network(net, "CPU")

    print("IR successfully loaded into Inference Engine.")

    return

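Sending Requests to the Inference Engine

Once the IENetwork has been loaded into IECore, the Inference Engine produces an ExecutableNetwork. The ExecutableNetwork is the object that handles requests.

There are two kinds of requests: synchronous and asynchronous. Within an ExecutableNetwork, every request is an InferRequest object. A synchronous request can call the ExecutableNetwork's infer function directly; an asynchronous request calls start_async and then wait, which lets the inference finish.

See the reference documentation.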
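The difference between the two request styles can be seen without OpenVINO installed. The sketch below is a toy stand-in, not the Inference Engine API: `ToyRequest` and `slow_model` are hypothetical names, and a thread pool plays the role of the device. It only mirrors the pattern of calling infer directly versus calling start_async and then wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_model(values):
    """Stand-in for a network forward pass (NOT OpenVINO): doubles each value."""
    time.sleep(0.01)
    return [2 * v for v in values]


class ToyRequest:
    """Hypothetical stand-in that mimics the shape of an inference request."""

    def __init__(self, pool):
        self._pool = pool
        self._future = None
        self.outputs = None

    def infer(self, values):
        # Synchronous: the caller is blocked until the result is ready.
        self.outputs = slow_model(values)
        return self.outputs

    def start_async(self, values):
        # Asynchronous: returns immediately, work continues in the background.
        self._future = self._pool.submit(slow_model, values)

    def wait(self):
        # Block until the background work finishes, then store the result.
        self.outputs = self._future.result()
        return 0  # 0 is used here as the "finished" status


pool = ThreadPoolExecutor(max_workers=1)
request = ToyRequest(pool)

sync_result = request.infer([1, 2, 3])

request.start_async([4, 5])
# The application is free to do other work here, e.g. grab the next frame.
status = request.wait()
async_result = request.outputs

pool.shutdown()
print(sync_result, status, async_result)
```

The asynchronous form pays off when there is useful work to do between start_async and wait; if the code waits immediately, it behaves much like the synchronous call.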

Example:

import argparse
import time

import cv2

from helpers import load_to_IE, preprocessing

CPU_EXTENSION = "/opt/intel/openvino/deployment_tools/inference_engine/lib/intel64/libcpu_extension_sse4.so"


def get_args():
    '''
    Gets the arguments from the command line.
    '''
    parser = argparse.ArgumentParser("Load an IR into the Inference Engine")
    # -- Create the descriptions for the commands
    m_desc = "The location of the model XML file"
    i_desc = "The location of the image input"
    r_desc = "The type of inference request: Async ('A') or Sync ('S')"

    # -- Create the arguments
    parser.add_argument("-m", help=m_desc)
    parser.add_argument("-i", help=i_desc)
    parser.add_argument("-r", help=r_desc)
    args = parser.parse_args()

    return args


def async_inference(exec_net, input_blob, image):
    ### Start an asynchronous request, then wait until it completes
    ### Note: Return the exec_net
    exec_net.start_async(request_id=0, inputs={input_blob: image})
    while True:
        # wait(-1) blocks until the request finishes; status 0 means OK
        status = exec_net.requests[0].wait(-1)
        if status == 0:
            break
        else:
            time.sleep(1)
    return exec_net


def sync_inference(exec_net, input_blob, image):
    ### Perform synchronous inference
    ### Note: Return the result of inference
    result = exec_net.infer({input_blob: image})

    return result


def perform_inference(exec_net, request_type, input_image, input_shape):
    '''
    Performs inference on an input image, given an ExecutableNetwork
    '''
    # Get input image
    image = cv2.imread(input_image)
    # Extract the input shape
    n, c, h, w = input_shape
    # Preprocess it (applies for the IRs from the Pre-Trained Models lesson)
    preprocessed_image = preprocessing(image, h, w)

    # Get the input blob for the inference request
    input_blob = next(iter(exec_net.inputs))

    # Perform either synchronous or asynchronous inference
    request_type = request_type.lower()
    if request_type == 'a':
        output = async_inference(exec_net, input_blob, preprocessed_image)
    elif request_type == 's':
        output = sync_inference(exec_net, input_blob, preprocessed_image)
    else:
        print("Unknown inference request type, should be 'A' or 'S'.")
        exit(1)

    # Return the output (the exec_net for async, the result for sync)
    return output


def main():
    args = get_args()
    exec_net, input_shape = load_to_IE(args.m, CPU_EXTENSION)
    perform_inference(exec_net, args.r, args.i, input_shape)


if __name__ == "__main__":
    main()
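The command-line interface above can be tried on its own, without a model or an image, by handing the parser an explicit argument list. This sketch rebuilds the same three flags; `build_parser` is a hypothetical helper name and the file names are placeholders.

```python
import argparse


def build_parser():
    """Build a parser with the same -m, -i and -r flags as the example above."""
    parser = argparse.ArgumentParser("Load an IR into the Inference Engine")
    parser.add_argument("-m", help="The location of the model XML file")
    parser.add_argument("-i", help="The location of the image input")
    parser.add_argument(
        "-r", help="The type of inference request: Async ('A') or Sync ('S')"
    )
    return parser


# Parse an explicit list instead of sys.argv (placeholder file names).
args = build_parser().parse_args(["-m", "model.xml", "-i", "image.png", "-r", "A"])
request_type = args.r.lower()

print(args.m, args.i, request_type)
```

Lower-casing the `-r` value, as the example's perform_inference does, means both `A` and `a` select the asynchronous path.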

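Handling Requests

The previous section explained that requests are handled by the ExecutableNetwork and that each request is an InferRequest object. According to the reference documentation, an InferRequest object has three attributes: inputs, outputs, and latency (the time taken to run the inference). Building on the example above, the output can be printed with a statement like this: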

exec_net.requests[request_id].outputs[output_blob]

An InferRequest object inside an ExecutableNetwork can be retrieved by its request_id, and its outputs attribute then gives the inference result for that request.
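As a small sketch of that lookup, the helper below wraps the `requests[request_id].outputs[output_blob]` access pattern. `get_request_output` is a hypothetical helper, and the objects used to exercise it are plain stand-ins built with `SimpleNamespace`, not real OpenVINO objects; the blob name and values are made up for illustration.

```python
from types import SimpleNamespace


def get_request_output(exec_net, request_id=0, output_blob=None):
    """Return one output of a finished request.

    If no blob name is given, the first key of the request's outputs is used.
    """
    request = exec_net.requests[request_id]
    if output_blob is None:
        output_blob = next(iter(request.outputs))
    return request.outputs[output_blob]


# Stand-ins that only mimic the attribute layout described above
# (NOT real ExecutableNetwork / InferRequest objects; placeholder values).
fake_request = SimpleNamespace(outputs={"heatmaps": [[0.1, 0.9]]}, latency=12.5)
fake_exec_net = SimpleNamespace(requests=[fake_request])

result = get_request_output(fake_exec_net)
named_result = get_request_output(fake_exec_net, 0, "heatmaps")

print(result)
```

With an asynchronous request, the outputs should only be read after wait has reported that the request is finished.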

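Integrating with an Application

A complete OpenVINO edge application therefore works as follows:

(1) Take the IR produced by the Model Optimizer
(2) Load the IR into the Inference Engine inside the application
(3) Preprocess the input, for example by processing images with cv2
(4) Send the inference request
(5) Handle the output of the inference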
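The flow above can be sketched as a small skeleton in which every stage is passed in as a function. This is only an outline under my own assumptions: `run_pipeline` and all the stage names are hypothetical, and the stages used in the demonstration are toy lambdas rather than OpenVINO or cv2 calls.

```python
def run_pipeline(load_model, read_input, preprocess, infer, handle_output,
                 model_path, input_path):
    """Run the stages of an edge application in order."""
    model = load_model(model_path)      # load the IR into the engine
    raw = read_input(input_path)        # read the raw input
    prepared = preprocess(raw)          # preprocess the input
    result = infer(model, prepared)     # send the inference request
    return handle_output(result)        # handle the output


# Toy stand-ins for each stage (placeholders, not real model code).
summary = run_pipeline(
    load_model=lambda path: {"scale": 2},
    read_input=lambda path: [1, 2, 3],
    preprocess=lambda data: [float(v) for v in data],
    infer=lambda model, data: [model["scale"] * v for v in data],
    handle_output=lambda result: max(result),
    model_path="model.xml",
    input_path="image.png",
)

print(summary)
```

Keeping the stages separate like this makes it straightforward to swap synchronous inference for asynchronous inference, or one preprocessing routine for another, without touching the rest of the application.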

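A simple application is structured roughly like this:

app.py: reads and preprocesses the input, creates the Network, runs inference to get a result, and handles that result

inference.py: loads the model, handles synchronous or asynchronous requests, and handles the output

model.bin / model.xml: the two files produced by the Model Optimizer

Reference: OpenVINO industry applications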