Scoping Edge Compute for AI Deployments with Intel® OpenVINO™

Published in OpenVINO™ toolkit · Sep 20, 2022

Authors: Ravi Panchumarthy, Pradeep Sakhamoori.

1. Introduction

Artificial Intelligence (AI) is becoming increasingly popular across industries. In fact, according to Gartner’s Hype Cycle for AI, over half of organizations have already deployed AI solutions. Depending on a business’s unique requirements, there are three major AI deployment models:

  • Edge: Deployed on-premises, where the data is processed in the same location as the AI compute.
  • Cloud: Data is sent to the cloud for processing and accessed via API calls.
  • Hybrid: Combination of both edge and cloud.

AI deployment metrics help businesses evaluate the effectiveness of their AI deployments and measure the ROI of their AI solutions. These metrics can be classified into two categories:

  • Operational Key Performance Indicators (KPIs): Measure the efficiency of the AI solution (accuracy, speed, latency, etc.).
  • Business Performance Indicators (BPIs): Measure the success of the AI solution, i.e., the impact of the AI solution on business processes (customer satisfaction, sales conversion rate, etc.).

Scoping edge compute for AI deployments to satisfy the operational KPIs is one of the challenges businesses face when deploying AI solutions. In this article, we share methodologies and tools to estimate the edge compute required to satisfy the operational KPI metrics for AI solution deployments at the edge, with an example use case: frictionless shopping. We provide a script that helps determine the right compute for a deployment, or identify the operational KPIs achievable on existing deployment hardware. The script outputs the OpenVINO™ Model Server config parameter CPU_THROUGHPUT_STREAMS.

2. Example Use-Case: Frictionless shopping in the Retail Industry

Consider a retail store planning to set up point-of-sale (POS) systems for frictionless shopping. The customer places the product(s) (one or more at a given instance) at the checkout station, and video frames from the overhead camera are sent to an edge compute node for AI processing (product detection and product classification). See Fig 1 for a visual representation of POS at retail stores and an edge computer processing video frames from multiple POS systems.

Fig 1 (a): Frictionless shopping setup with multiple clients (POSs with Cameras)
Fig 1 (b): Deployment scenario ambiguities

For the above-explained use case of frictionless shopping, one of the following scenarios needs to be addressed for AI deployment:

  • Scenario 1: Estimating optimal latency for a fixed number of POS(s) (clients/cameras).
  • Scenario 2: Estimating max POS(s) (clients/cameras) for a given latency and range of POS(s).
  • Scenario 3: Estimating optimal latency and number of POS(s) (clients/cameras) for a given edge compute.

For any of these scenarios, the execution steps are as follows (see Fig 2):

  • Step 1: Download the desired OpenVINO™ model(s).
  • Step 2: Make the necessary changes in the params.cfg file. This requires setting the model paths and the appropriate mode for the desired scenario (an illustrative sketch of the file follows Fig 2). NOTE: See OVMS conventions for the model directory structure.
  • Step 3: Run the script.
  • Step 4: Review the results.
Fig 2: Proposed execution pipeline for OVMS benchmarking.
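For reference, below is an illustrative sketch of the variables that params.cfg exposes across the three scenarios. The values are placeholders; the actual file is in Appendix A.

# params.cfg (illustrative values only; see Appendix A for the real file)
model_paths_list=("/path/to/model1")  # OpenVINO IR model directory, in OVMS layout
kpi_concurrency=10                    # scenario 1: fixed number of cameras (POSs)
kpi_latency=5.5                       # scenario 2: target latency in ms
min_concurrency=5                     # scenario 2: lower bound of the camera range
max_concurrency=12                    # scenario 2: upper bound of the camera range
# the scenario mode selector is also set here (see Appendix A for its exact name)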

3. Required Tools

Before we address the above scenarios, let’s take a look at the required software tools.

3.1 OpenVINO™ Model Server (OVMS):

OpenVINO™ Model Server (OVMS) is a high-performance system for serving machine learning models. It is based on C++ for high scalability and optimized for Intel® solutions so that you can take advantage of all the power of the Intel® Xeon® processor or Intel’s AI accelerators and expose it over a network interface. OVMS uses the same architecture and API as TensorFlow Serving, while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making it easy to deploy new algorithms and AI experiments.
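As a quick illustration of serving a model with OVMS and querying it over REST (a minimal sketch; the model name, paths, and ports here are placeholders, not values from this article's scripts):

# Start OVMS in Docker, exposing gRPC on 9000 and REST on 9001
docker run -d --rm -p 9000:9000 -p 9001:9001 \
  -v $(pwd)/models/my-model:/models/my-model:ro \
  openvino/model_server:latest \
  --model_name my-model --model_path /models/my-model \
  --port 9000 --rest_port 9001

# Query model status via the TensorFlow Serving-compatible REST API
curl http://localhost:9001/v1/models/my-model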

For more details, please refer to the OpenVINO™ Model Server documentation.

3.2 OpenVINO™ Model Server Benchmark Client App:

OpenVINO™ Model Server Benchmark Client uses the asynchronous gRPC API and tests performance with synthetic data. Prior to transmission, the client downloads metadata from the server, which contains a list of available models, their versions, and their accepted input and output shapes. It then generates tensors containing random data with shapes matched to the models served by the service. Both the length of the dataset and the workload duration can be specified independently. The synthetic data is then served in a loop, iterating over the dataset until the workload length is satisfied. As the main role of the client is performance measurement, all aspects unrelated to throughput and/or latency are ignored. This means the client neither validates the received responses nor estimates accuracy, as these activities would negatively affect the measured performance metrics on the client side.
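As an example, a typical invocation of the benchmark client container might look like the following. The model name and ports are placeholders, and flag spellings can vary between releases, so run the image with --help to confirm them:

# Measure throughput/latency with 8 concurrent synthetic clients for 30 seconds
docker run --network host benchmark_client \
  -a localhost -r 9001 -p 9000 \
  -m my-model -c 8 -t 30 --print_all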

For more details, see the benchmark client demo (Python) in the OpenVINO™ Model Server repository.

4. AI Deployment Scenarios

Now, let’s look at how to address the above-mentioned scenarios:

Scenario 1: Estimating optimal latency for a fixed number of POS(s) (clients/cameras):

In this deployment scenario, the enterprise wants to estimate the optimal latency achievable for a fixed number of cameras (POSs). See section 5 for details; a conceptual sketch of the search follows the Output list below.

Input:

  • OpenVINO™ model(s)
  • KPI: number of cameras (POSs)

Output:

  • Min latency, OVMS config params: CPU_THROUGHPUT_STREAMS
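Conceptually, the script arrives at this output by sweeping CPU_THROUGHPUT_STREAMS values and measuring latency at the fixed concurrency. Below is a simplified sketch of such a sweep, not the actual run_ovms_benchmark.sh (which is in Appendix A); image names, flags, and values are assumptions:

# Sweep candidate stream counts and benchmark each at 10 cameras
for nstreams in 1 2 4 8; do
  docker run -d --rm --name ovms -p 9000:9000 -p 9001:9001 \
    -v $(pwd)/workspace/product-detection-001:/models/product-detection:ro \
    openvino/model_server:latest \
    --model_name product-detection --model_path /models/product-detection \
    --port 9000 --rest_port 9001 \
    --plugin_config "{\"CPU_THROUGHPUT_STREAMS\": \"$nstreams\"}"
  sleep 5  # give the server time to load the model
  docker run --network host benchmark_client \
    -a localhost -r 9001 -p 9000 -m product-detection -c 10 -t 30
  docker stop ovms  # stop before trying the next configuration
done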

Scenario 2: Estimating max POS(s) (clients/cameras) for a given latency and range of POS(s):

Let’s consider a case where a data scientist trained an AI model for the frictionless shopping use case, and it meets the enterprise KPIs (latency). The next step is to determine, for the given latency, how many POSs (cameras) can be supported on various hardware (edge compute).

Input:

  • OpenVINO™ model(s)
  • KPI: latency
  • Range of cameras (min-max)

Output:

  • Max cameras, OVMS config params: CPU_THROUGHPUT_STREAMS

Scenario 3: Estimating optimal latency and number of POS(s) (clients/cameras) for a given edge compute:

In this deployment scenario, the compute is pre-selected or already deployed, and the enterprise wants to estimate the best latency (or max number of POS client cameras) that can be achieved. See section 5 for details.

Input:

  • OpenVINO™ Model(s)

Output:

  • Min latency, OVMS config params: CPU_THROUGHPUT_STREAMS
  • Several combinations of streams, concurrency, and latency (ms)

5. Prerequisites and Setup

The following are the prerequisites and setup needed:

1. Docker installed.
2. OpenVINO™ Model Server Docker Image.
3. OpenVINO™ Benchmark Client (Python) Docker Image.
4. Shell Script and config file for evaluating the scenarios.

For (2) and (3), we provide a script for preparing the required Docker images. See Appendix A for the script: ovms_benchmark_setup.sh

For (4), see Appendix A for the script: run_ovms_benchmark.sh, and the config file: params.cfg
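If you prefer to prepare the two images manually instead of using ovms_benchmark_setup.sh, the equivalent steps are roughly the following (a sketch based on the public OVMS repository; the actual setup script may differ):

# Pull the OpenVINO Model Server image
docker pull openvino/model_server:latest
# Build the benchmark client image from the model_server repository
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server/demos/benchmark/python
docker build . -t benchmark_client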

Assumptions:

  • In the following sample execution runs, we assume network latency is zero as we run both the OVMS and the OVMS benchmark client on the same hardware.
  • The model is in OpenVINO™ IR format.

For each scenario, params.cfg needs to be updated accordingly.

Scenario 1: Estimating optimal latency for a fixed number of POS(s) (clients/cameras):

For scenario 1, set the following variables in params.cfg:
- model_paths_list=("/path/to/model1")
- kpi_concurrency=10 # Number of cameras

Scenario 2: Estimating max POS(s) (clients/cameras) for a given latency and range of POS(s):

For scenario 2, set the following variables in params.cfg:
- model_paths_list=("/path/to/model1")
- kpi_latency=5.5 # Desired latency in ms
- min_concurrency=5 # Min number of cameras
- max_concurrency=12 # Max number of cameras

Scenario 3: Estimating optimal latency and number of POS(s) (clients/cameras) for a given edge compute:

For scenario 3, set the following variables in params.cfg:
- model_paths_list=("/path/to/model1")

6. End-to-End Sample Execution Steps

In this section, we showcase an end-to-end sample execution of scenario 1 with the OpenVINO™ product detection model (product-detection-0001, based on SSDMobileNetV2) for object detection.

Step 1: Run the setup script for OVMS and OVMS Benchmark Client Docker images.

bash ovms_benchmark_setup.sh

Step 2: Download and prepare the OpenVINO™ Product Detection model (SSDMobileNetV2).

mkdir -p workspace/product-detection-001/1
cd workspace/product-detection-001/1
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/3/product-detection-0001/FP32/product-detection-0001.xml
wget https://storage.openvinotoolkit.org/repositories/open_model_zoo/2022.1/models_bin/3/product-detection-0001/FP32/product-detection-0001.bin
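After these commands, the model files sit in a numbered version subdirectory, following the OVMS model directory convention referenced in Step 2:

workspace/product-detection-001/
└── 1/
    ├── product-detection-0001.xml
    └── product-detection-0001.bin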

Step 3: Configure parameters in params.cfg. Here we set the model path and the number of cameras (concurrency):

model_paths_list=("workspace/product-detection-001")
kpi_concurrency=10

Step 4: Run the script

bash run_ovms_benchmark.sh

Step 5: Review the results. Below is the sample script output.

***** Completed model: workspace/product-detection-001/ *****
Metrics saved at ovms_bm_logs/2022-09-12-08-20-13/product-detection-0001/metrics.csv
Overall Best Result within this run:
Best Latency: 0.02608031788464254 ms with 10 Concurrency/cameras, with OVMS NSTREAMS: 8
However, for complete results and other possibilities, see ovms_bm_logs/2022-09-12-08-20-13/product-detection-0001/metrics.csv
***** Script execution completed *****
Total number of benchmark runs completed: 9
Log folder root: ovms_bm_logs/2022-09-12-08-20-13
Summary: ovms_bm_logs/2022-09-12-08-20-13/summary.log
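As an illustration of how the reported value is used (this step is not part of the provided scripts; the model name and port are placeholders): once you know the best stream count, you can pass it to OVMS at deployment time through its plugin configuration:

# Launch OVMS with the stream count the benchmark identified (NSTREAMS: 8)
docker run -d --rm -p 9000:9000 \
  -v $(pwd)/workspace/product-detection-001:/models/product-detection:ro \
  openvino/model_server:latest \
  --model_name product-detection --model_path /models/product-detection \
  --port 9000 \
  --plugin_config '{"CPU_THROUGHPUT_STREAMS": "8"}'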

We hope these scripts will assist you in your AI deployments. We plan to add more features in our next release.

Notices & Disclaimers
Intel technologies may require enabled hardware, software or service activation.
No product or component can be absolutely secure.
Your costs and results may vary.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.

Appendix A

Below are the three files supporting this blog: ovms_benchmark_setup.sh, run_ovms_benchmark.sh, and params.cfg.
