Accelerate Big Transfer (BiT) model inference with Intel® OpenVINO™

Pradeep Sakhamoori · Published in OpenVINO-toolkit · 8 min read · Oct 7, 2022

Authors: Pradeep Sakhamoori, Ravi Panchumarthy, Evgenya Stepyreva, Nico Galoppo.

1. Overview:

The AI community has created many open-source tools and models that help developers build AI solutions quickly and efficiently. These open-source models can be fine-tuned for a user-specific task, which shortens the model development cycle significantly compared to training from scratch. Once a model is fine-tuned and reaches the desired accuracy, it needs to be deployed to a production environment. However, most of these models are not optimized for production deployment: an optimized model should fully utilize the production hardware to achieve the best performance (maximum throughput or minimum latency) in a power-efficient manner. The Intel® OpenVINO™ toolkit optimizes AI models trained in popular frameworks (e.g., TensorFlow, PyTorch) or exported to ONNX for Intel® architectures. In this article, we use the OpenVINO™ Model Optimizer to optimize the Big Transfer (BiT) model and compare inference performance between TensorFlow and OpenVINO™ on Intel® edge hardware.

2. Big Transfer (BiT):

BiT is a recipe for pre-training image classification models on large, supervised datasets and efficiently fine-tuning them on any given target task. The recipe achieves excellent performance on a wide variety of tasks, even when using very few labeled examples from the target dataset. As explained in the paper, three families of models are presented.

  • “BiT-S”, pre-trained on ImageNet-1k (also known as ILSVRC-2012-CLS);
  • “BiT-M”, pre-trained on ImageNet-21k (also known as the “Full ImageNet, Fall 2011 release”); and
  • “BiT-L”, pre-trained on JFT-300M, a proprietary dataset.

Each family is composed of ResNet-50 (R50x1), a ResNet-50 three times wider (R50x3), a ResNet-101 (R101x1), a ResNet-101 three times wider (R101x3), and the flagship architecture, a ResNet-152 four times wider (R152x4).

For more details see: https://github.com/google-research/big_transfer
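If you simply want to try a BiT model, the TensorFlow Hub modules can be loaded directly in Python. Below is a minimal sketch (not part of the original benchmark setup) that loads the BiT-M R50x1 module and runs a single forward pass on a random input; the module URL and the 128x128x3 input size match the models and shapes used later in this article.

# Minimal sketch: load a BiT-M module from TensorFlow Hub and run one forward pass.
import numpy as np
import tensorflow_hub as hub

# BiT-M R50x1 module pre-trained on ImageNet-21k (same URL family as used below)
module = hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x1/1")

# BiT accepts arbitrary spatial sizes; 128x128x3 matches the benchmarks below
image = np.random.rand(1, 128, 128, 3).astype(np.float32)
output = module(image)

print("Output shape:", output.shape)
print("Top-5 output indices:", np.argsort(output.numpy()[0])[-5:][::-1])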

3. Intel® OpenVINO™ Deep Learning Toolkit:

Intel® Distribution of OpenVINO™ toolkit is a comprehensive toolkit for developing applications and solutions based on deep learning tasks, such as emulation of human vision, automatic speech recognition, natural language processing, and recommendation systems. It provides high compute performance and rich deployment options, from edge to cloud. Some of its advantages are:

  • Enabling CNN-based deep learning inference on the edge.
  • Supporting various execution modes across Intel® technologies: Intel® CPU, Intel® Integrated Graphics, Intel® discrete Arc and Data Center Flex Series Graphics, Intel® Neural Compute Stick 2, and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
  • Accelerated time-to-market via an easy-to-use library of computer vision functions and pre-optimized kernels.
Figure 1: OpenVINO™ overview. For detailed documentation about OpenVINO™ see: https://docs.openvino.ai/latest/index.html

OpenVINO™ development tools: To download, convert, optimize, and tune pre-trained deep learning models, install the OpenVINO™ development tools. They include, among other utilities, Model Optimizer (mo) and the Benchmark Tool (benchmark_app), both of which we use later in this article.

See sample applications in OpenVINO™ Toolkit Samples Overview.

Figure 2: OpenVINO™ Workflow Overview

4. BiT (Big Transfer) Model Optimization:

In this section, we describe how to optimize the BiT model and run benchmarks to compare TensorFlow and OpenVINO™ inference throughput and latency.

Step 1: Installation and setup:

Before installing and setting up, make sure your system meets the minimum requirements.

Step 1.1: Set up a Python virtual environment using either Python venv or Conda. Here we illustrate using Conda. For Conda installation instructions, click here.

Step 1.2: Create and activate the Conda environment:

conda create -n BiT python=3.9
conda activate BiT

Step 1.3: Install OpenVINO™ and TensorFlow dependencies in the virtual environment. For more details click here.

pip3 install "openvino-dev[tensorflow2]"
pip3 install tensorflow-hub
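
As an optional sanity check (not part of the original instructions), you can confirm the packages are importable and print their versions from Python inside the activated environment:

# Optional sanity check: run inside the activated BiT environment
import tensorflow as tf
import tensorflow_hub as hub
from openvino.runtime import get_version

print("TensorFlow:", tf.__version__)
print("TensorFlow Hub:", hub.__version__)
print("OpenVINO:", get_version())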

Step 2: Download BiT-M pre-trained TensorFlow Hub model(s):

Here we provide instructions on how to download the TensorFlow Hub models.

BiT-M R50x1:

wget https://tfhub.dev/google/bit/m-r50x1/1?tf-hub-format=compressed -O bit_m_r50x1_1.tar.gz
mkdir -p bit_m_r50x1_1 && tar -xvf bit_m_r50x1_1.tar.gz -C bit_m_r50x1_1

BiT-M R50x3:

wget https://tfhub.dev/google/bit/m-r50x3/1?tf-hub-format=compressed -O bit_m_r50x3_1.tar.gz
mkdir -p bit_m_r50x3_1 && tar -xvf bit_m_r50x3_1.tar.gz -C bit_m_r50x3_1

BiT-M R101x1:

wget https://tfhub.dev/google/bit/m-r101x1/1?tf-hub-format=compressed -O bit_m_r101x1_1.tar.gz
mkdir -p bit_m_r101x1_1 && tar -xvf bit_m_r101x1_1.tar.gz -C bit_m_r101x1_1

BiT-M R101x3:

wget https://tfhub.dev/google/bit/m-r101x3/1?tf-hub-format=compressed -O bit_m_r101x3_1.tar.gz
mkdir -p bit_m_r101x3_1 && tar -xvf bit_m_r101x3_1.tar.gz -C bit_m_r101x3_1

Note: See Appendix A for utility script download_mo_bit_models.sh to download and optimize BiT models all at once.
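
If you prefer to stay in Python rather than use wget and tar, the following hypothetical sketch downloads and extracts the same four BiT-M archives. The URLs and directory names mirror the commands above; this is not the Appendix A shell script.

# Hypothetical Python alternative to the wget/tar commands above (not the
# Appendix A shell script): download and extract all four BiT-M variants.
import tarfile
import urllib.request
from pathlib import Path

VARIANTS = ["m-r50x1", "m-r50x3", "m-r101x1", "m-r101x3"]

for variant in VARIANTS:
    url = f"https://tfhub.dev/google/bit/{variant}/1?tf-hub-format=compressed"
    out_dir = Path("bit_" + variant.replace("-", "_") + "_1")  # e.g. bit_m_r50x1_1
    archive = Path(str(out_dir) + ".tar.gz")
    out_dir.mkdir(parents=True, exist_ok=True)
    print(f"Downloading {url} -> {archive}")
    urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive) as tar:
        tar.extractall(out_dir)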

Step 3: Convert TensorFlow saved_model to Intel® OpenVINO™ IR (Intermediate representation):

Model Optimizer (mo) is the command line utility for model optimization. It gets installed with the OpenVINO™ pip installation. Model Optimizer imports, converts, and optimizes models that were trained in popular frameworks to a format usable by OpenVINO™ components. For more detailed information about converting models with Model Optimizer please refer to its documentation.

Starting from the 2022.1 release, Model Optimizer can generate an IR with partially defined input shapes ("-1" dimensions in TensorFlow models or dimensions with string values in ONNX models). Some OpenVINO™ plugins require static model input shapes, in which case you should call the reshape method at runtime and specify static input shapes. For optimal performance, it is still recommended to fix the input shapes at conversion time using the --input or --input_shape command-line parameters.

Here we do not specify --input_shape, so that we can test inputs of any shape.

Command to generate OpenVINO™ IR for bit_m-r50x3_1:

mo --framework tf \
--saved_model_dir ./bit_m_r50x3_1 \
--output_dir ov_irs/bit_m_r50x3_1/

To optimize other BiT models with the OpenVINO™ Model Optimizer, replace the --saved_model_dir path.
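
Because we left the input shape dynamic, the generated IR keeps "-1" dimensions. If your target plugin needs static shapes (as discussed above), you can pin them with the reshape method before compiling the model. Below is a minimal sketch using the OpenVINO™ Runtime Python API, assuming the IR path produced by the command above.

# Minimal sketch: pin a static input shape on the dynamic IR before compiling.
# Assumes the IR generated by the mo command above.
from openvino.runtime import Core

core = Core()
model = core.read_model("ov_irs/bit_m_r50x3_1/saved_model.xml")

# Replace the dynamic (-1) dimensions with the shape we intend to infer on.
model.reshape([1, 128, 128, 3])

compiled_model = core.compile_model(model, "CPU")
print("Compiled input shape:", compiled_model.input(0).shape)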

Step 4: Benchmark TensorFlow and OpenVINO™ inference performance:

We use a script to evaluate and compare the inference performance of TensorFlow and OpenVINO™. The usage instructions follow; refer to Appendix A for run_ov_tf_perf.py.

Note: For a list of supported devices, refer to this list in the documentation.

$ python run_ov_tf_perf.py -h
usage: run_ov_tf_perf.py [-h] -tf TFHUB_URL -ov OV_XML [-d TARGET_DEVICE] [-i INPUT_IMAGE] [-s SHAPE] [-t BENCH_TIME]

Script to benchmark BiT model with TensorFlow and OpenVINO

required arguments:
  -tf TFHUB_URL, --tfhub_url TFHUB_URL
                        TensorFlow HUB BiT model URL
  -ov OV_XML, --ov_xml OV_XML
                        Path to OpenVINO model XML file

optional arguments:
  -h, --help            show this help message and exit
  -d TARGET_DEVICE, --target_device TARGET_DEVICE
                        Specify a target device to infer on.
  -i INPUT_IMAGE, --input_image INPUT_IMAGE
                        Input Image URL or Path to image.
  -s SHAPE, --shape SHAPE
                        Set shape for input 'N,W,H,C'. For example: '1,128,128,3'
  -t BENCH_TIME, --bench_time BENCH_TIME
                        Benchmark duration in seconds

The following is a usage example for benchmarking the R50x3_1 model.
Note: -tf and -ov are required; the rest are optional:

python run_ov_tf_perf.py \
-tf https://tfhub.dev/google/bit/m-r50x3/1 \
-ov ov_irs/bit_m_r50x3_1/saved_model.xml \
-d CPU \
-i https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg \
-s '1,128,128,3' \
-t 10

Sample output for the above command:

Downloading input test image https://upload.wikimedia.org/wikipedia/commons/6/6e/Golde33443.jpg
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 207k 100 207k 0 0 1243k 0 --:--:-- --:--:-- --:--:-- 1243k
Pre-processing input image...

==== Benchmarking OpenVINO inference for 10sec on CPU ====
Input shape: (1, 128, 128, 3)
Model: ov_irs/bit_m_r50x3_1/saved_model.xml
Avg Latency: 0.0711 sec, FPS: 14.07
OV Inference top5 label index(s): [5919 3387 342 5207 649]
==== Benchmarking TensorFlow inference for 10sec on CPU ====
Input shape: (1, 128, 128, 3)
Model: https://tfhub.dev/google/bit/m-r50x3/1
Avg Latency: 0.5602 sec, FPS: 1.79
TF Inference top5 label index(s): [5919 3387 342 5207 649]
Both TensorFlow and OpenVINO reported same accuracy.
Speedup on CPU with OpenVINO: 7.9x
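
For reference, the core of such a comparison can be sketched as follows. This is a simplified, hypothetical version of the timing loop, not the actual run_ov_tf_perf.py script from Appendix A; the model URL, IR path, and input shape match the example above.

# Simplified, hypothetical sketch of the TF-vs-OpenVINO timing loop; the real
# script is run_ov_tf_perf.py in Appendix A.
import time
import numpy as np
import tensorflow_hub as hub
from openvino.runtime import Core

SHAPE = (1, 128, 128, 3)
image = np.random.rand(*SHAPE).astype(np.float32)

def benchmark(infer_fn, seconds=10):
    # Run infer_fn repeatedly for `seconds` and return the average latency.
    infer_fn(image)  # warm-up
    runs, start = 0, time.perf_counter()
    while time.perf_counter() - start < seconds:
        infer_fn(image)
        runs += 1
    return (time.perf_counter() - start) / runs

# TensorFlow: load the BiT-M module directly from TensorFlow Hub
tf_module = hub.KerasLayer("https://tfhub.dev/google/bit/m-r50x3/1")
tf_latency = benchmark(lambda x: tf_module(x))

# OpenVINO: compile the converted IR for CPU and reuse a single infer request
core = Core()
ov_model = core.read_model("ov_irs/bit_m_r50x3_1/saved_model.xml")
ov_model.reshape(list(SHAPE))
compiled = core.compile_model(ov_model, "CPU")
request = compiled.create_infer_request()
ov_latency = benchmark(lambda x: request.infer([x]))

print(f"TF avg latency: {tf_latency:.4f} sec, FPS: {1 / tf_latency:.2f}")
print(f"OV avg latency: {ov_latency:.4f} sec, FPS: {1 / ov_latency:.2f}")
print(f"Speedup on CPU with OpenVINO: {tf_latency / ov_latency:.1f}x")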

Step 5: [Optional] Performance measurement with OpenVINO™ benchmark_app:

The benchmark application (benchmark_app) is a command-line utility installed with the OpenVINO™ pip package. It estimates deep learning inference performance on supported devices in both synchronous and asynchronous modes. For more detailed information about the benchmark app, please refer to its documentation.

Running in Latency mode and Throughput mode:

# Latency mode
benchmark_app \
-m ov_irs/bit_m_r50x3_1/saved_model.xml \
-d CPU \
-shape [1,128,128,3] \
-hint latency \
-t 20

# Throughput mode
benchmark_app \
-m ov_irs/bit_m_r50x3_1/saved_model.xml \
-d CPU \
-shape [1,128,128,3] \
-hint throughput \
-t 20

5. Performance Benchmarking — TensorFlow vs. OpenVINO™:

We ran performance benchmarks on several BiT models (R50x1_1, R50x3_1, R101x1_1, R101x3_1) on an Intel® 11th Gen Core™ i7-1185G7E CPU, evaluating inference performance with both TensorFlow and OpenVINO™. Refer to Section 4 above for the benchmarking methodology. We observed better latencies while maintaining the same accuracy on all models with OpenVINO™, as shown in Figure 3. With OpenVINO™ on CPU at FP32 precision, we observed an average speedup of 6.7x across the BiT models with an input shape of 128x128x3, and an average of 2.6x with an input shape of 384x384x3. See Appendix A for hardware and software configuration details.

Figure 3: Normalized Latency Performance comparison with TensorFlow and OpenVINO™ for various BiT models with input size 128 x 128 x 3 and 384 x 384 x 3 on 11th Gen Intel® Core™ i7–1185G7E @ 2.80GHz CPU. See backup for workloads and configurations. Results may vary​.

6. Conclusion:

The Intel® OpenVINO™ toolkit provides an easy way to optimize deep learning models, improving performance on hardware ranging from the edge to the data center. OpenVINO™ follows a write-once, deploy-anywhere paradigm, enabling developers to quickly build optimized models that are ready for deployment. In the sections above, we showcased the model optimization process on the BiT model to achieve a significant boost in inference performance on an Intel® Core™ i7-1185G7E. Model performance could be accelerated further with advanced optimization techniques such as quantization and pruning, which are also supported by OpenVINO™. Model optimization is just one feature of the OpenVINO™ toolkit among many others: the Neural Network Compression Framework (NNCF), DL Workbench, OpenVINO™ Model Server, the model zoo, etc. We encourage you to try out OpenVINO™ in your next AI project.

Notices & Disclaimers:

Performance varies by use, configuration and other factors. Learn more on the Performance Index site.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available ​updates. See backup for configuration details.
No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. ​

Appendix A

Hardware Configuration:

Test setup Hardware Configuration

Software Configuration:

See run_ov_tf_perf.py below

Utility script download_mo_bit_models.sh
