Train a DL model for synthetic data generation for model optimization with OpenVINO — Part 2

Athanasios Masouris
Published in OpenVINO-toolkit
7 min read · Sep 7, 2022

Nowadays, deep learning models keep getting bigger. Neural networks with billions of parameters achieve state-of-the-art performance on several tasks, at the cost of heavy workloads that demand large processing units. The goal of model optimization is to boost the performance of deep learning models while reducing their resource demands.

This blog is the second of a two-blog series (Blog #1) regarding the Google Summer of Code 2022 project “Train a DL model for synthetic data generation for model optimization”, developed under the auspices of Intel’s OpenVINO Toolkit. The project consists of two parts. For the first part, covered in the previous blog, the goal was to train a lightweight deep learning model to generate synthetic images. For the second part, which is the topic of this blog, the model trained in the first part is used to generate a dataset of synthetic CIFAR-10 images. This dataset is then used for model optimization with OpenVINO’s Post-training Optimization Tool. We evaluate the performance of the 8-bit post-training quantization method on a range of computer vision models.

Google Summer of Code page: Program Project | Google Summer of Code
GitHub repository: ThanosM97/gsoc2022-openvino: Development of a DL model for synthetic data generation for model optimization using OpenVINO’s Post-training Optimization Toolkit. (github.com)

Introduction

Post-training Model Optimization

Post-training model optimization is the process of improving the performance of a deep learning model by applying techniques at various levels, such as hardware, software, and network architecture. At the hardware level, models are optimized to run on specific hardware accelerators (e.g., TPUs), while at the software level they are prepared to execute on inference engines with accelerated performance (e.g., OpenVINO Runtime). Furthermore, techniques such as network pruning reduce the size of a network, thus increasing inference speed. Another currently popular technique is model quantization.

Model Quantization

This technique refers to the process of converting the floating-point weights of a network into lower-precision, compact representations. The reduced precision shrinks the model size and thus increases inference speed, with little degradation in accuracy. Along with the model’s weights, the operations can also be converted to match the corresponding precision. Using this technique, deep learning models can be converted from 16-bit or 32-bit floating-point precision to 8-bit integer precision, potentially increasing inference speed by up to a factor of 4.
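To make the idea concrete, here is a small, self-contained sketch of asymmetric 8-bit quantization of a weight tensor with NumPy. The scheme and values are purely illustrative and are not the exact procedure OpenVINO applies internally.

```python
import numpy as np

# Toy example of asymmetric (affine) 8-bit quantization of a weight tensor.
# This is only a conceptual sketch, not OpenVINO's internal scheme.
weights = np.random.randn(4, 4).astype(np.float32)

qmin, qmax = 0, 255  # uint8 range
scale = (weights.max() - weights.min()) / (qmax - qmin)
zero_point = int(round(qmin - weights.min() / scale))

# Quantize: float32 -> uint8
q_weights = np.clip(np.round(weights / scale) + zero_point, qmin, qmax).astype(np.uint8)

# Dequantize: uint8 -> float32 (approximation of the original values)
deq_weights = (q_weights.astype(np.float32) - zero_point) * scale

print("max abs error:", np.abs(weights - deq_weights).max())
```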

OpenVINO Toolkit

The OpenVINO (Open Visual Inference and Neural network Optimization) Toolkit is an open-source toolkit, originally developed by Intel, that enables the optimization of deep learning models and their deployment on Intel hardware through its inference engine, OpenVINO Runtime.

Model Optimizer

The purpose of OpenVINO’s Model Optimizer is to convert deep learning models developed in various frameworks (e.g., TensorFlow, PyTorch, Caffe) into an Intermediate Representation (IR), which enables inference with OpenVINO Runtime. The produced IR model is optimized for the selected target device, improving inference speed while preserving the model’s accuracy.

Figure 1: Model optimizer workflow (source)

The IR model can be further optimized using the OpenVINO’s Post-training Optimization Tool. For more information about the model optimizer, refer to the official documentation.
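As an example of the usual PyTorch route, the snippet below exports a classifier to ONNX and then calls the Model Optimizer CLI on the result. The resnet18 stand-in, file names, and paths are illustrative assumptions rather than the exact commands used in this project.

```python
import torch
from torchvision.models import resnet18

# Placeholder model: any CIFAR-10 classifier could be used here instead of
# the pre-trained models quantized later in this post.
model = resnet18(num_classes=10)
model.eval()

# 1) Export the PyTorch model to ONNX.
dummy_input = torch.randn(1, 3, 32, 32)  # CIFAR-10 input shape (NCHW)
torch.onnx.export(model, dummy_input, "classifier.onnx",
                  input_names=["input"], output_names=["logits"])

# 2) Convert the ONNX model to an FP32 OpenVINO IR with the Model Optimizer,
#    e.g. from a shell (example invocation, adapt paths as needed):
#    mo --input_model classifier.onnx --output_dir ir_model
```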

Post-training Optimization Tool

OpenVINO’s Post-training Optimization Tool (POT) provides two quantization methods to optimize a model’s performance. Since quantization is performed post-training, it does not require retraining or the full training dataset, but only a representative calibration set (e.g., 300 samples). Furthermore, the model first has to be converted to OpenVINO’s IR format. Given these requirements, a floating-point model (FP32 or FP16) can be quantized to 8-bit integer precision using either the Default Quantization algorithm or the Accuracy-aware Quantization algorithm. The former is recommended as a first step, since it yields satisfactory performance in the majority of cases and only requires an unannotated calibration dataset, while the latter focuses on keeping the accuracy drop within a specified range and therefore requires an annotated calibration dataset. Figure 2 showcases the optimization workflow.

Figure 2: Post-training Optimization Tool workflow (source)
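For reference, the sketch below shows roughly how the Default Quantization flow from Figure 2 can be driven with the POT Python API (OpenVINO 2022.x). The CalibrationImages loader, file paths, and folder layout are assumptions for illustration, the preprocessing is simplified, and the exact DataLoader return format may vary between OpenVINO releases.

```python
import os
import numpy as np
from PIL import Image
from openvino.tools.pot import (DataLoader, IEEngine, load_model, save_model,
                                compress_model_weights, create_pipeline)


class CalibrationImages(DataLoader):
    """Feeds unannotated calibration images (hypothetical folder of PNGs) to POT."""

    def __init__(self, folder):
        self.files = sorted(os.path.join(folder, f) for f in os.listdir(folder))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        # Default Quantization needs no labels, so only the image is returned.
        # NOTE: normalization matching the model's training preprocessing is
        # omitted here for brevity.
        img = Image.open(self.files[index]).convert("RGB").resize((32, 32))
        return np.asarray(img, dtype=np.float32).transpose(2, 0, 1)  # HWC -> CHW


model_config = {"model_name": "classifier",
                "model": "ir_model/classifier.xml",
                "weights": "ir_model/classifier.bin"}
engine_config = {"device": "CPU"}
algorithms = [{"name": "DefaultQuantization",
               "params": {"target_device": "ANY",
                          "preset": "performance",
                          "stat_subset_size": 300}}]

model = load_model(model_config)
engine = IEEngine(config=engine_config,
                  data_loader=CalibrationImages("calibration/distylegan"))
pipeline = create_pipeline(algorithms, engine)

quantized = pipeline.run(model)    # run 8-bit Default Quantization
compress_model_weights(quantized)  # store weights in INT8 to shrink the .bin file
save_model(quantized, save_path="quantized_model", model_name="classifier_int8")
```

Here stat_subset_size controls how many calibration samples are used to collect activation statistics, and the "performance" preset favors symmetric quantization for faster inference.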

Experiments

The DiStyleGAN model, developed in the first part of this project, generates images from the distribution of the CIFAR-10 dataset. Therefore, we experimented with PyTorch models pre-trained on CIFAR-10 for classification. In our experiments, we used both the Default Quantization and the Accuracy-aware Quantization methods provided by the OpenVINO Post-training Optimization Tool. The same experiments were conducted with multiple calibration datasets, and we compared the results of the quantized models on the classification task using the official CIFAR-10 test set.

PyTorch Models

We opted to quantize PyTorch models, pre-trained on CIFAR-10 for classification. These models were obtained from the chenyaofo/pytorch-cifar-models repository on GitHub. In particular, we used the following models:

  • ResNet20 (resnet20)
  • VGG16 (vgg16_bn)
  • MobileNetV2 (mobilenetv2_x1_4)
  • ShuffleNetV2 (shufflenetv2_x2_0)
  • RepVGG (repvgg_a2)

Calibration Datasets

For the quantization process, the OpenVINO POT requires a calibration dataset. The goal of the project is to use synthetic images generated by DiStyleGAN as the calibration dataset. However, we also conducted the quantization using three other datasets. In particular, we used the following calibration datasets in our experiments:

  • Official CIFAR-10 training set
  • Synthetic images generated by StyleGAN2-ADA
  • Synthetic images generated by DiStyleGAN
  • Fractal images generated using the Datumaro repository on GitHub
    (Note that while the synthetic datasets above approximate the CIFAR-10 distribution, and can thus be considered representative, the fractal images do not constitute a representative dataset for deep learning models pre-trained on CIFAR-10.)

We used 5,000 images from each of the aforementioned datasets (500 images per CIFAR-10 class). These subsets can be downloaded from here, or you can generate them by following the instructions in the links. We then evaluated the results of the quantization methods on the classification task, for the selected PyTorch models, using the CIFAR-10 test set.
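For illustration, a class-balanced 5,000-image calibration subset could be assembled along these lines. The per-class folder layout and paths below are assumptions, not necessarily how the generators store their output.

```python
import os
import random
import shutil

# Hypothetical layout: one sub-folder per CIFAR-10 class, each holding at
# least 500 generated images (e.g. distylegan_images/airplane/*.png).
SOURCE = "distylegan_images"
TARGET = "calibration/distylegan"
PER_CLASS = 500

random.seed(0)
os.makedirs(TARGET, exist_ok=True)
for cls in sorted(os.listdir(SOURCE)):
    files = sorted(os.listdir(os.path.join(SOURCE, cls)))
    for name in random.sample(files, PER_CLASS):
        shutil.copy(os.path.join(SOURCE, cls, name),
                    os.path.join(TARGET, f"{cls}_{name}"))
```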

Results

The results for the Default and Accuracy-aware quantization algorithms presented below were obtained using the defaultQuantization.ipynb and accuracyQuantization.ipynb notebooks, respectively. The experiments were conducted on an Intel® Core™ i7-1165G7 CPU. The quantized models can be found here.
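As a rough sketch of the evaluation step (not the exact code from the notebooks), a quantized IR can be scored on the CIFAR-10 test set with OpenVINO Runtime as follows. The model path is hypothetical, and the training-time normalization is omitted for brevity.

```python
import numpy as np
import torchvision
from openvino.runtime import Core

# Load the quantized IR and run it on the CIFAR-10 test set (10,000 images).
core = Core()
compiled = core.compile_model("quantized_model/classifier_int8.xml", "CPU")
output_node = compiled.outputs[0]

testset = torchvision.datasets.CIFAR10(root="data", train=False, download=True)

correct = 0
for image, label in testset:
    x = np.asarray(image, dtype=np.float32).transpose(2, 0, 1)[None]  # NCHW
    # NOTE: the normalization used during training should also be applied here.
    logits = compiled([x])[output_node]
    correct += int(np.argmax(logits) == label)

print("Top-1 accuracy:", correct / len(testset))
```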

Default Quantization
The results of the Default Quantization method are presented in Table 1. The accuracy is calculated on the official CIFAR-10 test set.

Table 1: Default Quantization results

Accuracy-aware Quantization
The results of the Accuracy-aware Quantization method are presented in Table 2. We experimented with the two models, ShuffleNetV2 and RepVGG, that showed accuracy degradation with the Default Quantization method. The accuracy is again calculated on the official CIFAR-10 test set. The inference speeds (FPS) are not reported, since they are similar to those presented in Table 1 for the corresponding models.

Table 2: Accuracy-aware Quantization results (metric: accuracy)
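For completeness, a sketch of how the Accuracy-aware algorithm could be configured with the POT API is shown below. Unlike Default Quantization, it needs an annotated data loader and a metric. The Accuracy metric, the CIFAR-10-based loader, the paths, and the 1% maximal_drop value are illustrative assumptions rather than the exact settings used in these experiments.

```python
import numpy as np
import torchvision
from openvino.tools.pot import (DataLoader, IEEngine, Metric, load_model,
                                save_model, create_pipeline)


class LabeledCifarImages(DataLoader):
    """Annotated calibration set: a slice of the CIFAR-10 training set as a stand-in."""

    def __init__(self, count=5000):
        self.ds = torchvision.datasets.CIFAR10(root="data", train=True, download=True)
        self.count = count

    def __len__(self):
        return self.count

    def __getitem__(self, index):
        image, label = self.ds[index]
        # Accuracy-aware quantization needs (data, annotation) pairs.
        # NOTE: training-time normalization is omitted for brevity.
        return np.asarray(image, dtype=np.float32).transpose(2, 0, 1), label


class Accuracy(Metric):
    """Top-1 accuracy metric in the form the POT engine expects."""

    def __init__(self):
        super().__init__()
        self._matches = []

    @property
    def value(self):
        return {"accuracy": [self._matches[-1] if self._matches else 0.0]}

    @property
    def avg_value(self):
        return {"accuracy": [float(np.mean(self._matches))]}

    def update(self, output, target):
        predictions = np.argmax(output[0], axis=-1)
        self._matches.extend((predictions == np.asarray(target).reshape(-1)).tolist())

    def reset(self):
        self._matches = []

    def get_attributes(self):
        return {"accuracy": {"direction": "higher-better", "type": "accuracy"}}


algorithms = [{"name": "AccuracyAwareQuantization",
               "params": {"target_device": "ANY",
                          "preset": "performance",
                          "stat_subset_size": 300,
                          "maximal_drop": 0.01}}]  # max allowed accuracy drop

model = load_model({"model_name": "classifier",
                    "model": "ir_model/classifier.xml",
                    "weights": "ir_model/classifier.bin"})
engine = IEEngine(config={"device": "CPU"},
                  data_loader=LabeledCifarImages(),
                  metric=Accuracy())
pipeline = create_pipeline(algorithms, engine)
quantized = pipeline.run(model)
save_model(quantized, save_path="quantized_model", model_name="classifier_int8_aa")
```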

Discussion

Based on the results of the experiments, it is evident that in most cases the Default Quantization algorithm is able to boost a deep learning model's inference performance while causing only a negligible degradation in accuracy. This holds especially for the ResNet20, VGG, and MobileNetV2 models: even with a non-representative calibration dataset (i.e., Fractal), the accuracy decrease is limited to 1.1%, while the inference speed increases by up to a factor of 4 with respect to the IR model, and by up to a factor of 14 with respect to the original PyTorch models.

Figure 3: Boost of inference speed on IR and Quantized models

In the case of the RepVGG model, the Default Quantization algorithm leads to severe accuracy degradation: even when the official CIFAR-10 dataset is used for calibration, the accuracy drops by approximately 42%. As for ShuffleNetV2, although it does not suffer as much when the official CIFAR-10 or DiStyleGAN calibration sets are used, there is a 7.48% accuracy degradation with the StyleGAN2-ADA dataset. However, when the Accuracy-aware Quantization algorithm is used, the performance of both models improves substantially for the three representative datasets (CIFAR-10, StyleGAN2-ADA, and DiStyleGAN).

Finally, it is also worth noting that the two synthetic datasets, StyleGAN2-ADA and DiStyleGAN, lead to quantization results comparable to those obtained with the official CIFAR-10 dataset. Surprisingly, although DiStyleGAN was trained through knowledge distillation with StyleGAN2-ADA as the teacher network, in some cases quantization with the DiStyleGAN-generated dataset leads to better results than with the StyleGAN2-ADA one.

Conclusion

It is clear that the performance boost, in terms of inference speed, obtained by the quantization process is substantial enough to outweigh a possible small accuracy degradation. Furthermore, the results obtained with the two synthetic datasets, or even the Fractal dataset for some models, are comparable to those of the official CIFAR-10 dataset. This means that data generated by a deep learning model can be used to optimize models with the OpenVINO Toolkit, thus removing the restriction of needing an official dataset for calibration.
