One-Click Quantization of Deep Learning Models with the Neural Coder Extension
Easy Model Quantization in Visual Studio Code
Wenjiao Yue, Kai Yao, Suyue Chen, Haihao Shen, and Huma Abidi, Intel Corporation
Visual Studio Code (VS Code) is a source code editor built by Microsoft on the Electron framework for Windows, Linux, and macOS. Its features include debugging support, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git. VS Code is built with extensibility in mind: extensions can customize almost every part of the editor. Neural Coder is one such extension. It is a novel component of Intel Neural Compressor that simplifies deployment of deep learning (DL) models via one-click automated code changes for device compatibility and optimization.
Intel Neural Compressor is an open-source Python library for model compression that reduces model size and improves DL inference performance on CPUs or GPUs. It supports post-training static and dynamic quantization of PyTorch models, along with automatic accuracy-driven tuning strategies that let users easily generate quantized models. Users can apply post-training static quantization, post-training dynamic quantization, or quantization-aware training while specifying an expected accuracy criterion.
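Conceptually, all of the INT8 approaches rest on the same arithmetic: mapping floating-point tensor values to 8-bit integers through a scale and a zero point. The following is a minimal stdlib sketch of that idea, not Intel Neural Compressor's implementation:

```python
# Illustrative INT8 affine quantization (just the arithmetic the INT8
# approaches rely on; not Intel Neural Compressor's implementation).

def quantize_int8(values):
    """Map floats to int8 [-128, 127] using a scale and zero point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0          # guard against constant input
    zero_point = round(-128 - lo / scale)   # so that `lo` maps to -128
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats from the int8 representation."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.5, -0.2, 0.0, 0.7, 1.5]
q, scale, zp = quantize_int8(weights)
restored = dequantize_int8(q, scale, zp)
# `restored` is close to `weights`: the error per value is bounded by the
# quantization step size, which is what makes INT8 inference accurate enough.
```

Static quantization computes scales ahead of time from calibration data, while dynamic quantization computes them on the fly at inference time; BF16 instead keeps a floating-point format with reduced precision.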
Neural Coder can perform automated benchmarking of optimization approaches to determine the best out-of-box performance. It uses static program analysis and heuristics to help users take advantage of Intel DL Boost and hardware features to improve performance. This one-click enabling boosts developer productivity while making it easier to take advantage of acceleration. We provide here a detailed step-by-step guide to using the Neural Coder extension in VS Code.
First, if you’re not already running VS Code on a Linux system, use the Remote - SSH extension to connect VS Code to a remote Linux server.
Second, search for the Neural Coder extension in the VS Code extensions marketplace and click “Install” when you see the icon below. (Note: if you’re using VS Code on a Windows system, the installation location should be the SSH remote server to which you are connected.)
Third, click the settings icon and select “Extension Settings” to enter the path to your preferred Python version.
Fourth, open the deep learning code that you want to quantize and evaluate. You should see a new icon in the upper right pane and a sidebar panel on the left showing your operation history.
Fifth, click the Neural Coder button at the top right and select the optimization (quantization) that you want to run on your code. Select “INC Enable INT8 (Static),” “INC Enable INT8 (Dynamic),” or “INC Enable BF16,” then wait for loading to complete.
You will see that quantization has been enabled in your code:
Your Neural Coder history is updated and displayed as patch files, so you can easily backtrack to see exactly how Neural Coder enabled the quantization.
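As an illustration, a patch for dynamic INT8 quantization might look something like the following. The file name, surrounding code, and exact inserted lines are hypothetical; the actual diff depends on your script and on the Intel Neural Compressor version in use:

```diff
--- a/inference.py
+++ b/inference.py
@@ -10,4 +10,7 @@
 model = MyModel()
 model.load_state_dict(torch.load("model.pt"))
 model.eval()
+from neural_compressor.quantization import fit
+from neural_compressor.config import PostTrainingQuantConfig
+model = fit(model, conf=PostTrainingQuantConfig(approach="dynamic"))
 outputs = model(inputs)
```

Because each change is kept as a plain diff, you can review, reapply, or revert it with standard tools.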
Finally, select “INC Auto Enable Benchmark” and enter the Python code execution parameters (argparse) for the current code.
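The parameters requested here are simply the command-line arguments your script would normally receive. For example, if your code parsed arguments like the hypothetical script below, you would enter something like `--batch_size 32 --model_path ./model.pt` at the prompt:

```python
# Hypothetical argument parsing in the deep learning script being benchmarked.
# The string typed into the Neural Coder prompt is exactly what you would pass
# on the command line after "python your_script.py".
import argparse

parser = argparse.ArgumentParser(description="Inference benchmark")
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--model_path", type=str, default="./model.pt")

# Simulate entering "--batch_size 32" in the prompt:
args = parser.parse_args(["--batch_size", "32"])
print(args.batch_size)  # 32
```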
An “Output” panel will appear, displaying the enabling and benchmark results for the current deep learning code:
The “Auto” section in the history panel keeps the patch file (.diff) for each optimization within this benchmark execution.
Click a patch file to see the result.
We encourage you to give the Neural Coder extension a try in VS Code.