
One-Click Quantization of Deep Learning Models with the Neural Coder Extension

Easy Model Quantization in Visual Studio Code

Intel(R) Neural Compressor
4 min read · Dec 6, 2022


Wenjiao Yue, Kai Yao, Suyue Chen, Haihao Shen, and Huma Abidi, Intel Corporation

Visual Studio Code (VS Code) is a source code editor made by Microsoft with the Electron Framework for Windows, Linux, and macOS. Features include support for debugging, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git. VS Code is built with extensibility in mind. It offers many extensions to customize almost every part of VS Code. Neural Coder is one such extension. It is a novel component of Intel Neural Compressor that simplifies deployment of deep learning (DL) models via one-click automated code changes for device compatibility and optimization.

Intel Neural Compressor is an open-source Python library for model compression that reduces model size and improves DL inference performance on CPUs or GPUs. It supports post-training static and dynamic quantization of PyTorch models, plus automatic accuracy-driven tuning strategies that make it easy to generate quantized models. Users can apply post-training static, post-training dynamic, or quantization-aware training approaches while specifying an expected accuracy criterion.
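To make the technique concrete, here is a minimal sketch of post-training dynamic quantization using plain PyTorch (shown here instead of the Intel Neural Compressor API for brevity; INC drives this kind of transformation automatically, with accuracy-aware tuning on top). The toy model is an assumption for illustration:

```python
import torch
import torch.nn as nn

# A toy float32 model standing in for a real DL model.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
model.eval()

# Post-training dynamic quantization: weights are converted to int8 ahead
# of time; activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model is a drop-in replacement for inference.
out = quantized(torch.randn(1, 64))
print(out.shape)  # torch.Size([1, 10])
```

Dynamic quantization needs no calibration data, which is why it is the simplest one-click option; static quantization additionally requires a calibration dataloader.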

Neural Coder can perform automated benchmarking of optimization approaches to determine the best out-of-box performance. It uses static program analysis and heuristics to help users take advantage of Intel DL Boost and hardware features to improve performance. This one-click enabling boosts developer productivity while making it easier to take advantage of acceleration. We provide here a detailed step-by-step guide to using the Neural Coder extension in VS Code.

First, if you’re not already using VS Code on a Linux system, use VS Code’s remote development support to connect to a remote Linux server via SSH.

If necessary, connect VS Code to a remote Linux server

Second, search for the Neural Coder extension in the VS Code extensions marketplace and click “Install” when you see the icon below. (Note: if you’re using VS Code on a Windows system, the installation location should be the remote SSH server to which you are connected.)

Search for “Neural Coder” in the VS Code extensions marketplace

Third, click the settings icon and select “Extension Settings” to enter the path to your preferred Python version.

Go to Extension Settings
Enter the Python version that Neural Coder should use

Fourth, open the deep learning code that you want to quantize and evaluate. You should see a new icon in the upper right of the editor pane and a sidebar panel on the left showing your operation history.

Neural Coder extension icon
Neural Coder history panel

Fifth, click the Neural Coder button at the top right and select the optimization (quantization) that you want to run on your code. Select “INC Enable INT8 (Static),” “INC Enable INT8 (Dynamic),” or “INC Enable BF16,” then wait for loading to complete.

Progress bar

You will see that quantization has been enabled in your code:

Auto-enabled quantization via the VS Code Neural Coder extension (e.g., Hugging Face model)
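The exact patch depends on your code and the INC version, but conceptually an INT8 enabling inserts an import and a quantization call around model creation, along these lines (a hand-written sketch, not the literal lines Neural Coder emits; the `eval_dataloader` name is assumed from the surrounding script):

```diff
 from transformers import AutoModelForSequenceClassification
+from neural_compressor import PostTrainingQuantConfig
+from neural_compressor.quantization import fit

 model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
+# Inserted by the enabling: quantize the model before it is used for inference.
+model = fit(model=model, conf=PostTrainingQuantConfig(),
+            calib_dataloader=eval_dataloader)
```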

Your Neural Coder history is updated and displayed as patch files, so you can easily backtrack to see how Neural Coder enabled the quantization.

History of Neural Coder enabling that you have done
Patch file showing a specific Neural Coder enabling run

Finally, select “INC Auto Enable Benchmark” and enter the Python execution parameters (argparse arguments) for the current code.

Enter your Python code execution parameters
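The dialog expects the same command-line string you would pass after `python your_script.py`. For example, if your script parses arguments like the following (hypothetical flag names for illustration), you would enter something like `--model_name_or_path bert-base-uncased --batch_size 8` in the box:

```python
import argparse

# Hypothetical flags for illustration -- use whatever your own script defines.
parser = argparse.ArgumentParser()
parser.add_argument("--model_name_or_path", default="bert-base-uncased")
parser.add_argument("--batch_size", type=int, default=8)

# Neural Coder runs your script with the parameters you enter, equivalent to:
args = parser.parse_args(
    ["--model_name_or_path", "distilbert-base-uncased", "--batch_size", "16"]
)
print(args.model_name_or_path, args.batch_size)  # distilbert-base-uncased 16
```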

An “Output” panel will appear, displaying the enabling and benchmark results for the current deep learning code:

Result output

The “Auto” section in the history panel keeps the patch file (.diff) for each optimization within this benchmark execution.

Patch files for all optimizations in this benchmark

Click a patch file to see the result.

Contents of the patch file

We encourage you to give the Neural Coder extension a try in VS Code.
