One-Click Quantization of Deep Learning Models with the Neural Coder Extension
Easy Model Quantization in Visual Studio Code
Wenjiao Yue, Kai Yao, Suyue Chen, Haihao Shen, and Huma Abidi, Intel Corporation
Visual Studio Code (VS Code) is a source code editor built by Microsoft on the Electron framework for Windows, Linux, and macOS. Its features include debugging support, syntax highlighting, intelligent code completion, snippets, code refactoring, and embedded Git. VS Code is built with extensibility in mind: extensions can customize almost every part of the editor. Neural Coder is one such extension. It is a novel component of Intel Neural Compressor that simplifies deployment of deep learning (DL) models via one-click automated code changes for device compatibility and optimization.
Intel Neural Compressor is an open-source Python library for model compression that reduces model size and improves DL inference performance on CPUs or GPUs. It supports post-training static and dynamic quantization of PyTorch models, along with automatic accuracy-driven tuning strategies that let users easily generate quantized models. Users can apply post-training static quantization, post-training dynamic quantization, or quantization-aware training while specifying an expected accuracy criterion.
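Conceptually, all of the INT8 approaches rest on the same arithmetic: mapping floating-point tensor values to 8-bit integers through a scale and a zero point. The following is a minimal stdlib sketch of that idea, not Intel Neural Compressor's implementation:

```python
# Illustrative INT8 affine quantization (just the arithmetic the INT8
# approaches rely on; not Intel Neural Compressor's implementation).

def quantize_int8(values):
    """Map floats to int8 [-128, 127] using a scale and zero point."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0          # guard against constant input
    zero_point = round(-128 - lo / scale)   # so that `lo` maps to -128
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats from the int8 representation."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-1.5, -0.2, 0.0, 0.7, 1.5]
q, scale, zp = quantize_int8(weights)
restored = dequantize_int8(q, scale, zp)
# `restored` is close to `weights`: the error per value is bounded by the
# quantization step size, which is what makes INT8 inference accurate enough.
```

Static quantization computes scales ahead of time from calibration data, while dynamic quantization computes them on the fly at inference time; BF16 instead keeps a floating-point format with reduced precision.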
Neural Coder can perform automated benchmarking of optimization approaches to determine the best out-of-box performance. It uses static program analysis and heuristics to help users take advantage of Intel DL Boost and hardware features to improve performance. This one-click enabling boosts developer productivity while making it easier to take advantage of acceleration. We provide here a detailed step-by-step guide to using the Neural Coder extension in VS Code.
First, if you’re not already running VS Code on a Linux system, use the Remote - SSH extension to connect VS Code to a remote Linux server.
Second, search for the Neural Coder extension in the VS Code extensions marketplace and click “Install” when you see the icon below. (Note: if you’re using VS Code on a Windows system, the installation location should be the SSH remote server to which you are connected.)
Third, click the settings icon and select “Extension Settings” to enter the path to your preferred Python version.
Fourth, open the deep learning code that you want to quantize and evaluate. You should see a new icon in the upper right pane and a sidebar panel on the left showing your operation history.
Fifth, click the Neural Coder button at the top right and select the optimization (quantization) that you want to run on your code. Select “INC Enable INT8 (Static),” “INC Enable INT8 (Dynamic),” or “INC Enable BF16,” then wait for loading to complete.
You will see that quantization has been enabled in your code:
Your Neural Coder history is updated and displayed as patch files, so you can easily backtrack to see exactly how Neural Coder enabled the quantization.
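As an illustration, a patch for dynamic INT8 quantization might look something like the following. The file name, surrounding code, and exact inserted lines are hypothetical; the actual diff depends on your script and on the Intel Neural Compressor version in use:

```diff
--- a/inference.py
+++ b/inference.py
@@ -10,4 +10,7 @@
 model = MyModel()
 model.load_state_dict(torch.load("model.pt"))
 model.eval()
+from neural_compressor.quantization import fit
+from neural_compressor.config import PostTrainingQuantConfig
+model = fit(model, conf=PostTrainingQuantConfig(approach="dynamic"))
 outputs = model(inputs)
```

Because each change is kept as a plain diff, you can review, reapply, or revert it with standard tools.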
Finally, select “INC Auto Enable Benchmark” and enter the Python code execution parameters (argparse) for the current code.
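The parameters requested here are simply the command-line arguments your script would normally receive. For example, if your code parsed arguments like the hypothetical script below, you would enter something like `--batch_size 32 --model_path ./model.pt` at the prompt:

```python
# Hypothetical argument parsing in the deep learning script being benchmarked.
# The string typed into the Neural Coder prompt is exactly what you would pass
# on the command line after "python your_script.py".
import argparse

parser = argparse.ArgumentParser(description="Inference benchmark")
parser.add_argument("--batch_size", type=int, default=1)
parser.add_argument("--model_path", type=str, default="./model.pt")

# Simulate entering "--batch_size 32" in the prompt:
args = parser.parse_args(["--batch_size", "32"])
print(args.batch_size)  # 32
```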
An “Output” panel will appear, displaying the enabling and benchmark results for the current deep learning code:
The “Auto” section in the history panel keeps the patch file (.diff) for each optimization within this benchmark execution.
Click a patch file to see the result.
We encourage you to give the Neural Coder extension a try in VS Code.