đź’ˇDynamic Quantization

Quantizing a network means converting it to use a reduced precision integer representation for the weights and activations (usually int8 compared to floating point implementations).

Advantages of Quantization:

  • Reduction in model size (int8 weights take 4x less space than float32).
  • Faster inference, since int8 arithmetic is cheaper and less memory bandwidth is needed to move the weights.

When converting from floating point to integer values, you are essentially multiplying the floating point value by some scale factor and rounding the result to a whole number (often also shifting by a zero-point so the full integer range is used).

The various quantization approaches differ mainly in how they determine that scale factor.
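To make the mapping concrete, here is a minimal, framework-free sketch of the common affine scheme: pick a scale and zero-point from the observed range, then round and clamp into int8. All names and formulas below are illustrative, not taken from any particular library:

```python
def choose_qparams(values, qmin=-128, qmax=127):
    """Pick a scale and zero-point mapping [min, max] onto the int8 range."""
    lo, hi = min(values), max(values)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must include zero
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """float -> int8: divide by the scale, shift, round, clamp."""
    return [max(qmin, min(qmax, round(v / scale + zero_point))) for v in values]

def dequantize(q, scale, zero_point):
    """int8 -> float approximation of the original values."""
    return [(x - zero_point) * scale for x in q]

weights = [-0.5, 0.0, 0.25, 1.0]
scale, zp = choose_qparams(weights)
q = quantize(weights, scale, zp)
approx = dequantize(q, scale, zp)
```

Round-tripping through `dequantize` recovers each value only up to the scale, which is exactly the precision lost by quantization.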

What makes it dynamic?

Static quantization quantizes the weights and activations of the model. It fuses activations into preceding layers where possible. It requires calibration with a representative dataset to determine optimal quantization parameters for activations.

Static Quantization (Post Training Quantization) is typically used when both memory bandwidth and compute savings are important. CNNs are a typical use case.
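The calibration step mentioned above can be sketched as a simple min/max observer: run representative inputs through the model, record the activation range seen, and fix the activation scale before inference. This is an illustrative sketch, assuming the same affine int8 scheme as before; the class and method names are made up:

```python
class MinMaxObserver:
    """Tracks the running min/max of activations seen during calibration."""
    def __init__(self):
        self.lo = float("inf")
        self.hi = float("-inf")

    def observe(self, values):
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))

    def qparams(self, qmin=-128, qmax=127):
        # Fix the activation scale/zero-point once, ahead of inference.
        lo, hi = min(self.lo, 0.0), max(self.hi, 0.0)
        scale = (hi - lo) / (qmax - qmin)
        zero_point = round(qmin - lo / scale)
        return scale, zero_point

obs = MinMaxObserver()
for batch in [[0.1, 0.9], [0.4, 2.0], [-0.3, 1.5]]:  # representative data
    obs.observe(batch)
scale, zp = obs.qparams()
```

The quality of static quantization depends directly on how representative this calibration data is: ranges never seen during calibration will be clipped at inference time.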

In dynamic quantization the weights are quantized ahead of time but the activations are dynamically quantized during inference (on the fly). Hence, dynamic.

As mentioned above, dynamic quantization has the run-time overhead of quantizing activations on the fly. It is therefore most beneficial when model execution time is dominated by loading weights from memory rather than by compute, so that the added overhead is small relative to the bandwidth savings. This is true for LSTM and Transformer type models run with small batch sizes.
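The split described above can be made explicit in a tiny framework-free sketch of a dynamically quantized linear layer: the weights are quantized once, up front, while the activation scale is computed from each input at call time. All names here are illustrative, and a symmetric int8 scheme is assumed for brevity:

```python
def symmetric_scale(values, qmax=127):
    """Symmetric scale mapping [-max|v|, max|v|] onto the int8 range."""
    m = max(abs(v) for v in values) or 1.0
    return m / qmax

def quantize(values, scale, qmax=127):
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

class DynamicQuantLinear:
    def __init__(self, weights):
        # Weights are quantized ahead of time, once, for the life of the model.
        self.w_scale = symmetric_scale(weights)
        self.qw = quantize(weights, self.w_scale)

    def __call__(self, x):
        # Activations are quantized on the fly, per input: this is the
        # run-time overhead (and the "dynamic" part).
        x_scale = symmetric_scale(x)
        qx = quantize(x, x_scale)
        # Integer dot product, then rescale the result back to float.
        acc = sum(a * b for a, b in zip(qx, self.qw))
        return acc * x_scale * self.w_scale

layer = DynamicQuantLinear([0.5, -0.25, 1.0])
y = layer([1.0, 2.0, 3.0])  # approximates 0.5*1 - 0.25*2 + 1.0*3 = 3.0
```

Frameworks package this recipe up for you; PyTorch, for instance, exposes it as `torch.quantization.quantize_dynamic`.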


