Yufeng Li in Microsoft Azure
Faster and smaller quantized NLP with Hugging Face and ONNX Runtime
Popular Hugging Face Transformer models (BERT, GPT-2, etc.) can be shrunk and accelerated with ONNX Runtime quantization without retraining.
Aug 31, 2020