Sparsify Hugging Face BERT for Better CPU Performance & Smaller File Size

Neural Magic · Published in Deep Sparse · Oct 8, 2021 · 2 min read

Get Started: Sparsify Hugging Face BERT Using Your Data

12-layer BERT performance comparisons
12-layer BERT size comparisons

You can replicate the performance and compression results mentioned in the video with your own data using Neural Magic’s open source and freely available tools.

Visit our BERT Getting Started page in the SparseZoo and:

  1. Ensure that you have a correct setup and that the performance results are compelling by running a quick benchmarking exercise. You can find the code here; a minimal benchmarking sketch also follows this list.
  2. To run with your own data, follow a recipe that encodes the transferable hyperparameters needed to create sparse models. You will create a “teacher” model pre-trained on your dataset and then distill its knowledge down to the pruned “student” BERT model on the same dataset (see the recipe sketch after this list).
  3. Export the “student” model to deploy on your CPU hardware using ONNX-compatible inference engines such as DeepSparse. To achieve the performance mentioned in the video, we encourage you to use the freely available DeepSparse Engine, which is explicitly engineered to accelerate sparsified models (an export sketch follows this list).
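
For step 1, here is a minimal benchmarking sketch using the DeepSparse Python API (deepsparse.compile_model). The SparseZoo stub, batch size, sequence length, and input ordering are illustrative assumptions; swap in the sparsified BERT variant and settings you actually want to measure.

```python
import time
import numpy as np
from deepsparse import compile_model

batch_size, seq_len = 1, 128

# Illustrative SparseZoo stub -- substitute the stub (or a local ONNX path) for the
# sparsified BERT variant you want to test from the BERT Getting Started page.
model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98"

engine = compile_model(model_path, batch_size=batch_size)

# Typical BERT ONNX inputs: token ids, attention mask, segment ids.
# The expected input order can vary by export; adjust to match your model.
inputs = [
    np.random.randint(0, 30000, size=(batch_size, seq_len), dtype=np.int64),  # input_ids
    np.ones((batch_size, seq_len), dtype=np.int64),   # attention_mask
    np.zeros((batch_size, seq_len), dtype=np.int64),  # token_type_ids
]

# Warm up, then time repeated forward passes.
for _ in range(10):
    engine.run(inputs)

iterations = 100
start = time.perf_counter()
for _ in range(iterations):
    engine.run(inputs)
elapsed = time.perf_counter() - start
print(f"~{iterations / elapsed:.1f} items/sec at batch size {batch_size}")
```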
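
For step 2, the sketch below illustrates one way a SparseML recipe can be attached to a standard PyTorch fine-tuning setup via ScheduledModifierManager. The model name, recipe path, learning rate, and steps_per_epoch are placeholders, and the distillation loss against your dense “teacher” is omitted; the recipes and example scripts linked from the Getting Started page cover the full flow.

```python
import torch
from transformers import AutoModelForSequenceClassification
from sparseml.pytorch.optim import ScheduledModifierManager

# Placeholder model/optimizer -- in practice, start from your dense "teacher" checkpoint
# and fine-tune the pruned "student" on the same dataset with a distillation loss.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

steps_per_epoch = 1000  # placeholder: typically len(train_dataloader)

# Load the pruning/transfer recipe downloaded from the SparseZoo and wrap the optimizer
# so the sparsity schedule is applied automatically during training.
manager = ScheduledModifierManager.from_yaml("recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=steps_per_epoch)

# ... run your usual Hugging Face training loop here ...

manager.finalize(model)  # clean up the manager's hooks once training completes
```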
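
For step 3, here is a sketch of the ONNX export, assuming SparseML's ModuleExporter accepts a dictionary sample batch for the Hugging Face model; the checkpoint path, output directory, and input shapes are placeholders.

```python
import torch
from sparseml.pytorch.utils import ModuleExporter
from transformers import AutoModelForSequenceClassification

# Placeholder: load the fine-tuned "student" checkpoint produced in the previous step.
model = AutoModelForSequenceClassification.from_pretrained("path/to/your-finetuned-student")

# Placeholder sample batch matching the sequence length you trained with.
sample_batch = {
    "input_ids": torch.zeros(1, 128, dtype=torch.long),
    "attention_mask": torch.ones(1, 128, dtype=torch.long),
    "token_type_ids": torch.zeros(1, 128, dtype=torch.long),
}

exporter = ModuleExporter(model, output_dir="onnx_export")
exporter.export_onnx(sample_batch=sample_batch)

# The exported onnx_export/model.onnx can then be compiled with DeepSparse,
# as in the benchmarking sketch above.
```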

No matter where you are in the process, if you run into any issues, we are here to help. Join our Slack or Discourse forums to get direct access to our engineering and support teams.

Originally published at https://neuralmagic.com on October 8, 2021.
