Compressing BERT Sentence Embeddings

nutanc
Published in Ozonetel AI · 2 min read · Nov 24, 2022

After our earlier experiments in compressing sentence embeddings, we continued researching how we could do a better job.

Thanks to the KE Sieve algorithm by Dr. Kumar, we have made further improvements, and we are glad to report significantly better results.

Not only that, we now have a common model that works for sentences from any domain. Our earlier experiment was fine-tuned for a particular dataset (IMDB), but our new approach builds a model for any text.

Following are our results on the IMDB dataset:

  - Hugging Face sentence embedding size: 146 MB
  - Hugging Face sentence embedding accuracy (KNN): 78.84%
  - KE Sieve sentence embedding size: 3 MB
  - KE Sieve sentence embedding accuracy (KNN): 77.79%

So our approach achieves about 45 times compression while maintaining similar accuracy.

Reduce your neural search embedding database size by 45 times.

Currently our approach still depends on the 768-dimensional sentence embedding. That is, even for inference we first need to get the 768-dimensional sentence embedding from Hugging Face. When we pass this 768-dimensional embedding to our system, it converts it to a 548-bit vector embedding.
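
To make the two-step pipeline concrete, here is a minimal sketch. KE Sieve itself is not public, so the `binarize` step below is a hypothetical stand-in (random-hyperplane hashing) that only illustrates the shapes and the bit packing, not the actual algorithm; the model name is likewise just an example of a 768-dimensional sentence encoder.

```python
# Minimal sketch of the two-step pipeline: 768-dim float embedding in,
# 548-bit packed vector out. binarize() is a hypothetical stand-in
# (random-hyperplane hashing), NOT the actual KE Sieve algorithm.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")  # example 768-dim encoder

rng = np.random.default_rng(0)
planes = rng.standard_normal((548, 768))  # one hyperplane per output bit

def binarize(embedding: np.ndarray) -> np.ndarray:
    """Stand-in for the KE Sieve step: 768 floats -> 548 bits (69 bytes)."""
    bits = (planes @ embedding > 0).astype(np.uint8)  # shape (548,)
    return np.packbits(bits)                          # packed into 69 bytes

emb = model.encode("This movie was surprisingly good.")  # (768,) float32
code = binarize(emb)
print(emb.nbytes, "->", code.nbytes)  # 3072 bytes -> 69 bytes
```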

Though there is one extra step in this pipeline, it is worth it because your neural search database size is reduced by about 45 times.
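
The 45x figure follows directly from the bit counts, assuming the original embeddings are stored as float32 (which matches the reported sizes; the standard IMDB dataset has 50,000 reviews):

```python
# Back-of-the-envelope check of the ~45x compression claim.
orig_bits = 768 * 32             # one float32 embedding: 24,576 bits
sieve_bits = 548                 # one KE Sieve code: 548 bits
print(orig_bits / sieve_bits)    # ~44.8, i.e. roughly 45x

# IMDB (50,000 reviews), assuming float32 storage:
print(50_000 * 768 * 4 / 2**20)  # ~146 MB of float embeddings
print(50_000 * 548 / 8 / 2**20)  # ~3.3 MB of bit vectors
```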

And because this new embedding is a bit vector, the size of your sentence vectors becomes really small. For example, the IMDB dataset is now stored in just 3 MB. Because of this small size, you can even ship the data to the browser and do inference on the client side for some use cases.
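
For completeness, here is a sketch of how KNN over the packed bit vectors could work. Hamming distance is the natural metric for binary codes, though the post does not specify which distance KE Sieve uses, so treat this as an assumption:

```python
# KNN over packed 548-bit codes via Hamming distance (assumed metric).
import numpy as np

def hamming_knn(query: np.ndarray, db: np.ndarray, k: int = 5) -> np.ndarray:
    """query: (69,) uint8 packed bits; db: (N, 69) uint8. Top-k indices."""
    diff = np.bitwise_xor(db, query)                 # differing bits
    dist = np.unpackbits(diff, axis=1).sum(axis=1)   # popcount per row
    return np.argsort(dist)[:k]

# 50,000 codes at 69 bytes each is ~3.3 MB: small enough to hold in
# memory or even ship to a browser.
db = np.random.randint(0, 256, size=(50_000, 69), dtype=np.uint8)
print(hamming_knn(db[0], db, k=5))
```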

Next Steps:

  1. Coming up with our own embedding and removing the dependence on Hugging Face sentence vectors.
  2. Exposing this as an API service so you can also compress your sentence embeddings and take advantage of the space savings.
