Ozonetel AI
Compressing BERT Sentence Embeddings

After our earlier experiments in compressing sentence embeddings, we continued researching how to do a better job.

Thanks to Dr. Kumar's KE Sieve algorithm, we made further improvements, and we are glad to report significantly better results.

Not only that, we now have a common model that works for sentences from any domain. Our earlier experiment was fine-tuned for a particular dataset (IMDB), but our new approach builds a model for any text.

Following are our results on the IMDB dataset:

  - Hugging Face sentence embedding size: 146MB
  - Hugging Face sentence embedding accuracy (KNN): 78.84
  - KE Sieve sentence embedding size: 3MB
  - KE Sieve sentence embedding accuracy (KNN): 77.79

So our approach achieves about 45x compression while retaining similar accuracy.

Currently our approach still depends on the 768-dimensional sentence embedding. That is, even for inference we first need to get the 768-dimensional sentence embedding from Hugging Face. When we pass this 768-dimensional embedding to our system, it converts it to a 548-bit vector embedding.
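To illustrate the shape of this two-step pipeline, here is a minimal NumPy sketch. The binarization step below uses random hyperplane hashing purely as a stand-in for the KE Sieve model (which is not shown here); `to_bit_vector` and the hyperplane matrix are illustrative assumptions, and only the 768/548 dimensions come from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the learned KE Sieve step: 548 hyperplanes in 768-dim space.
# Each output bit records which side of one hyperplane the embedding falls on.
hyperplanes = rng.standard_normal((548, 768))

def to_bit_vector(embedding_768: np.ndarray) -> np.ndarray:
    """Map a 768-dim float embedding to a 548-bit binary vector."""
    return (hyperplanes @ embedding_768 > 0).astype(np.uint8)

# In the real pipeline this dense vector would come from a Hugging Face
# sentence-embedding model; here it is random for demonstration.
dense = rng.standard_normal(768)
bits = to_bit_vector(dense)
print(bits.shape)  # (548,)
```

The key point is that the second step is cheap: one matrix-vector product and a threshold, so the extra inference step adds very little latency compared to the transformer forward pass.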

Although this adds one extra step to the pipeline, it is worth it, because your neural search database size is reduced by 45 times.

And because this new embedding is a bit vector, your sentence vectors become really small: the entire IMDB dataset is now stored in just 3MB. At that size you can even ship the data to the browser and run inference on the client side for some use cases.
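To make the size claim concrete, here is a small sketch of how packed 548-bit vectors could be stored and searched with Hamming distance. The 50,000-review count for IMDB and the XOR-popcount search are illustrative assumptions on our part, not the KE Sieve code itself; the arithmetic lines up with the ~3MB figure above.

```python
import numpy as np

n_sentences = 50_000           # assumed: full IMDB review count
bits_per_vector = 548

# Pack each 548-bit vector into bytes: ceil(548 / 8) = 69 bytes per sentence.
bytes_per_vector = (bits_per_vector + 7) // 8
total_mb = n_sentences * bytes_per_vector / 1e6   # roughly 3.5 MB

# Nearest-neighbour search over packed bit vectors: XOR then popcount.
rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(n_sentences, bytes_per_vector), dtype=np.uint8)
query = db[123]                # query with a known row: distance 0 to itself

hamming = np.unpackbits(db ^ query, axis=1).sum(axis=1)
print(hamming.argmin())        # 123
```

Because the whole database is a few megabytes of bytes and the search is just XOR and bit counting, this kind of lookup is feasible in a browser, which is what makes the client-side inference use case practical.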

Next Steps:

  1. Coming up with our own embedding and removing the dependence on Hugging Face sentence vectors.
  2. Exposing this as an API service so you can also compress your sentence embeddings and take advantage of the space savings.


