Experiments in Compressing Sentence Embeddings
Sentence embeddings from models like BERT have been a revelation in the NLP space. By mapping sentences into an embedding space, we let computers learn representations of those sentences and, in turn, extract intelligence from them.
My personal favorite for sentence embeddings is the SentenceTransformers framework. This is also available on HuggingFace.
By using SentenceTransformers, we can get the embeddings for each sentence and then perform tasks like semantic textual similarity, semantic search, or paraphrase mining.
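For example, computing embeddings and a similarity score takes only a few lines (the sentences below are made up for illustration):

```python
from sentence_transformers import SentenceTransformer, util

# Load the model used throughout this post (768-dimensional output)
model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")

sentences = ["The movie was fantastic.", "I really enjoyed the film."]
embeddings = model.encode(sentences)  # numpy array of shape (2, 768)

# Cosine similarity between the two sentences
print(float(util.cos_sim(embeddings[0], embeddings[1])))
```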
Though sentence embeddings are great, they have one big problem: they are huge. The better models represent each sentence in a 768-dimensional vector space where each dimension is a 32-bit float, so the storage requirement per sentence is 768 × 4 = 3072 bytes. With millions of sentences, the storage requirements become humongous.
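A quick back-of-the-envelope calculation makes the scale concrete:

```python
# Storage cost of float32 embeddings at scale
dims, bytes_per_float = 768, 4
per_sentence = dims * bytes_per_float        # 3072 bytes per sentence
print(per_sentence * 1_000_000 / 1e9, "GB")  # ~3.07 GB per million sentences
```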
There are some approaches that try to reduce these sizes, prominent among them Efficient Passage Retrieval with Hashing for Open-domain Question Answering and model distillation.
Since I work with Dr. Kumar Eswaran (yes, the same one of Riemann Hypothesis fame) and know his patented algorithm (based on https://www.researchgate.net/publication/314154178_On_non-iterative_training_of_a_neural_classifier_Part-I_Separation_of_points_by_planes), I wanted to explore building an alternate space from the embeddings and see if that reduces their size.
Model: For this experiment I used the “sentence-transformers/paraphrase-mpnet-base-v2” model from HuggingFace, which gives us a 768-dimensional vector space.
Process:
- Get the 768-dimensional sentence-transformers/paraphrase-mpnet-base-v2 embedding for each review.
- Build an alternate embedding space using Dr. Kumar’s algorithm.
- The alternate embedding space can be stored as a bit representation, i.e., each dimension is represented using 1 bit (a generic sketch of this kind of transformation follows this list).
- Use the new embedding for downstream tasks like classification.
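Dr. Kumar's algorithm itself is patented, so I won't reproduce it here. Purely as an illustrative stand-in, the sketch below shows the general shape of such a transformation (float embeddings in, one bit per dimension out) using random-hyperplane signs, which is not the actual method used in this experiment:

```python
# STAND-IN ONLY: random-hyperplane binarization, NOT Dr. Kumar's algorithm.
# It illustrates the interface: float embeddings in, 1 bit per dimension out.
import numpy as np

rng = np.random.default_rng(0)
N_BITS = 1111                                   # matches the bit-width reported below
planes = rng.standard_normal((768, N_BITS))     # fixed projection shared by train/test

def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Project onto fixed random hyperplanes and keep only the signs."""
    return embeddings @ planes > 0              # boolean array, 1 bit per dimension

bits = binarize(rng.standard_normal((5, 768)))  # shape: (5, 1111)
packed = np.packbits(bits, axis=1)              # pack 8 bits per byte for storage
print(bits.shape, packed.nbytes, "bytes for 5 vectors")
```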
Results:
1) Memory for the sentence-transformer embeddings: train (25000×768) and test (25000×768) feature matrices stored as float32: 153.6 MB.
Accuracy on these features: 84.15% with KNN.
2) Memory for the new embeddings: train (25000×1111) and test (25000×1111) stored as bitarrays: 8.6 MB.
Accuracy on the OVS features: 82.27% with KNN.
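A minimal sketch of the evaluation harness, with toy random data standing in for the real embeddings and illustrative KNN settings (n_neighbors=5 is an assumption; the actual settings may differ):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train, X_test = rng.standard_normal((200, 768)), rng.standard_normal((50, 768))
y_train, y_test = rng.integers(0, 2, 200), rng.integers(0, 2, 50)

# KNN on the float32 embeddings (default Euclidean distance)
knn_float = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("float32 accuracy:", knn_float.score(X_test, y_test))

# KNN on bit embeddings, compared with Hamming distance
B_train, B_test = X_train > 0, X_test > 0  # placeholder binarization
knn_bits = KNeighborsClassifier(n_neighbors=5, metric="hamming").fit(B_train, y_train)
print("bit accuracy:", knn_bits.score(B_test, y_test))
```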
So we are able to achieve roughly 18× compression (153.6 MB → 8.6 MB) with our new embedding space, at the cost of a slight reduction in accuracy. The new embedding space can be built in about 30 minutes on a regular laptop (no GPU).
Using Dr. Kumar’s algorithm from Alpes AI, I was able to reduce the embedding representation by around 18×. This can lead to a lot of space savings for big datasets and may even make it feasible to store large datasets on-device.
Next steps: Since the new embedding is a bit representation, with a little more work and a better model we could theoretically reduce the size by more than 64 times (768 float32 dimensions occupy 768 × 32 = 24,576 bits, so any embedding under 384 bits would be a 64× reduction). The accuracy can also be improved with some effort. We will be working on these as the next steps.