Compressing unsupervised fastText models
How to reduce word embeddings models by 300 times, with almost the same performance on downstream NLP tasks
FastText is a method for encoding words as numeric vectors, developed in 2016 by Facebook. Pretrained fastText embeddings help in solving problems such as text classification or named entity recognition and are much faster and easier to maintain than deep neural networks such as BERT. However, typical fastText models are very large: for example, the English model released by Facebook occupies 7GB on disk when unzipped.
In this post, I present the Python package compress-fasttext that can compress this model to 21MB (x300!) with only a slight loss in accuracy. This makes fastText more useful in environments with limited disk or memory.
In the first part of the article, I show how to use the compressed fastText model. In the second part, I explain some math behind fastText and its compression.
How to use it?
Working with an existing model
You can install the package simply with pip install compress-fasttext. It is based on the Gensim implementation of fastText and has the same interface. A model can be loaded into Python directly from the web:
import compress_fasttext
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
'https://github.com/avidale/compress-fasttext/releases/download/v0.0.4/cc.en.300.compressed.bin')
You can view this model just as a dictionary that maps any word to its 300-dimensional vector representation (a.k.a. embedding):
print(small_model['hello'])
# [ 1.847366e-01 6.326839e-03 4.439018e-03 ... -2.884310e-02]
# a 300-dimensional numpy array
Words with similar meanings often have similar embeddings. Because embeddings are vectors, their similarity can be evaluated with the cosine measure. For related words (e.g. “cat” and “dog”) the cosine similarity is close to 1, while for unrelated words it is closer to 0:
def cosine_sim(x, y):
    return sum(x * y) / (sum(x**2) * sum(y**2)) ** 0.5

print(cosine_sim(small_model['cat'], small_model['cat']))
# 1.0
print(cosine_sim(small_model['cat'], small_model['dog']))
# 0.6768642734684225
print(cosine_sim(small_model['cat'], small_model['car']))
# 0.18485135055040858
In fact, you can use cosine similarity to find the closest neighbours of a word. For example, our compressed fastText model knows that Python is a programming language and considers it similar to PHP and Java.
print(small_model.most_similar('Python'))
# [('PHP', 0.5253), ('.NET', 0.5027), ('Java', 0.4897), ... ]
In practical applications, you usually feed fastText embeddings to some other model. For example, you can train a classifier on top of fastText to tell edible things from inedible ones:
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.base import BaseEstimator, TransformerMixin

class FastTextTransformer(BaseEstimator, TransformerMixin):
    """ Convert texts into their mean fastText vectors """
    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return np.stack([
            np.mean([self.model[w] for w in text.split()], 0)
            for text in X
        ])

classifier = make_pipeline(
    FastTextTransformer(model=small_model),
    LogisticRegression()
).fit(
    ['banana', 'soup', 'burger', 'car', 'tree', 'city'],
    [1, 1, 1, 0, 0, 0]
)

classifier.predict(['jet', 'cake'])
# array([0, 1])
What models are available
Some models for English and Russian can be downloaded from the release page. For English, the 25MB model is recommended: it offers a good balance of size and accuracy.
If you need some other language, you can look at the collection by Liebl Bernhard that contains compressed fastText models for 101 languages.
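Once you have downloaded a model file (from the release page or from that collection), loading it from a local path works exactly the same way as loading from a URL. A minimal sketch, with a placeholder file name:

import compress_fasttext

# 'path/to/downloaded/model.bin' is a placeholder for your local file
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    'path/to/downloaded/model.bin')
print(small_model.vector_size)  # dimensionality of the embeddings, e.g. 300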
Compressing a model
If you want to create your own compressed model, you need to install the library with some extra dependencies:
pip install compress-fasttext[full]
First, you need to load the model that you are going to compress. If the model has been trained with the Facebook package, you load it as follows:
from gensim.models.fasttext import load_facebook_model
big_model = load_facebook_model('path-to-original-model').wv
Otherwise, if the model is in the Gensim format, you can load it as follows:
import gensim
big_model = gensim.models.fasttext.FastTextKeyedVectors.load('path-to-original-model')
Now you can compress the model in three lines of code:
import compress_fasttext
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)
small_model.save('path-to-new-model')
If you wish, you can control the size of the model with extra arguments:
small_model = compress_fasttext.prune_ft_freq(
big_model,
new_vocab_size=20_000, # number of words
new_ngrams_size=100_000, # number of character ngrams
pq=True, # use product quantization
qdim=100, # dimensionality of quantization
)
By using a smaller vocabulary and fewer dimensions you get a smaller model, but you also decrease its accuracy relative to the original model.
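After saving, it can be useful to check how much disk space the new model actually takes. A small sketch (assuming the model was saved as a single file; the path is a placeholder):

import os

# placeholder path: wherever you saved the compressed model
size_mb = os.path.getsize('path-to-new-model') / 2**20
print(f'compressed model size: {size_mb:.1f} MB')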
How does it work?
Related work
The original fastText library does support model compression (and even has a paper about it), but only for supervised models trained on a particular classification task. In contrast, compress-fasttext is intended for unsupervised models, which provide word vectors that can be used for multiple tasks. My work is partially based on the 2019 post by Andrey Vasnetsov, and partially on the navec library for Russian. It was originally described in my Russian post.
How can fastText be compressed?
FastText is different from other word embedding methods because it combines embeddings of words with embeddings of character n-grams (i.e. sequences of several consecutive characters). The embeddings of words and n-grams are averaged together. If a word is out of vocabulary, its vector is composed of n-gram embeddings only, which makes fastText work smoothly with misspelled words, neologisms, and languages with rich morphology.
The following pseudo-code demonstrates the calculation:
def embed(word, model):
    if word in model.vocab:
        # an in-vocabulary word starts from its own vector
        result = model.vectors_vocab[word]
        n = 1
    else:
        # an out-of-vocabulary word is composed of n-grams only
        result = zeros()
        n = 0
    # add the vectors of all character n-grams of the word
    for ngram in get_ngrams(word, model.min_n, model.max_n):
        result += model.vectors_ngrams[hash(ngram)]
        n += 1
    # the embedding is the average of all contributing vectors
    return result / n
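For intuition, here is a simplified version of what get_ngrams could look like: fastText pads each word with '<' and '>' markers and extracts all character n-grams between min_n and max_n characters long (3 to 6 by default). The real implementation hashes the n-grams and has a few extra details, so treat this as an illustration only:

def get_ngrams(word, min_n=3, max_n=6):
    # pad the word with boundary markers, as fastText does
    word = '<' + word + '>'
    return [
        word[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(word) - n + 1)
    ]

print(get_ngrams('hello', 3, 4))
# ['<he', 'hel', 'ell', 'llo', 'lo>', '<hel', 'hell', 'ello', 'llo>']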
As you can see from the pseudo-code, all information is taken from two matrices, vectors_vocab and vectors_ngrams, and these matrices are what take up space. The size of a large matrix can be reduced in several ways:
- Drop most rows in the matrix, and keep only the most frequent words and n-grams;
- Store the matrix with less precision (float16 instead of float32);
- Split the matrix into small chunks, cluster them, and for each chunk store the id of the corresponding cluster instead of the chunk itself. This is called product quantization (see the sketch below);
- Factorize the matrix into a product of two smaller matrices (not recommended because of low accuracy).
The library compress-fasttext supports all four compression methods, and it is recommended to combine the first three of them. This is what the method prune_ft_freq actually does.
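To make product quantization more concrete, here is a minimal sketch of the idea using scikit-learn's KMeans. It is not the implementation used in compress-fasttext, just an illustration: each row of the matrix is split into chunks of qdim numbers, the chunks of each sub-space are clustered, and only the uint8 cluster ids plus the small codebooks are stored.

import numpy as np
from sklearn.cluster import KMeans

def product_quantize(matrix, qdim=10, n_clusters=256):
    """Split each row into chunks of size qdim, cluster the chunks of each
    sub-space with k-means, and keep only cluster ids and centroids."""
    n_rows, dim = matrix.shape
    assert dim % qdim == 0
    n_chunks = dim // qdim
    codes = np.zeros((n_rows, n_chunks), dtype=np.uint8)
    codebooks = np.zeros((n_chunks, n_clusters, qdim), dtype=np.float32)
    for j in range(n_chunks):
        chunk = matrix[:, j * qdim:(j + 1) * qdim]
        km = KMeans(n_clusters=n_clusters, n_init=1, random_state=0).fit(chunk)
        codes[:, j] = km.labels_
        codebooks[j] = km.cluster_centers_
    return codes, codebooks

def reconstruct(codes, codebooks):
    """Approximately restore the original matrix from codes and codebooks."""
    n_rows, n_chunks = codes.shape
    return np.hstack([codebooks[j][codes[:, j]] for j in range(n_chunks)])

# toy example: a 1000 x 30 float32 matrix compressed to uint8 codes
matrix = np.random.rand(1000, 30).astype(np.float32)
codes, codebooks = product_quantize(matrix, qdim=10, n_clusters=16)
approx = reconstruct(codes, codebooks)
print(matrix.nbytes, codes.nbytes + codebooks.nbytes)
# 120000 vs 4920 bytes

On this toy matrix the codes and codebooks take roughly 5KB instead of about 117KB for the original float32 matrix, at the cost of a small reconstruction error.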
Are the compressed models good enough?
To confirm that after compression the models still preserve useful information, we can evaluate them with the tool SentEval. This tool includes 17 downstream tasks, such as short text classification, evaluation of semantic similarity between texts, and natural language inference, as well as 10 diagnostic tasks. For each task, embeddings are used either directly or with a small linear model on top. I adapt the bow.py example to use fastText and execute it with the full 7GB English model and its 25MB compressed version. The table below shows the performance of both models on all tasks:
On average, the small model scores 0.9579 of the full model's performance, while being almost 300 times smaller. This confirms that compressing fastText models is a reasonable choice whenever a large model size is undesirable.
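For reference, the adaptation of the bow.py example essentially boils down to a batcher that averages fastText vectors over the tokens of each sentence. A rough sketch of how it might look (assuming SentEval and its datasets are installed; the task path and the task list here are placeholders):

import numpy as np
import senteval
import compress_fasttext

small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    'https://github.com/avidale/compress-fasttext/releases/download/v0.0.4/cc.en.300.compressed.bin')

def prepare(params, samples):
    return

def batcher(params, batch):
    # represent each sentence as the mean of its fastText word vectors
    embeddings = []
    for sent in batch:
        words = sent if sent else ['.']
        embeddings.append(np.mean([small_model[str(w)] for w in words], axis=0))
    return np.vstack(embeddings)

params = {'task_path': 'PATH_TO_SENTEVAL_DATA', 'usepytorch': False, 'kfold': 5}
se = senteval.engine.SE(params, batcher, prepare)
results = se.eval(['MR', 'CR', 'SST2', 'STS16', 'SICKEntailment'])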
Conclusion
FastText is a great method of computing meaningful word embeddings, but the size of a typical fastText model is prohibitive for mobile devices or modest free hosting plans. The methods collected in compress-fasttext allow reducing the model size by hundreds of times without significantly hindering downstream performance. I release a package and compressed models that can be used to solve various NLP problems efficiently.
In the future, I plan to release more tiny models, such as a 45MB BERT model for English and Russian languages. Subscribe and stay tuned!
The post was written by David Dale (https://daviddale.ru/en), a research scientist in NLP and developer of chatbots.
If you refer to this article (or the package) in a scientific paper, please cite it:
@misc{dale_compress_fasttext,
  author = {Dale, David},
  title = {Compressing unsupervised fastText models},
  editor = {towardsdatascience.com},
  url = {https://towardsdatascience.com/eb212e9919ca},
  month = {December},
  year = {2021},
  note = {[Online; posted 12-December-2021]},
}