Simple Python Downloader for Available Word Embeddings

Hiroki Nakayama
2 min read · May 30, 2017

In natural language processing, word embeddings are used for many tasks, such as document classification, named-entity recognition, and question answering. Many pre-trained word embeddings are available these days, so we don’t need to train them ourselves.

Pre-trained word embeddings are easy to use. However, finding and downloading them takes a long time, because they are made by different people and published on different sites. It’s a waste of time.

To save you time, I made a simple tool for downloading available word embeddings. Its name is chakin. Its features: it is written in Python, it can search and download datasets, and it supports 23 vectors (as of May 29, 2017).

Let me show you how to use it.


To install chakin, simply:

$ pip install chakin

Satisfaction, guaranteed.


You need only three lines to download a dataset. As an example, let’s download fastText (English version), one of the available word embeddings. First, start the Python interpreter:

$ python

Before downloading the dataset, import chakin and look up the available word embeddings with the search method. In this case, we search the datasets by their language:

>>> import chakin
>>> chakin.search(lang='English')
             Name  Dimension                     Corpus  VocabularySize
2    fastText(en)        300                  Wikipedia            2.5M
11   GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)            400K
12  GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)            400K
...

Currently, the search method supports searching only by target language.
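To make the idea of a language-only search concrete, here is a minimal sketch of such a filter over a small in-memory catalog. It is not chakin's actual implementation: the function name `search_by_language`, the `CATALOG` list, and its `Language` field are assumptions for illustration, and the last row is a made-up placeholder rather than a real dataset.

```python
# Hypothetical catalog. The first three rows reuse the names, indices and
# dimensions from the search output in this post; the "Language" field and
# the last row are assumptions added for illustration only.
CATALOG = [
    {"Index": 2, "Name": "fastText(en)", "Dimension": 300, "Language": "English"},
    {"Index": 11, "Name": "GloVe.6B.50d", "Dimension": 50, "Language": "English"},
    {"Index": 12, "Name": "GloVe.6B.100d", "Dimension": 100, "Language": "English"},
    {"Index": 99, "Name": "placeholder(ja)", "Dimension": 300, "Language": "Japanese"},
]


def search_by_language(catalog, lang):
    """Return the catalog rows whose language matches `lang` (case-insensitive)."""
    wanted = lang.strip().lower()
    return [row for row in catalog if row["Language"].lower() == wanted]


# Print the index, name and dimension of every English entry.
for row in search_by_language(CATALOG, "English"):
    print(row["Index"], row["Name"], row["Dimension"])
```

The index printed in the first column is the number you would pass on to the download step.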

Once you find the dataset you want, download it by calling the download method with the dataset’s index:

>>> chakin.download(number=2, save_dir='./')
Test: 100% || | Time: 0:18:32 6.7 MiB/s
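Once the download finishes, you will probably want to read the vectors. The following is a minimal sketch, assuming the saved file uses the common plain-text layout: an optional header line with the vocabulary size and dimension, then one word per line followed by its vector components. The function name `load_vectors` is hypothetical and not part of chakin, and the sample data is made up so that the example runs without downloading anything.

```python
import io


def load_vectors(lines, limit=None):
    """Parse word vectors from an iterable of text lines.

    Assumed layout: an optional first line "<vocab_size> <dimension>",
    then one line per word: the word followed by its float components,
    separated by single spaces.
    """
    vectors = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # Skip the optional header line, e.g. "2 3".
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        # Skip blank or malformed lines that carry no vector.
        if len(parts) < 2:
            continue
        vectors[parts[0]] = [float(v) for v in parts[1:]]
        # Stop early when only the first `limit` words are needed.
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


# Tiny made-up sample standing in for a downloaded file.
sample = io.StringIO("2 3\nking 0.1 0.2 0.3\nqueen 0.4 0.5 0.6\n")
vectors = load_vectors(sample)
print(len(vectors), len(vectors["king"]))  # prints: 2 3
```

For a real file, pass an open file handle instead of the sample, for example `open(path, encoding='utf-8')` for whichever file chakin saved in `save_dir`. Large embeddings can take several gigabytes of memory, so the `limit` argument is handy for a quick look.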


Public word embeddings are often used in natural language processing, since training word embeddings yourself can take a long time. In this post, I introduced a tool for downloading pre-trained word embeddings. I hope it saves you time.

If you star the chakin repository, it would be very encouraging for me!


