Simple Python Downloader for Available Word Embeddings

Published in chakki, May 30, 2017

In natural language processing, word embeddings are used for many tasks, such as document classification, named-entity recognition, and question answering. Many pre-trained word embeddings are now publicly available, so we often don’t need to train them ourselves.

Pre-trained word embeddings are easy to use. However, finding and downloading them takes a long time, because they are made by different people and published on different sites. It’s a waste of time.

To save you that time, I made a simple tool for downloading available word embeddings. It is called chakin. Its features: it is written in Python, it can search for and download datasets, and it supports 23 sets of vectors (as of May 29, 2017).

chakki-works/chakin

chakin - Simple downloader for pre-trained word vectors

github.com

Let me show you how to use it.

Installation

To install chakin, simply:

$ pip install chakin

Satisfaction, guaranteed.

Usage

You need only three lines to download a dataset. As an example, let’s download the English version of fastText, one of the available word embeddings. First, start the Python interpreter:

$ python

Before downloading a dataset, import chakin and look up the available word embeddings with the search method. In this case, we search for datasets by language:

>>> import chakin
>>> chakin.search(lang="English")
             Name  Dimension                     Corpus  VocabularySize
2    fastText(en)        300                  Wikipedia            2.5M
11   GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)            400K
12  GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)            400K
...

Currently, the search method supports searching by language only.

Once you find the dataset you want, download it by calling the download method with the dataset’s index number:

>>> chakin.download(number=2, save_dir='./')
Test: 100% ||               | Time: 0:18:32  6.7 MiB/s
'./wiki.en.vec'
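
The progress line above shows the fastText download taking about 18 minutes, so in a script it may be worth skipping the download when the file is already on disk. Below is a minimal sketch of that idea. The helper name fetch_vectors and its expected_name and downloader parameters are illustrative assumptions, not part of chakin; only the chakin.download(number=..., save_dir=...) call comes from the example above.

```python
import os


def fetch_vectors(number, save_dir="./", expected_name=None, downloader=None):
    """Download dataset `number` into `save_dir` unless it is already there.

    `expected_name` and `downloader` are hypothetical parameters added for
    this sketch; they are not part of chakin itself.
    """
    if expected_name is not None:
        cached = os.path.join(save_dir, expected_name)
        if os.path.exists(cached):
            # Reuse the file from an earlier run instead of downloading again.
            return cached

    if downloader is None:
        # Imported lazily so the helper can be defined without chakin
        # installed; the real download requires `pip install chakin`.
        import chakin
        downloader = chakin.download

    os.makedirs(save_dir, exist_ok=True)
    # Same call as in the interpreter session above; returns the saved path.
    return downloader(number=number, save_dir=save_dir)
```

For example, fetch_vectors(2, './', expected_name='wiki.en.vec') would call chakin.download on the first run and simply return the existing path afterwards, assuming the saved file name matches the one shown above.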

Conclusion

Training word embeddings yourself can take a long time, so public pre-trained embeddings are often used in natural language processing. In this post, I introduced chakin, a tool for downloading pre-trained word embeddings. I hope it saves you some time.

If you star the repository, it would be very encouraging for me!

chakki-works/chakin

chakin - Simple downloader for pre-trained word vectors

github.com

Written by Hiroki Nakayama