Simple Python Downloader for Available Word Embeddings
In natural language processing, word embeddings are used for many tasks, such as document classification, named-entity recognition, and question answering. These days, many pre-trained word embeddings are available, so we don’t need to train them ourselves.
Pre-trained word embeddings are easy to use. However, finding and downloading them takes a long time, because they are made by different people and published on different sites. It’s a waste of time.
To save you time, I made a simple tool for downloading available word embeddings. It is called chakin. Its features: it is written in Python, it can search for and download datasets, and it supports 23 sets of vectors (as of May 29, 2017).
Let me show you how to use it.
Installation
To install chakin, simply:
$ pip install chakin
Satisfaction, guaranteed.
Usage
You need only three lines to download a dataset. As an example, let’s download fastText (English version), one of the available word embeddings. First, start the Python interpreter:
$ python
Before downloading the dataset, import chakin and look up the available word embeddings with the search method. In this case, we will search for datasets by language:
>>> import chakin
>>> chakin.search(lang="English")
                   Name  Dimension                     Corpus VocabularySize
2          fastText(en)        300                  Wikipedia           2.5M
11         GloVe.6B.50d         50  Wikipedia+Gigaword 5 (6B)           400K
12        GloVe.6B.100d        100  Wikipedia+Gigaword 5 (6B)           400K
...
Currently, the search method supports searching only by language.
Once you find the dataset you want, you can download it by calling the download method with the dataset's index, which is the number shown before its name in the search output (2 for fastText(en)):
>>> chakin.download(number=2, save_dir='./')
Test: 100% || | Time: 0:18:32 6.7 MiB/s
'./wiki.en.vec'
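Once the file is on disk, you will probably want to read it. The following is a minimal sketch, not part of chakin, assuming the downloaded file uses the word2vec text format that fastText `.vec` files follow: a header line with the vocabulary size and the dimension, then one word per line followed by its values. The helper names `load_vec` and `cosine` are hypothetical, and the tiny in-memory sample with made-up values stands in for the real file.

```python
import io
import math


def load_vec(handle, limit=None):
    """Parse word2vec-style text: a 'count dim' header, then one word per line.

    Hypothetical helper for illustration; it is not part of chakin.
    """
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break  # stop early so a large file need not be read in full
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Tiny in-memory sample standing in for the real file (values are made up).
sample = io.StringIO("3 2\nking 1.0 0.0\nqueen 0.8 0.6\napple 0.0 1.0\n")
vectors = load_vec(sample)
similarity = cosine(vectors["king"], vectors["queen"])
```

For the real file, you would pass an open file handle instead of the sample, for example `open('./wiki.en.vec', encoding='utf-8')`, ideally with a `limit` so that only the first few thousand words are held in memory.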
Conclusion
Public word embeddings are widely used in natural language processing, because training word embeddings yourself can take a long time. In this post, I introduced a tool for downloading pre-trained word embeddings. I hope it saves you some time.
If you star this repository, it would be very encouraging for me!