Playing with pre-trained word2vec models
Apr 11, 2019
Introduction
There are many pre-trained word-embedding models released by Facebook, Google, and research institutes (e.g., Academia Sinica in Taiwan). Learners can download these models, play with them, and even do research along the lines of Garg et al. (2018).
Downloading pre-trained models
English models
- Facebook fastText crawl vectors: https://fasttext.cc/docs/en/crawl-vectors.html
- Google News vectors: https://code.google.com/archive/p/word2vec/
- New York Times embeddings by Garg et al. (2018): http://stanford.edu/~nkgarg/NYTembeddings/
- Stanford HistWords (historical word embeddings): https://nlp.stanford.edu/projects/histwords/
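Once a model is downloaded, gensim makes it easy to explore. Below is a minimal sketch assuming gensim is installed and the Google News binary (GoogleNews-vectors-negative300.bin.gz) has been downloaded from the archive above; the last loop is only a toy version of the gender association Garg et al. (2018) quantify, not their full method.

```python
# A minimal sketch: exploring the Google News vectors with gensim.
# Assumes gensim is installed (pip install gensim) and that
# GoogleNews-vectors-negative300.bin.gz has been downloaded from the
# archive linked above.
from gensim.models import KeyedVectors

# binary=True because the Google News file uses word2vec's binary format.
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin.gz", binary=True
)

# Nearest neighbors by cosine similarity.
print(kv.most_similar("king", topn=5))

# The classic analogy: king - man + woman is closest to queen.
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# A toy version of the association Garg et al. (2018) quantify:
# is an occupation word closer to "she" or to "he"?
for word in ["nurse", "engineer", "teacher"]:
    print(word, kv.similarity(word, "she") - kv.similarity(word, "he"))
```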
Chinese models
- CKIP (Academia Sinica) embeddings, about 1.2 GB total: http://ckip.iis.sinica.edu.tw:8080/license/
- Chinese-Word-Vectors (embeddings trained on many corpora): https://github.com/Embedding/Chinese-Word-Vectors
- Facebook fastText crawl vectors (Chinese included): https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md
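The Chinese vectors load the same way. A minimal sketch, assuming the text-format file cc.zh.300.vec has been downloaded from the fastText crawl-vectors page above:

```python
# A minimal sketch: loading fastText's Chinese vectors with gensim.
# Assumes cc.zh.300.vec (text format) from the crawl-vectors page.
from gensim.models import KeyedVectors

# The full file is several GB; limit=200_000 keeps only the 200k most
# frequent words to save memory. Drop it to load the whole vocabulary.
zh = KeyedVectors.load_word2vec_format("cc.zh.300.vec", limit=200_000)

# Nearest neighbors work exactly as in English.
print(zh.most_similar("国王", topn=5))  # 国王 = "king"
```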
References
- Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16). https://doi.org/10.1073/pnas.1720347115
- Xu, H., Xiao, D., Wu, L., & Wang, C.-J. (2018). The hidden shape of stories reveals positivity bias and gender bias. arXiv preprint. https://arxiv.org/abs/1811.04599