Word Embeddings

Sandaru Seneviratne
Sep 9, 2018

Word embeddings are considered one of the most important and useful areas in Natural Language Processing (NLP), largely due to their many advantages in the field. In simple terms, word embedding is an NLP and feature-learning technique used to map words from a vocabulary to vectors of real numbers. These vectors are expected to capture the semantic and hierarchical meanings of words according to the context in which the words are used.
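
In practice, a trained embedding model boils down to a lookup table from words to dense vectors, and closeness in that space is often measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration (real embeddings typically have tens to hundreds of dimensions):

```python
import numpy as np

# Toy illustration: each word in the vocabulary maps to a dense,
# real-valued vector. The numbers below are invented for demonstration.
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.78, 0.70, 0.12]),
    "apple": np.array([0.05, 0.10, 0.90]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; close to 1 means similar."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Semantically related words end up close together in the vector space.
print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # high
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # low
```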

In the earliest approaches, the embedding space had one dimension for each word in the vocabulary. Because of the difficulties this causes with large vocabularies (very high-dimensional, sparse vectors), the embedding space evolved into a lower-dimensional vector space. Identifying the optimum number of dimensions for a vector, and how these dimensions can be reduced while still accurately capturing the meanings of words, remain common questions, which different techniques address while preserving the meaning of the words.
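
To make the dimensionality point concrete, the sketch below contrasts a one-dimension-per-word (one-hot) representation with a dense, low-dimensional one; the vocabulary and vector values are arbitrary examples:

```python
import numpy as np

vocabulary = ["cat", "dog", "car", "bus"]  # toy vocabulary of size 4

# One dimension per word: a sparse "one-hot" vector as long as the vocabulary.
# With a realistic vocabulary (tens of thousands of words) this becomes huge,
# and every pair of words is equally far apart, so no similarity is captured.
one_hot_cat = np.zeros(len(vocabulary))
one_hot_cat[vocabulary.index("cat")] = 1.0   # [1, 0, 0, 0]

# Lower-dimensional embedding: a short dense vector whose values are learned,
# so related words ("cat", "dog") can end up near each other.
dense_cat = np.array([0.9, 0.1])
dense_dog = np.array([0.8, 0.2])
```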

[Figure: words plotted in a 2-D vector space]

Word embeddings are used for different purposes, and there exist different algorithms, such as Word2Vec, GloVe, LSA (Latent Semantic Analysis), and LDA (Latent Dirichlet Allocation), which serve this purpose. Depending on the specific task for which the embeddings are used, the differences among these algorithms affect the accuracy of the outputs they produce.
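
As a rough idea of how one of these algorithms is used in practice, here is a minimal sketch of training a Word2Vec model with the gensim library (assuming gensim 4.x, where the dimensionality argument is called vector_size; the tiny corpus is only a placeholder):

```python
from gensim.models import Word2Vec

# A real corpus would contain many thousands of tokenised sentences;
# these two are just placeholders.
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]

model = Word2Vec(
    sentences,
    vector_size=50,   # dimensionality of the embedding space
    window=2,         # context window size
    min_count=1,      # keep even rare words in this tiny corpus
    sg=1,             # 1 = skip-gram, 0 = CBOW
)

vector = model.wv["king"]             # the learned vector for "king"
print(model.wv.most_similar("king"))  # nearest neighbours in the space
```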

LSA and LDA have proven effective at leveraging statistical (co-occurrence) information efficiently, even though they do relatively poorly on analogy tasks. Word2Vec, on the other hand, has proven to be an effective algorithm for analogy tasks, while GloVe uses a matrix-factorisation method to exploit global statistical information and combines it with the prediction-based benefit of one of the Word2Vec models (skip-gram). Choosing which algorithm to use for a given task therefore depends largely on the task itself.
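
The analogy tasks mentioned above are usually solved with simple vector arithmetic: the offset between a word pair is expected to carry the relation, as in the classic king - man + woman ≈ queen example. The sketch below illustrates the idea with made-up vectors chosen only so the analogy works:

```python
import numpy as np

# Invented embeddings, for illustration only.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.2, 0.1]),
    "woman": np.array([0.5, 0.2, 0.8]),
    "queen": np.array([0.9, 0.8, 0.8]),
}

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# "man is to king as woman is to ?" -> find the word closest to
# king - man + woman among the remaining vocabulary.
target = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = {w: v for w, v in vectors.items()
              if w not in {"king", "man", "woman"}}
answer = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(answer)  # "queen"
```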

References

  1. Mikolov, Tomas, et al. “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781 (2013).
  2. Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. “GloVe: Global Vectors for Word Representation.” EMNLP. Vol. 14. 2014.