GloVe Embeddings for Sentences

Shivika K Bisen
Bright AI
Nov 12, 2022

GloVe (Global Vectors for Word Representation) was developed by the Stanford NLP group and has given strong results in capturing the meaning of text. It is a log-bilinear model fit with weighted least squares, and it is based on counts of word co-occurrence. For instance, beagle and dog appear together often across a corpus, so their vectors end up close in the embedding space.
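Concretely, GloVe minimizes the weighted least-squares objective from the original paper (Pennington et al., 2014), where X_ij counts how often word j occurs in the context of word i:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2,
\qquad
f(x) = \begin{cases}(x/x_{\max})^{\alpha} & \text{if } x < x_{\max}\\ 1 & \text{otherwise}\end{cases}
```

Here w_i and w̃_j are the word and context vectors, b_i and b̃_j are biases, and the weighting function f damps the influence of very rare and very frequent pairs.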

Co-occurrence translates to meaningful analogies

Compared to the raw probabilities, the ratios of co-occurrence probabilities do a much better job of distinguishing relevant words, such as ice and solid, from irrelevant ones.
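As an illustration of why ratios help, consider the ice/steam example from the GloVe paper: for a probe word k,

```latex
\frac{P(k \mid \text{ice})}{P(k \mid \text{steam})}
\;\text{is}\;
\begin{cases}
\gg 1 & \text{for } k=\text{solid (related to ice only)}\\
\ll 1 & \text{for } k=\text{gas (related to steam only)}\\
\approx 1 & \text{for } k=\text{water or fashion (related to both or neither)}
\end{cases}
```

The ratio cancels out what ice and steam have in common and highlights what separates them.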

Advantages of GloVe over word2vec, and how it differs

The word2vec model is based on a local context window; it uses a feed-forward network with a softmax output. Its skip-gram variant predicts the neighboring words given a center word, and the CBOW variant does the reverse. GloVe, by contrast, is based on count statistics over the entire global corpus (the co-occurrence matrix), so its vectors reflect corpus-wide usage rather than only local context. It is fast to train, scales to large corpora, and also works well on small ones.
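To make the co-occurrence matrix concrete, here is a minimal Python sketch of how such pair counts can be gathered over a toy corpus. This is an illustration only: GloVe's actual tooling streams a large corpus and weights each pair by the inverse of the distance between the two words, while this sketch uses uniform weights.

```python
from collections import Counter

def cooccurrence_counts(corpus, window=2):
    """Count how often each (word, context word) pair appears within `window` tokens."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts

corpus = ["the beagle is a dog", "the dog chased the cat"]
print(cooccurrence_counts(corpus).most_common(3))
```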

Python implementation of GloVe embeddings for sentences

Here is the code for using pre-trained 50-dimensional GloVe embeddings, trained on word-word co-occurrences in a Wikipedia + Gigaword corpus of 6B tokens (glove.6B.50d). The idea is to look up the vector of each word in the sentence, accumulate them, and take the mean over all in-vocabulary words to get the sentence embedding.
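The sketch below assumes you have downloaded glove.6B.50d.txt from https://nlp.stanford.edu/projects/glove/ into the working directory; the file path and function names are illustrative, not a fixed API.

```python
import numpy as np

# Assumed local path; download glove.6B.50d.txt from
# https://nlp.stanford.edu/projects/glove/
GLOVE_PATH = "glove.6B.50d.txt"
DIM = 50

def load_glove(path):
    """Parse the GloVe text file into a {word: vector} dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

def sentence_embedding(sentence, embeddings, dim=DIM):
    """Mean of the vectors of the in-vocabulary words in the sentence."""
    total = np.zeros(dim, dtype=np.float32)
    n = 0
    for word in sentence.lower().split():
        if word in embeddings:  # out-of-vocabulary words are skipped
            total += embeddings[word]
            n += 1
    return total / n if n else total  # zero vector if nothing matched

embeddings = load_glove(GLOVE_PATH)
print(sentence_embedding("the beagle is a friendly dog", embeddings)[:5])
```

Words missing from the GloVe vocabulary are simply skipped, which is a common but lossy choice; a sentence with no known words maps to the zero vector.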

Note:

Doc2Vec is a model that learns to represent the meaning of entire documents. It takes in a collection of documents and produces a numerical vector that captures the essence of each one. Unlike Word2Vec and GloVe, which focus on individual words, Doc2Vec learns a vector for the whole document directly rather than averaging word vectors.
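For completeness, here is a minimal Doc2Vec sketch using gensim (version 4+, where document vectors live under model.dv); the toy corpus and tags are made up for illustration:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus for illustration; real use would have many documents.
docs = [
    TaggedDocument(words="the beagle is a friendly dog".split(), tags=["doc0"]),
    TaggedDocument(words="glove builds word vectors from global co-occurrence counts".split(), tags=["doc1"]),
]

# Train a small model; vector_size matches the 50-d GloVe example above.
model = Doc2Vec(docs, vector_size=50, min_count=1, epochs=40)

print(model.dv["doc0"][:5])                           # vector of a training document
print(model.infer_vector("a small dog".split())[:5])  # vector for unseen text
```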

