The Architecture of Word2Vec

Vishwas Bhanawat
Jun 28, 2019 · 5 min read

Word2Vec, as the TensorFlow documentation describes it, is a model used for learning vector representations of words, called "word embeddings." It was created by Mikolov et al.

There are three main building blocks which make up Word2Vec. We will examine each of them in detail:

- Vocabulary Builder
- Context builder
- Neural network with two layers

Python has a very good library for building word2vec models: gensim.
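As a quick illustration, here is a minimal sketch of training a word2vec model with gensim. The toy sentences and parameter values are just placeholders; the parameter names follow gensim 4.x (older versions use size instead of vector_size).

```python
from gensim.models import Word2Vec

# A toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "quick", "brown", "fox", "jumps"],
    ["the", "lazy", "dog", "sleeps"],
]

# vector_size = dimensionality of the embeddings, window = context window,
# sg=1 selects the skip-gram architecture.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(model.wv["fox"])  # the learned 100-dimensional vector for "fox"
```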

Vocabulary Builder

This module is the basic building block of the Word2Vec model. It takes raw data in the form of sentences and extracts the unique words from them to build the vocabulary of the system.
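A minimal sketch of what such a builder might do; the function name and return structure here are illustrative, not gensim's actual internals:

```python
from collections import Counter

def build_vocabulary(sentences):
    """Collect the unique words with an index and a frequency count."""
    counts = Counter(word for sentence in sentences for word in sentence)
    # Assign each unique word a stable integer index.
    word_to_index = {word: i for i, word in enumerate(sorted(counts))}
    return word_to_index, counts

sentences = [["the", "dog", "barks"], ["the", "cat", "meows"]]
word_to_index, counts = build_vocabulary(sentences)
print(word_to_index)  # {'barks': 0, 'cat': 1, 'dog': 2, 'meows': 3, 'the': 4}
print(counts["the"])  # 2
```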

Context builder

This is the next step in converting words to vectors. It takes the output of the Vocabulary builder, i.e. the vocabulary object which contains the index and count of each word, as input. It considers not just a particular word but all the words within the range of the context window. The context window is the number of neighboring words to consider when taking in the input. Generally, five to ten words is standard.

If 5 is used, then 5 words to the left and 5 words to the right of the center word are taken into consideration. This forms the word pairings (the center word paired with each word from the context window) which are fed to the neural network. These pairs capture the context in which the center word appears, thus helping define its relevant meaning. The word pairs are the output of the context builder, and they pass to the next component, which is a two-layer neural network.
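Here is a sketch of how such (center, context) pairs might be generated, using a window of 2 to keep the output short; the function is illustrative, not any library's actual code:

```python
def generate_pairs(sentence, window=2):
    """Yield (center, context) word pairs within the context window."""
    for i, center in enumerate(sentence):
        # Look `window` words to the left and right of the center word.
        start = max(0, i - window)
        end = min(len(sentence), i + window + 1)
        for j in range(start, end):
            if j != i:
                yield center, sentence[j]

sentence = ["the", "quick", "brown", "fox"]
print(list(generate_pairs(sentence)))
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...]
```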

Neural network with two layers

Word2vec uses a neural network for training, with the following layers:

- An input layer, which has as many neurons as there are words in the vocabulary used for training.
- A hidden layer, whose size in neurons is the dimensionality of the resulting word vectors.
- An output layer, which has the same number of neurons as the input layer.
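In code, these layers boil down to two weight matrices. A sketch of the shapes, assuming a vocabulary of size V and an embedding dimensionality of N (both values here are arbitrary examples):

```python
import numpy as np

V = 10_000  # vocabulary size: neurons in the input and output layers
N = 300     # hidden-layer size: dimensionality of the word vectors

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, N))   # input -> hidden weights
W_out = rng.normal(scale=0.01, size=(N, V))  # hidden -> output weights
```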

Now, let us understand one-hot encoding. This is the most basic way of converting words into vectors. The size of the vector is the number of words in the vocabulary, and each word is assigned a unique vector: the position that represents the word itself is encoded as 1 and all other positions are encoded as 0.
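A minimal one-hot encoding sketch:

```python
import numpy as np

def one_hot(word, word_to_index):
    """Return a vector of vocabulary size with a 1 at the word's index."""
    vector = np.zeros(len(word_to_index))
    vector[word_to_index[word]] = 1.0
    return vector

word_to_index = {"cat": 0, "dog": 1, "fox": 2}
print(one_hot("dog", word_to_index))  # [0. 1. 0.]
```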

The input layer takes in the center word's one-hot encoding. The weights are initialized randomly before training. The neural network uses the backpropagation algorithm to update its weights.

We feed in the center word, and the network generates an output. That output is processed by a softmax classifier. The softmax classifier is used because it converts the raw output into a probability distribution. The resulting vector gives, for each word in the vocabulary, the probability that it pairs with the input word.
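The softmax function itself is simple; a minimal sketch:

```python
import numpy as np

def softmax(scores):
    """Convert raw output scores into a probability distribution."""
    exps = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # nonnegative values summing to 1.0
```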

The weights between the input and hidden layers, and between the hidden and output layers, are updated such that whenever a word's one-hot encoding is fed to the input, the output favors the word it is paired with. The distance between the present output and the expected output is calculated, and the weights are updated accordingly.
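Putting the pieces together, here is a sketch of one full-softmax skip-gram training step in NumPy, reusing the W_in and W_out matrices defined above. This is a simplified illustration; real implementations use shortcuts such as negative sampling or hierarchical softmax, but the structure is the same:

```python
import numpy as np

def train_step(center_idx, context_idx, W_in, W_out, lr=0.025):
    """One gradient step for a single (center, context) word pair."""
    # Forward pass: the one-hot input simply selects a row of W_in.
    hidden = W_in[center_idx]                 # shape (N,)
    scores = hidden @ W_out                   # shape (V,)
    exps = np.exp(scores - scores.max())
    probs = exps / exps.sum()                 # softmax over the vocabulary

    # Backward pass: cross-entropy error against the expected context word.
    error = probs.copy()
    error[context_idx] -= 1.0                 # predicted minus expected output

    # Update both weight matrices.
    W_in[center_idx] -= lr * (W_out @ error)  # gradient w.r.t. the input row
    W_out -= lr * np.outer(hidden, error)     # gradient w.r.t. output weights
```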

The weights between the input layer and the hidden layer are the vectors we set out to find.

The significance of these vectors is that we can find an association and context for each and every word. Words which are used in the same context lie closer together, i.e. have similar vectors.
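That closeness is usually measured with cosine similarity; a minimal sketch (gensim exposes the same idea through model.wv.similarity and model.wv.most_similar):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two word vectors: 1.0 means identical direction."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```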

Applications

Some real-world applications are:

- Dependency parsers use word2vec to generate better, more accurate dependency relationships between words at parsing time.
- Named entity recognition (NER) can also use word2vec, as word2vec is very good at finding similarity between named entities. All similar entities can come together, and you will have better results.
- Sentiment analysis uses it to preserve semantic similarity in order to generate better sentiment results. Semantic similarity helps us know which kinds of phrases or words people use to express their opinions, and you can generate good insights and accuracy by using word2vec concepts in sentiment analysis.
- We can also build an application that predicts a person's name from their writing style.
- If you want to do document classification with high accuracy using simple statistics, then word2vec is for you. You can use the concept to categorize documents without any human labels.
- Word clustering is the fundamental product of word2vec: all words carrying a similar meaning are clustered together.
- Google uses word2vec and deep learning to improve its machine translation product.

This article is based on the book Python Natural Language Processing by Jalaj Thanaki.
