Word Embeddings by Example

Dmitry Yemelyanov
Riga Data Science Club
2 min read · Aug 27, 2020

This post is a gentle introduction to word embeddings, a key concept in Natural Language Processing.

Neural networks are designed to operate on real numbers, so running them on textual input requires some preprocessing. The common approach is to convert each word into a numeric vector that embeds the word's meaning, hence the term: embeddings.

Let’s explore this with a simple example:
MAN -> [3,4]
WOMAN -> [4,6]
KING -> [8,4]
QUEEN -> [9,6]

The numbers seem random at first, but on closer inspection you may notice that the difference is the same for the MAN-KING and WOMAN-QUEEN pairs. The same principle applies to MAN-WOMAN and KING-QUEEN. This demonstrates that the vectors represent meaning through relationships between words: in this case, gender and royalty.

Word meaning represented by relationship between words
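
To make the relationship concrete, here is a minimal sketch in Python (using NumPy) of the vector arithmetic described above. The vectors are the toy examples from this post, not real embeddings:

```python
import numpy as np

# Toy embeddings from the example above
vectors = {
    "MAN":   np.array([3, 4]),
    "WOMAN": np.array([4, 6]),
    "KING":  np.array([8, 4]),
    "QUEEN": np.array([9, 6]),
}

# The "royalty" offset is identical for both genders
print(vectors["KING"] - vectors["MAN"])     # [5 0]
print(vectors["QUEEN"] - vectors["WOMAN"])  # [5 0]

# The classic analogy: KING - MAN + WOMAN = QUEEN
print(vectors["KING"] - vectors["MAN"] + vectors["WOMAN"])  # [9 6], i.e. QUEEN
```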

In practice, word embeddings usually have much higher dimensionality. For example, the GloVe project offers pre-trained word vectors in several sizes, up to 300 dimensions!
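
If you want to try real high-dimensional vectors, here is a minimal sketch assuming the gensim library, whose downloader module ships pre-trained GloVe vectors (the model key below comes from the gensim-data catalogue):

```python
import gensim.downloader

# Download 300-dimensional GloVe vectors trained on Wikipedia + Gigaword
glove = gensim.downloader.load("glove-wiki-gigaword-300")

print(glove["king"].shape)  # (300,)

# The same analogy arithmetic works in 300 dimensions
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
# expected to return something like [('queen', ...)]
```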

Luckily, you do not have to navigate 300-dimensional space yourself to make use of them; a neural network will do that for you.

Happy Machine Learning!
