‘Word vectors’ gave us a hint that ChatGPT was coming… in 2013
When we heard about the Word2Vec algorithm, we should have realized that NLP was entering a new era, and that human-like AI was close.
Word2Vec
Word vectors seemed incredible back in 2013 and caused a minor stir in the AI community:
An unsupervised algorithm that could learn to represent the meaning of words from unlabeled, unstructured text.
Wha?
But it was true.
Word2Vec is an example of a word embedding algorithm.
It’s a neural network that, after unsupervised training on unstructured text (emails, web pages, news articles, and books), automatically represents every word as a vector: a point in, say, a 100-dimensional space that carries the word’s meaning in a mathematically demonstrable way.
Word2Vec works by training a shallow neural network on a large corpus of text. In the CBOW variant, the network predicts a target word from its surrounding context words; the skip-gram variant does the reverse, predicting the context words from the target.
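To make the idea concrete, here is a minimal CBOW-style sketch in NumPy: predict a word from the average of its context vectors, trained with softmax and gradient descent. The toy corpus, dimension, and learning rate are made up for illustration; this is not gensim’s or Google’s actual implementation, which uses tricks like negative sampling and subsampling to scale to billions of words.

```python
# Minimal CBOW-style word2vec sketch (illustrative, not a production implementation).
# Predicts a target word from the average of its context-word vectors.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 8, 2, 0.05  # toy hyperparameters

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # input (embedding) vectors
W_out = rng.normal(scale=0.1, size=(D, V))  # output (prediction) weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for epoch in range(200):
    for t, word in enumerate(corpus):
        # indices of context words within the window, excluding the target itself
        ctx = [idx[corpus[j]]
               for j in range(max(0, t - window), min(len(corpus), t + window + 1))
               if j != t]
        h = W_in[ctx].mean(axis=0)     # average the context vectors
        p = softmax(h @ W_out)         # predicted distribution over the vocabulary
        err = p.copy()
        err[idx[word]] -= 1.0          # gradient of cross-entropy loss
        W_out -= lr * np.outer(h, err)
        grad_h = W_out @ err
        for c in ctx:                  # push context embeddings toward the target
            W_in[c] -= lr * grad_h / len(ctx)

# After training, each row of W_in is a word vector
vectors = {w: W_in[idx[w]] for w in vocab}
```

After training, every word in the vocabulary is a point in a D-dimensional space, learned purely from which words co-occur, with no labels anywhere.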
The main idea is that words that appear in similar contexts tend to have similar meanings. The resulting word vectors capture the semantic relationships between words, and these relationships can be explored using vector operations.
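The famous demonstration of those vector operations is the analogy king − man + woman ≈ queen. The sketch below uses tiny hand-crafted 4-dimensional vectors, constructed so the relationship holds, just to show the mechanics; a real model learns vectors of a few hundred dimensions from text, and the arithmetic emerges from training rather than by design.

```python
# Toy illustration of vector arithmetic on word embeddings.
# The 4-d vectors are hand-made to exhibit the king/queen relationship;
# real word2vec models learn ~100-300 dimensions from a corpus.
import numpy as np

vec = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.9, 0.0, 0.3]),
    "woman": np.array([0.1, 0.2, 0.7, 0.3]),
    "apple": np.array([0.0, 0.5, 0.5, 0.9]),
}

def cosine(a, b):
    # cosine similarity: 1.0 means same direction, 0.0 means orthogonal
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# king - man + woman should land closest to queen
target = vec["king"] - vec["man"] + vec["woman"]
best = max((w for w in vec if w not in ("king", "man", "woman")),
           key=lambda w: cosine(target, vec[w]))
print(best)  # → queen
```

The same `most_similar`-style nearest-neighbor search over cosine similarity is how real embedding models answer analogy queries.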