How Negative Sampling work on word2vec?

Edward Ma
4 min readMay 12, 2019
Photo by Tim Bennett on Unsplash

During neural network training, it always adjust all neuron weight so that it learn how to do the prediction correctly. In NLP, we may face more than 100 k (or even 1M) words and it will cause performance (in term of time) concern. How can we reduce the training sample in a better way ?We have hierarchical softmax previously but word2vec introduces negative sampling methodology to resolve this problem.

--

--

Edward Ma

Focus in Natural Language Processing, Data Science Platform Architecture. https://makcedward.github.io/