Learn Word2Vec by implementing it in tensorflow
aneesh joshi

Nice tutorial. But I think a word2vec tutorial that skips negative sampling is misleading. Let me explain: in practice, you will almost always use negative sampling with word2vec. It is a large part of why word2vec is used at all, because it makes training scalable. Most people think the only difference between plain skip-gram (or CBOW) word2vec and the negative-sampling SG/CBOW variants is that we sample negative words, but that is not the case: the loss function changes as well, as described in https://arxiv.org/abs/1402.3722 if you're interested. So the implementation is radically different, and I would argue harder, because you are not using cross-entropy but a custom loss.

Your tutorial would have gone from good to excellent if it had explained negative sampling and how to implement it. But props to you anyway :)
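To make the point concrete, here is a minimal sketch of what that custom skip-gram negative-sampling loss can look like in TensorFlow (my own illustration, not code from the tutorial; the TF 1.x style, sizes, and placeholder names are assumptions):

```python
import tensorflow as tf

# Assumed sizes -- adjust to your corpus.
vocab_size    = 10000
embedding_dim = 128
num_negative  = 5   # negative samples drawn per (center, context) pair

# Inputs: a batch of center-word ids, their true context-word ids,
# and randomly drawn negative (noise) word ids.
center_ids   = tf.placeholder(tf.int32, [None])                 # [batch]
context_ids  = tf.placeholder(tf.int32, [None])                 # [batch]
negative_ids = tf.placeholder(tf.int32, [None, num_negative])   # [batch, k]

# Two embedding matrices, as in the original word2vec:
# one for center words, one for context (output) words.
center_emb  = tf.Variable(tf.random_uniform([vocab_size, embedding_dim], -1.0, 1.0))
context_emb = tf.Variable(tf.random_uniform([vocab_size, embedding_dim], -1.0, 1.0))

v_c   = tf.nn.embedding_lookup(center_emb,  center_ids)    # [batch, dim]
u_pos = tf.nn.embedding_lookup(context_emb, context_ids)   # [batch, dim]
u_neg = tf.nn.embedding_lookup(context_emb, negative_ids)  # [batch, k, dim]

# Positive term: log sigmoid(u_o . v_c)
pos_score = tf.reduce_sum(v_c * u_pos, axis=1)                       # [batch]
pos_loss  = tf.log(tf.sigmoid(pos_score) + 1e-10)

# Negative term: sum over k of log sigmoid(-u_k . v_c)
neg_score = tf.reduce_sum(u_neg * tf.expand_dims(v_c, 1), axis=2)    # [batch, k]
neg_loss  = tf.reduce_sum(tf.log(tf.sigmoid(-neg_score) + 1e-10), axis=1)

# Negative-sampling objective: maximize both terms, i.e. minimize their negation.
loss = -tf.reduce_mean(pos_loss + neg_loss)
train_step = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
```

You can also lean on TensorFlow's built-in tf.nn.nce_loss or tf.nn.sampled_softmax_loss, which draw the noise samples for you, though neither is exactly the SGNS objective from the paper above.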