Word Embeddings Versus Bag-of-Words: The Curious Case of Recommender Systems

Josh Barua · Published in The Startup · 7 min read · Aug 4, 2020


Are word embeddings always the best choice?

If you can challenge a well-accepted view in data science with data, that’s pretty cool, right? After all, “in data we trust”, or so we profess! Word embeddings have caused a revolution in the world of natural language processing, bringing us much closer to capturing the meaning and context of text and transcribed speech. They are a world apart from the good old bag-of-words (BoW) models, which rely on word frequencies under the unrealistic assumption that each word occurs independently of all others. The results with word embeddings, which represent every word as a dense vector, have been nothing short of spectacular. One of the oft-cited success stories involves subtracting the man vector from the king vector and adding the woman vector, which lands close to the queen vector: king − man + woman ≈ queen.
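If you want to try that analogy yourself, here is a minimal sketch using gensim’s downloader and pretrained GloVe vectors; the particular model name is just one convenient choice, and any decent pretrained word-vector model makes the same point.

```python
# Minimal sketch of the classic word-vector analogy test.
# Assumes gensim is installed and the pretrained GloVe vectors can be downloaded.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # pretrained GloVe KeyedVectors

# king - man + woman ≈ queen
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # "queen" typically comes out on top
```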

Very smart indeed! However, I question whether word embeddings should always be preferred to bag-of-words. While building a review-based recommender system, it dawned on me that, incredible as word embeddings are, they may not be the most suitable technique for my purpose. As crazy as it may sound, I got better results with the BoW approach. In this article, I show that the uber-smart feature of word…
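The full walkthrough is cut off above, so the sketch below is only one plausible bag-of-words setup for a review-based recommender, not the author’s exact pipeline; the hotel names and reviews are invented purely for illustration.

```python
# One plausible BoW review recommender: represent each item by the word counts
# of its reviews and recommend the items whose review vectors are most similar.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy data, invented for illustration.
reviews = {
    "Hotel A": "clean rooms friendly staff great breakfast quiet location",
    "Hotel B": "noisy rooms rude staff terrible breakfast poor location",
    "Hotel C": "spotless rooms helpful staff tasty breakfast peaceful area",
}

items = list(reviews.keys())
vectorizer = CountVectorizer(stop_words="english")
bow_matrix = vectorizer.fit_transform(reviews.values())  # item-by-term count matrix

# Cosine similarity between raw count vectors; recommend the closest other item.
sim = cosine_similarity(bow_matrix)
query = items.index("Hotel A")
ranked = sorted(
    ((items[i], sim[query, i]) for i in range(len(items)) if i != query),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked[0])  # the item whose reviews overlap most with Hotel A's
```

Swapping CountVectorizer for TfidfVectorizer, or the count vectors for averaged word embeddings, keeps the same structure, which is exactly the comparison the article sets up.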

