[Week5-YelpGuesser]

YelpGuesser
Published in bbm406f16
2 min read · Jan 6, 2017

For Each Word, Find Corresponding Vector

Hello everyone,

This week we are talking about Word2Vec. Word2Vec maps each word to a vector, which lets us measure the distance between words and find words that are close to each other. Once the model is trained, it gives us the corresponding vector for each word in its vocabulary.
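To make "distance between words" concrete, here is a minimal sketch of cosine similarity, a common way to compare word vectors. The three-dimensional vectors and the `toy_vectors` dictionary are made up for illustration; they are not output from our model, which would supply vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors (assumption: invented values, not trained ones).
toy_vectors = {
    "good":  [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "awful": [-0.7, 0.6, 0.1],
}

sim_good_great = cosine_similarity(toy_vectors["good"], toy_vectors["great"])
sim_good_awful = cosine_similarity(toy_vectors["good"], toy_vectors["awful"])

# Words pointing in a similar direction get a score closer to 1.
print(sim_good_great > sim_good_awful)
```

With these toy values, "good" ends up closer to "great" than to "awful", which is the kind of relationship a trained model is expected to capture.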

In our project we look at reviews and their ratings. As you know, reviews are made of sentences, but Word2Vec only gives us vectors for individual words. We have to find a vector for each sentence, and that is actually a pretty challenging problem with Word2Vec.

Taken from: http://dsg.rushter.com/reader/tag/word2vec

How can we use Word2Vec in our project? We use the Gensim Python package for Word2Vec. At first we downloaded the GoogleNews model. You can also download these pre-trained word vectors (get the file ‘GoogleNews-vectors-negative300.bin’). But it was too big for our computer, so we used the text8 corpus instead, which trains a small word vector model. We got this result after training on the text8 corpus:
2017-01-06 20:09:15,639 : INFO : training on 85026035 raw words (62533684 effective words) took 324.3s, 192806 effective words/s.

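Here is the code that loads the text8 corpus and trains the model:
import logging
from gensim.models import word2vec
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
sentences = word2vec.Text8Corpus('text8')

# Train 200-dimensional word vectors (gensim 4.0+ renames `size` to `vector_size`)
model = word2vec.Word2Vec(sentences, size=200)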

Now we have prepared our corpus and trained a Word2Vec model on it.

In the next step, we calculate the vector of each sentence by averaging the vectors of its words. We do this for both the training and the test set. This average vector represents our sentence (a training or test review). This way, you can also find the similarity of two sentences if you want.
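The averaging step can be sketched roughly as follows. This is an illustration under assumptions, not our exact project code: the helper names `sentence_vector` and `cosine_similarity`, the whitespace tokenization, the zero-vector fallback for reviews with no known words, and the tiny hand-made `word_vectors` dictionary are all hypothetical.

```python
import numpy as np

def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of the tokens found in the vocabulary."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        # Assumption: a review with no known words maps to the zero vector.
        return np.zeros(dim)
    return np.mean(known, axis=0)

def cosine_similarity(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(np.dot(u, v) / denom) if denom else 0.0

# Hypothetical 3-dimensional vectors standing in for a trained model.
word_vectors = {
    "food":  np.array([0.2, 0.8, 0.1]),
    "great": np.array([0.8, 0.2, 0.4]),
    "good":  np.array([0.9, 0.1, 0.3]),
    "awful": np.array([-0.7, 0.6, 0.1]),
}

review_a = sentence_vector("great food".split(), word_vectors, dim=3)
review_b = sentence_vector("good food".split(), word_vectors, dim=3)
review_c = sentence_vector("awful food".split(), word_vectors, dim=3)

# Reviews with similar wording should score higher than dissimilar ones.
print(cosine_similarity(review_a, review_b) > cosine_similarity(review_a, review_c))
```

With a trained Gensim model, the model's word vectors (in recent versions, `model.wv`) could take the place of the `word_vectors` dictionary, since they also support lookup by word. Averaging ignores word order, so "not good" and "good not" get the same vector; that is a known limitation of this simple approach.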
Now we only need to feed our sentence vectors to our algorithms and get the scores!
Thank you for reading!
