[week6-YelpGuesser]

YelpGuesser
Published in bbm406f16
Jan 20, 2017

Experimental Results

Hi,

We have finally finished our project. Along the way we learned a lot about machine learning and text mining. We want to share our experimental results and some parts of our final report with you.

We present our prediction results in two graphs for each feature extraction method (Bag of Words and Word2Vec): one shows the R² score and the other shows the mean squared error (MSE). After that, we analyze each result and the methods we chose for our prediction.
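As a reminder of what the two graphs measure, here is a minimal sketch of both metrics in plain Python. The ratings and predictions below are made-up numbers for illustration, not values from our experiments, and the function names are chosen only for this example.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted ratings (lower is better)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Share of the variance in the true ratings explained by the predictions (1.0 is perfect)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical star ratings and predictions, for illustration only.
true_ratings = [5, 3, 4, 1, 2]
predicted_ratings = [4.5, 3.0, 4.0, 2.0, 2.5]

mse = mean_squared_error(true_ratings, predicted_ratings)
r2 = r2_score(true_ratings, predicted_ratings)
print(f"MSE = {mse:.2f}, R^2 = {r2:.2f}")
```

A model that always predicts the mean rating gets an R² of 0, so a higher R² together with a lower MSE means better predictions.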

Figure: Bag of Words R² score
Figure: Bag of Words MSE
Figure: Word2Vec R² score
Figure: Word2Vec MSE

In this project we used the Yelp dataset to predict ratings. All of our prediction models are regression models, and each uses 10,000 features. We compared three algorithms and chose the one that gave the best performance. We also experimented with different feature extraction methods.

We conclude that the best results overall came from combining Bag of Words feature extraction with Support Vector Regression. For Word2Vec feature extraction, the best results came from Random Forest Regression. Comparing the two feature extraction methods, the R² score of the Word2Vec model is lower than that of Bag of Words, but Bag of Words takes much longer to run. So in terms of cost, Word2Vec is cheaper than Bag of Words, while in terms of accuracy, Bag of Words scored better than Word2Vec on average.

Although this project is finished, we believe there are many interesting related topics that could be explored in the future. For example: trying other feature selection methods (Part-of-Speech tagging, Latent Semantic Indexing); applying our model to other domains such as music, games, and applications to see whether the results differ; incorporating and evaluating sarcastic reviews in the model; or combining this rating prediction regression model with a user-based recommendation system.
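To make the Bag of Words feature extraction discussed above more concrete, here is a minimal sketch in plain Python of turning reviews into count vectors with a capped vocabulary. It is not our project code: the reviews are invented, the helper names (`tokenize`, `build_vocabulary`, `vectorize`) are hypothetical, and the simple regex tokenizer is an assumption. The `max_features` parameter plays the role of the 10,000-feature limit mentioned above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(reviews, max_features=10000):
    """Map the max_features most frequent words to column indices."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: index for index, (word, _) in enumerate(ranked[:max_features])}


def vectorize(review, vocabulary):
    """Count how often each vocabulary word occurs in one review."""
    vector = [0] * len(vocabulary)
    for token in tokenize(review):
        index = vocabulary.get(token)
        if index is not None:  # words outside the vocabulary are ignored
            vector[index] += 1
    return vector


# Invented reviews, for illustration only.
reviews = [
    "Great food, great service",
    "Terrible service",
    "The food was fine",
]

vocabulary = build_vocabulary(reviews, max_features=4)
features = [vectorize(review, vocabulary) for review in reviews]

for review, vector in zip(reviews, features):
    print(vector, review)
```

Each review becomes one row of word counts, and those rows are what a regression model receives as input. Every review needs one column per vocabulary word, which gives a sense of why a large vocabulary can be costly to process.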
We hope that our project offers a new perspective on text reviews and their role in predicting ratings, and that it contributes to the development of machine learning in general.

Thank you!
