[Week 4 — Classification or Regression]

Sentiment Analyzer
bbm406f16
Dec 25, 2016
Image taken from: http://www.slideshare.net/datascienceth/machine-learning-in-image-processing

What do we need in order to predict the scores of the reviews?
Which is better for our problem, classification or regression?

Each restaurant review is evaluated in three different categories, and each evaluation falls into one of ten classes. These scores are related to each other by order. For example, a flavor score of 7 means the food is more delicious than a flavor score of 5.

Classifiers are good at handling independent classes. When a sample is misclassified, a classifier does not care how far off the prediction was. If the actual class of a review is 5, estimating 2 and estimating 4 count as the same mistake, even though a predicted class of 4 is clearly better than 2. Maybe we can handle this with regression, because there the error is the difference between the hypothesis and the actual value. The bigger the mistake we make, the bigger the error.
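To make the difference concrete, here is a minimal Python sketch of the two ways of counting a mistake. It is only an illustration, not our project code: the function names `zero_one_loss` and `absolute_error` are made up for this example, and absolute difference is just one possible regression-style error.

```python
def zero_one_loss(actual, predicted):
    """Classification-style error: 1 if the class is wrong, 0 if it is right."""
    return 0 if actual == predicted else 1


def absolute_error(actual, predicted):
    """Regression-style error: distance between the prediction and the actual score."""
    return abs(actual - predicted)


actual = 5  # the review's real score, as in the example above
for predicted in (2, 4):
    print(predicted, zero_one_loss(actual, predicted), absolute_error(actual, predicted))
# prints:
# 2 1 3
# 4 1 1
```

Both predictions get the same classification error of 1, while the regression-style error is 3 for a prediction of 2 and only 1 for a prediction of 4.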

We may need to change our approach to reach the right result.

Dataset

As of today, we have completed our own dataset. It contains 50,000 comments, each with evaluations in three categories. Hopefully, this dataset will be used in future projects.

Finally, we are investigating some language processing libraries to get better accuracy. The most important Turkish NLP library is Zemberek. We are struggling with it because of the lack of documentation and the scarcity of resources on the internet. We will blog about Zemberek next week. Stay tuned.

What stage are we at now?
1- Dataset (completed)
2- Natural Language Processing (research phase)
3- Classification (half-completed) (Note: We are researching related work and approaches.)
