[Week 1 — YelpGuesser ]

YelpGuesser
Published in bbm406f16
Nov 20, 2016

A Look at Sentiment Analysis

Sentiment analysis is one of the most studied topics in Machine Learning and Natural Language Processing. But have you ever wondered why it is important? Why do people keep analyzing this kind of problem over and over? What is the use of predicting which class a text belongs to? Well, you might answer, “Of course it is important: with text classification like this, we can automatically tell a happy review from an angry one, much as a spam filter separates spam from ham.” But really, once we have sorted the texts into the right class or predicted the right rating, then what?

Wait… I see YelpGuesser in the title. What is it anyway?

Keep guessing until the end of this post, as we, the YelpGuesser team, tell you about sentiment analysis from our perspective.

Before we answer your question, let’s first take a look at the definition of sentiment analysis. According to the Oxford Dictionary, sentiment analysis is

the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer’s attitude towards a particular topic, product, etc. is positive, negative, or neutral.

Starting from that definition, we have broken it down into several parts that we think distinguish sentiment analysis from the kind of analysis we do in everyday life.

First, sentiment analysis is done computationally, which means it relies on mathematical calculation rather than a merely subjective impression of the text. It also implies that we use an algorithm for the calculation, so the process can be reproduced by different people as long as they know the method.

Second, identifying and categorizing opinions. This is basically what we do in sentiment analysis. To categorize which class a text falls into, we first have to identify the opinion it expresses. And, going back to the previous point (computationally), the process of identifying and categorizing should be expressible as algorithms and calculations.

Third, expressed in a piece of text. Clearly, the input for sentiment analysis is text. No pictures, no graphs, no numbers (at least not directly). It is text, just text. It is worth noting that we humans tend to express our feelings and describe things in words. We use natural language to communicate with each other. But since we have to come back to the first point again (computationally), how do we represent words as numbers? This is what makes it interesting. There are several ways to do it; one example is Bag of Words, which represents each text by how many times each word of a fixed vocabulary appears in it.
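To make the idea of turning words into numbers a bit more concrete, here is a minimal Bag of Words sketch in plain Python. The two example reviews and the helper names (`tokenize`, `build_vocabulary`, `bag_of_words`) are made up for illustration; this is not necessarily how we will implement it in the project.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep only runs of letters as word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Map every distinct word in the corpus to a column index (sorted for determinism)."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}

def bag_of_words(text, vocabulary):
    """Return a count vector: position i holds how often vocabulary word i occurs."""
    counts = Counter(tokenize(text))
    # Iterating a dict yields keys in insertion order, which matches the indices.
    return [counts[word] for word in vocabulary]

# Hypothetical toy reviews, not taken from the Yelp Dataset.
reviews = [
    "The food was great and the service was great",
    "The food was cold",
]
vocabulary = build_vocabulary(reviews)
vectors = [bag_of_words(review, vocabulary) for review in reviews]

print(list(vocabulary))  # ['and', 'cold', 'food', 'great', 'service', 'the', 'was']
print(vectors[0])        # [1, 0, 1, 2, 1, 2, 2]
print(vectors[1])        # [0, 1, 1, 0, 0, 1, 1]
```

Notice that word order is thrown away: each review becomes just a list of counts, which is exactly the kind of numeric input an algorithm can work with.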

The last phrase states the goal of sentiment analysis itself: to determine the writer’s attitude. We are trying to guess: is it neutral, positive, or negative? We know that learning from examples (that is, by induction) cannot give us an algorithm guaranteed to guess every text correctly, but we can aim for the best “probably approximately correct” algorithm: one that, with high probability, makes only a few mistakes.
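As a rough illustration of guessing between positive, negative, and neutral, here is a very naive word-list approach. The `POSITIVE` and `NEGATIVE` sets and the `polarity` function are hypothetical and tiny on purpose; a real system would need something far richer, and this is only one of many possible methods.

```python
import re

# Hypothetical, hand-picked word lists, only for illustration.
POSITIVE = {"good", "great", "delicious", "friendly", "amazing"}
NEGATIVE = {"bad", "terrible", "cold", "rude", "slow"}

def polarity(text):
    """Count positive and negative words and report which side wins."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no opinion words, or an exact tie

print(polarity("Great food, friendly staff"))                  # positive
print(polarity("The soup was cold and the waiter was rude."))  # negative
print(polarity("We went there on Monday"))                     # neutral
```

A guesser like this is easy to fool (think of “not good”), which is one reason the guess can only ever be approximately correct.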

Okay, so let’s suppose we have managed to determine it approximately correctly. What is sentiment analysis then used for?

Of course it is everywhere.

From a company’s perspective, sentiment analysis lets us predict how consumers (and potential consumers) will react to a product, and therefore identify their needs and their satisfaction. A good sentiment prediction is valuable input for improving the product and its campaigns, and it also affects which marketing strategy the business chooses.

From the perspective of science, it is important because we will be able to interpret tons of different meanings and emotions and use them further in fields like psychology, the social sciences, and the cognitive sciences. Predicting future trends is also possible with sentiment analysis. And considering how machine learning research on non-textual inputs is evolving, those approaches can be combined with sentiment analysis.

They are interesting, aren’t they?

Okay, so by this point, I guess you can guess the meaning behind YelpGuesser (sorry for the guess-ception).

YelpGuesser consists of two words: Yelp and Guesser. Yelp is a website that publishes reviews of local businesses, ranging from restaurants and shopping to home services and more. The reviews and ratings are crowd-sourced from Yelp users. This data helps other Yelp users evaluate businesses and make a choice. And ‘Guesser’, well, it is a guesser (it is pretty obvious… isn’t it?). YelpGuesser is one of the projects in the Machine Learning class at Hacettepe University. In this project, we are going to use the Yelp Dataset, perform sentiment analysis on the text of restaurant reviews, and then guess each review’s rating based on its text.
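To give a feeling for what guessing a rating from text could look like, here is a small sketch using a multinomial Naive Bayes classifier written in plain Python. We have not decided on our method yet, so treat the technique, the function names (`train`, `guess_rating`), and the four toy reviews as assumptions made purely for illustration; none of them come from the Yelp Dataset.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def train(examples):
    """Collect the counts Naive Bayes needs from (text, stars) pairs."""
    class_counts = Counter()            # number of reviews per star rating
    word_counts = defaultdict(Counter)  # word frequencies per star rating
    vocabulary = set()
    for text, stars in examples:
        class_counts[stars] += 1
        for word in tokenize(text):
            word_counts[stars][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary

def guess_rating(text, model):
    """Return the star rating with the highest log-probability for the text."""
    class_counts, word_counts, vocabulary = model
    total_reviews = sum(class_counts.values())
    best_stars, best_score = None, -math.inf
    for stars in class_counts:
        score = math.log(class_counts[stars] / total_reviews)  # log prior
        total_words = sum(word_counts[stars].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            smoothed = word_counts[stars][word] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        if score > best_score:
            best_stars, best_score = stars, score
    return best_stars

# Hypothetical toy training data: (review text, star rating).
toy_reviews = [
    ("terrible food and rude staff", 1),
    ("cold food and slow service", 1),
    ("great food and friendly staff", 5),
    ("amazing pizza and great service", 5),
]
model = train(toy_reviews)

print(guess_rating("great pizza", model))    # 5
print(guess_rating("rude and slow", model))  # 1
```

With only four reviews this is a toy, of course; the interesting part starts when the same idea meets many real reviews and all five star ratings.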

Well, there you go. Even though many people have already worked on this problem, we still think it is a relevant, important, and interesting topic for this project, because it holds so many possible results: no two sentiment analysis projects will produce exactly the same outcome, since everyone may use their own method. As we discussed above, the future use of these results is also important; a correct prediction is great input for a business and for predicting trends. Therefore, we feel challenged to implement our own way of making the best prediction.

We think that’s a wrap for today’s post. See you!
