Your Guide to Sentiment Analysis
Sentiment Analysis is used to discover people’s opinions, emotions and feelings about a product or service.
In theory it is a computational study of opinions, sentiments, attitudes, views, emotions etc. expressed in text. This text can be in a variety of formats like Reviews, Blogs, News or Comments.
The ability to extract insights from this type of data is a practice that is widely adopted by many organisations across the world. Its applications are broad and powerful.
Sentiment analysis goes by many names: Opinion Mining, Sentiment Mining, and Subjectivity Analysis.
So. Many. Different. Names!
We use it for many different tasks, for instance:
- Is that movie worth watching?
- What do people think about the latest iPhone?
- What do people think about the elections, particular candidates or issues?
If we can gauge these sentiments well, we can use them to make interesting forecasts about a variety of things, like stock market trends, election outcomes and so on.
Easier and Harder Problems in Sentiment Analysis
The application of Sentiment Analysis is broad. Some problems are relatively easy and some are more complex.
We can broadly divide the problems in sentiment analysis into three categories:
- Text Polarity identifies the overall sentiment orientation of the text.
- Sentiment Rating assigns the text a numerical score, say from 1 to 5.
- Aspect Based Sentiment Analysis determines the sentiment towards specific aspects mentioned in the text.
Each of these categories has its own easy and hard cases.
Breaking down Sentiment Analysis
Text Polarity
Here we are interested in the overall orientation of the text, i.e. whether it is positive or negative. Because free text has no fixed structure, there are many tricky cases that are hard to solve.
For example, is this a positive or negative review?
- Easy: “I bought iPhone XR a few days ago. It is such a nice phone, although a little large and pricy. The camera is awesome!”
- Tricky: “Honda Accords and Toyota Camrys are nice sedans, but hardly the best car on the road”
Rank the sentiment of the text (say 1 to 5)
Ranking the sentiment of the text provides more granularity. Here, we want to assign the sentiment a numerical score within a fixed range.
For example, we can have a model that assigns a rating from 1 to 5 to Amazon reviews, with 1 being very negative and 5 being very positive.
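As a rough illustration of what a rating looks like in code, here is a minimal sketch that maps a polarity score onto a 1 to 5 scale. It assumes you already have a score between -1 (very negative) and 1 (very positive) from some other model; the function name `score_to_rating` and the equal-width buckets are illustrative choices, not a standard. Real rating models are usually trained directly on labelled reviews instead.

```python
def score_to_rating(score: float, low: float = -1.0, high: float = 1.0) -> int:
    """Map a polarity score in [low, high] onto an integer rating from 1 to 5."""
    # Clamp out-of-range scores so they land on the nearest end of the scale.
    clamped = max(low, min(high, score))
    # Rescale to [0, 1], then cut that interval into five equal-width buckets.
    fraction = (clamped - low) / (high - low)
    # fraction == 1.0 would give bucket 6, so cap the result at 5.
    return min(5, int(fraction * 5) + 1)


for example_score in (-1.0, -0.3, 0.0, 0.3, 1.0):
    print(example_score, "->", score_to_rating(example_score))
```

With equal-width buckets, a score of 0 lands on a rating of 3, the midpoint of the scale.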
Aspect Based Sentiment Analysis
Sometimes, it’s not enough to say whether a text has a “positive” or a “negative” sentiment. You may want to know not only whether people are talking about the product with a positive, neutral, or negative polarity, but also which particular aspects or features of the product they are talking about.
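To make the idea concrete, here is a minimal sketch of aspect-level scoring. It splits a review into clauses and credits each aspect with the sentiment words found in the same clause. The `ASPECTS` and `LEXICON` word lists are tiny, hand-picked assumptions for illustration; a real system would learn or curate far larger ones and would need a proper parser to link opinions to aspects.

```python
import re
from typing import Dict

# Hypothetical, hand-picked word lists for illustration only.
ASPECTS = {"camera", "battery", "screen", "price"}
LEXICON = {"awesome": 1, "great": 1, "nice": 1, "poor": -1, "terrible": -1, "slow": -1}


def aspect_sentiment(text: str) -> Dict[str, int]:
    """Score each known aspect using the sentiment words in the same clause."""
    results: Dict[str, int] = {}
    # Split on punctuation and on a contrastive "but" so that each clause
    # is scored separately from its neighbours.
    clauses = re.split(r"[.!?;,]|\bbut\b", text.lower())
    for clause in clauses:
        words = re.findall(r"[a-z']+", clause)
        clause_score = sum(LEXICON.get(word, 0) for word in words)
        for word in words:
            if word in ASPECTS:
                results[word] = results.get(word, 0) + clause_score
    return results


review = "The camera is awesome, but the battery is terrible."
print(aspect_sentiment(review))  # {'camera': 1, 'battery': -1}
```

A document-level score for this review would cancel out to zero, while the aspect-level view shows what people like and what they don't.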
Broad Methods for doing Sentiment Analysis
You can use both supervised learning and unsupervised learning methods for your sentiment analysis.
Sentiment Analysis Using Supervised learning
Supervised learning for a sentiment model can be thought of as training a classifier.
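Supervised learning based Sentiment Analysis consists of two steps. Step one is learning, or training, in which the classifier is fitted to labelled examples. Step two is testing, in which the trained classifier is evaluated on text it has not seen before.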
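The two steps can be sketched with a tiny multinomial Naive Bayes classifier written from scratch. Naive Bayes is just one common choice of classifier, picked here as an assumption because it fits in a few lines. The function names and the four training sentences are made up for illustration; a real project would use a library implementation and thousands of labelled examples.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Step one: learn per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior of the label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data, for illustration only.
train_set = [
    ("great phone love it", "positive"),
    ("awesome camera", "positive"),
    ("terrible battery", "negative"),
    ("poor screen hate it", "negative"),
]
test_set = [("love the camera", "positive"), ("hate the battery", "negative")]

model = train(train_set)
# Step two: test on held-out examples the model has not seen.
correct = sum(predict(model, text) == label for text, label in test_set)
print(f"accuracy: {correct / len(test_set):.2f}")
```

Keeping the test sentences separate from the training sentences is the important part: accuracy measured on the training data itself says little about how the model handles new text.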
Sentiment Analysis using unsupervised learning
While supervised machine learning methods are widely used in sentiment analysis, there are also many systems that adopt unsupervised methods. These methods require sentiment dictionaries, also called lexicons.
There are two ways to create a sentiment dictionary. One approach is based on human-curated sources, which gives us sentiment dictionaries like SentiWordNet and the Subjectivity Cues lexicon. The other is a corpus-based approach, where specialised sentiment dictionaries are built from a corpus of text.
With lexicons, either clustering or scoring algorithms are used to determine the sentiment of the input text. Lexicon-based sentiment analysis systems are hard to develop. High-quality lexical resources are the key to good performance, and general-purpose lexicons usually do not capture domain- or context-dependent meaning. To overcome these issues, use a corpus-based approach. In general, corpus-based approaches rely on syntactic or co-occurrence patterns in large corpora, so you will need a large corpus to get good coverage.
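One way to picture the co-occurrence idea is the sketch below, which scores a word by how much more often it shares a sentence with positive seed words than with negative ones (a PMI-style "semantic orientation" score). The seed lists, the four-sentence corpus, the smoothing constant, and the function name are all assumptions made for illustration; they are not taken from any particular tool.

```python
import math
from collections import Counter

# Hypothetical seed words whose polarity we take as given.
POS_SEEDS = {"excellent", "good"}
NEG_SEEDS = {"poor", "bad"}


def semantic_orientation(corpus, smoothing=0.01):
    """Score non-seed words by co-occurrence with positive vs. negative seeds."""
    with_pos, with_neg = Counter(), Counter()
    pos_sentences = neg_sentences = 0
    for sentence in corpus:
        words = set(sentence.lower().split())
        has_pos = bool(words & POS_SEEDS)
        has_neg = bool(words & NEG_SEEDS)
        pos_sentences += has_pos
        neg_sentences += has_neg
        for word in words - POS_SEEDS - NEG_SEEDS:
            if has_pos:
                with_pos[word] += 1
            if has_neg:
                with_neg[word] += 1
    scores = {}
    for word in set(with_pos) | set(with_neg):
        # PMI(word, pos) - PMI(word, neg) simplifies to this log ratio;
        # the smoothing term keeps the ratio finite when a count is zero.
        numerator = (with_pos[word] + smoothing) * (neg_sentences + smoothing)
        denominator = (with_neg[word] + smoothing) * (pos_sentences + smoothing)
        scores[word] = math.log2(numerator / denominator)
    return scores


# Made-up toy corpus, for illustration only.
corpus = [
    "the battery life is excellent and reliable",
    "reliable and good screen",
    "poor build and flimsy case",
    "flimsy hinge bad design",
]
scores = semantic_orientation(corpus)
print("reliable:", round(scores["reliable"], 2))  # positive score
print("flimsy:", round(scores["flimsy"], 2))      # negative score
```

On a corpus this small, function words such as "and" also pick up noisy scores, which is exactly why corpus-based methods need a lot of text.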
Some Useful Tips
Apply linguistic and statistical methods to the analysis task
Meaning varies according to word sense, context, and what’s being discussed, so we need approaches that combine linguistic and statistical methods. Sentiment Analysis via term look-up in a lexicon is an easy but crude method.
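To see why plain look-up is crude, here is a minimal sketch of a lexicon scorer with one small linguistic rule added (flipping the polarity of a word that follows a negator). The `LEXICON` weights and `NEGATORS` list are invented for illustration and are far smaller than any real sentiment dictionary.

```python
import re

# Hypothetical word weights and negators, for illustration only.
LEXICON = {"good": 1.0, "nice": 1.0, "awesome": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no", "hardly"}


def lexicon_score(text):
    """Sum the weights of known words, flipping a word that follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for index, token in enumerate(tokens):
        value = LEXICON.get(token, 0.0)
        if index > 0 and tokens[index - 1] in NEGATORS:
            value = -value
        total += value
    return total


def polarity(text):
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("The camera is awesome!"))            # positive
print(polarity("This phone is not good"))            # negative
print(polarity("I am craving McDonald's so bad"))    # negative, which is wrong
```

The negation rule fixes the second sentence, but the third is still misread because the scorer has no notion of context: "so bad" is an intensifier here, not a complaint.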
Train Domain Adapted Models
Whether you apply language engineering, statistical methods, or machine learning to the task, properly trained domain-adapted models will outperform generic classification.
Aspect Based Sentiment Analysis
Document-level sentiment analysis is largely passé. Aim for sentiment resolution at the entity, concept, or topic level.
Other Sources of Sentiment Analysis
Text is the most common sentiment data source, but it’s not the only one. Biometrics, images and sound are also sources of sentiment data.
Affective computing
In broader terms, Sentiment Analysis is part of the world of affective computing. Affective computing is the study and development of systems and devices that can recognise, interpret, process, and simulate human affects. It is an interdisciplinary field spanning computer science, psychology, and cognitive science.
Is it Accurate?
We need to be wary of accuracy claims. The biggest threat to accuracy in Sentiment Analysis today is human concordance: the degree of agreement among humans (or between humans and machines). Numerous studies have shown that the rate of human concordance is between 70% and 80%. There are many factors affecting the performance of a sentiment analysis system.
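Concordance is easy to measure yourself if you have two sets of labels for the same texts. The sketch below computes the raw agreement rate and, as an extra that this post does not otherwise cover, Cohen's kappa, which corrects the agreement rate for matches expected by chance. The two annotator label lists are made up for illustration.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators gave the same label."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(
        counts_a[label] * counts_b[label] for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels from two hypothetical annotators.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos", "pos", "neg", "pos", "neg"]

print("agreement:", percent_agreement(annotator_1, annotator_2))
print("kappa:", round(cohens_kappa(annotator_1, annotator_2), 2))
```

The same functions work for comparing a model's output with a human's labels, which is a fairer way to read an accuracy claim: a system cannot meaningfully score higher than humans agree with each other.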
Contextual understanding
Contextual understanding is critical for a system to be able to reach human-level accuracy.
For example: “I am craving McDonald’s so bad”.
Most systems will misinterpret this statement as negative because they see the phrase “so bad”.
Sentiment Ambiguity
“Can you recommend any good holiday destinations?”
This statement doesn’t express any sentiment, although it uses the positive sentiment word “good”.
Sarcasm
“Sure, I’m happy for my browser to crash right in the middle of my coursework.”
Obviously, this statement is negative, even though it has the positive word “happy”.
Comparatives
“iPhone is much better than Samsung.”
Most sentiment analyser tools cannot “pick sides” when they find comparative statements like the one above. They can only pick the sentiment based on keywords. So, this example would be tagged as “positive” because it contains the positive keyword “much better”, regardless of which company is looking at the data.
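A minimal sketch of what "picking sides" could look like is shown below. It handles only the narrow pattern "X is (much) better/worse than Y" with a regular expression; the pattern, the function names, and the "unknown" fallback are assumptions made for illustration, and real comparative sentences need proper syntactic analysis.

```python
import re

# Matches simple comparatives such as "X is much better than Y".
COMPARATIVE = re.compile(
    r"(?P<first>[\w ]+?)\s+(?:is|are)\s+(?:much\s+|far\s+)?"
    r"(?P<word>better|worse)\s+than\s+(?P<second>[\w ]+)",
    re.IGNORECASE,
)


def preferred_entity(sentence):
    """Return which side of a simple comparative is preferred, or None."""
    match = COMPARATIVE.search(sentence)
    if match is None:
        return None
    first = match.group("first").strip()
    second = match.group("second").strip()
    if match.group("word").lower() == "better":
        return {"preferred": first, "other": second}
    return {"preferred": second, "other": first}


def sentiment_for(sentence, entity):
    """Sentiment of the sentence from the point of view of one entity."""
    result = preferred_entity(sentence)
    if result is None:
        return "unknown"
    if entity.lower() == result["preferred"].lower():
        return "positive"
    if entity.lower() == result["other"].lower():
        return "negative"
    return "unknown"


sentence = "iPhone is much better than Samsung."
print(sentiment_for(sentence, "iPhone"))   # positive
print(sentiment_for(sentence, "Samsung"))  # negative
```

The same sentence now yields a different label depending on who is asking, which is the behaviour a keyword-only tagger cannot provide.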
I hope this post has helped you to understand the different types of Sentiment Analysis as well as its limitations. Add a comment below if you have any questions.