by Olga Davydova
Information contained in this article has been mostly obtained from sentiment analysis tools’ official websites.
Sentiment analysis is most commonly applied to reviews of consumer products and services. Its main task is to determine whether a text expresses a positive or a negative sentiment and to assign it a polarity value. This article reviews the best-known sentiment analysis tools, covering lexicon-based methods, rule-based methods, and machine learning techniques.
The Natural Language Toolkit (NLTK) is a Python library whose sentiment analysis tools are based on machine learning approaches. The module nltk.sentiment.util contains a demonstration function, demo_liu_hu_lexicon(sentence, plot=False). The function counts the positive, negative, and neutral words in the input sentence and classifies the sentence according to which polarity is most frequently represented. Words that do not appear in the lexicon are treated as neutral. This is an example of sentiment classification using the Liu and Hu opinion lexicon, which is a list of positive and negative words.
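The counting rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not NLTK's implementation: the two word sets are a tiny hypothetical stand-in for the Liu and Hu lexicon, and the function name classify_by_lexicon is invented for this example.

```python
import re

# Toy word lists for illustration only; the real Liu and Hu lexicon is far larger.
POSITIVE_WORDS = {"good", "great", "excellent", "love", "happy"}
NEGATIVE_WORDS = {"bad", "poor", "terrible", "hate", "sad"}


def classify_by_lexicon(sentence):
    """Count positive and negative words and return the majority label."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    positive_count = sum(1 for token in tokens if token in POSITIVE_WORDS)
    negative_count = sum(1 for token in tokens if token in NEGATIVE_WORDS)
    # Words found in neither set are treated as neutral and do not sway the result.
    if positive_count > negative_count:
        return "Positive"
    if negative_count > positive_count:
        return "Negative"
    return "Neutral"


print(classify_by_lexicon("The camera is great and I love the screen"))
print(classify_by_lexicon("Terrible battery and poor support"))
```

A sentence with no lexicon words, or with equally many positive and negative words, falls through to the neutral label.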
Many sentiment analysis tools rely on lists of words and phrases with positive and negative connotations, and many such lists are already available. You can read about the best-known lists in the previous article.
Another useful function is demo_vader_instance(text). It prints polarity scores for a text using the VADER approach.
Pattern is a web mining module for the Python programming language. It has many data mining tools, including tools for sentiment analysis. The input can be a string, text, sentence, chunk, word, or synset (a set of one or more synonyms). Written text falls broadly into two types: facts and opinions. Opinions carry people’s sentiments. The module has several useful functions:
The sentiment() function returns a (polarity, subjectivity) tuple, an ordered set of values, for the given sentence, based on the adjectives it contains, where polarity is a value between -1.0 and +1.0 and subjectivity between 0.0 and 1.0.
The positive() function returns True if the given sentence’s polarity is above the threshold.
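To make the relationship between the two functions concrete, here is a minimal sketch written in plain Python rather than with Pattern itself. The adjective scores are invented, the averaging rule is a simplification, and the default threshold of 0.1 is an assumption made for this example.

```python
# Hypothetical adjective lexicon: word -> (polarity, subjectivity).
ADJECTIVES = {
    "wonderful": (1.0, 1.0),
    "good": (0.7, 0.6),
    "slow": (-0.3, 0.4),
    "awful": (-1.0, 1.0),
}


def sentiment(sentence):
    """Return a (polarity, subjectivity) tuple averaged over known adjectives."""
    words = [word.strip(".,!?") for word in sentence.lower().split()]
    scores = [ADJECTIVES[word] for word in words if word in ADJECTIVES]
    if not scores:
        return (0.0, 0.0)  # no known adjectives: neutral and objective
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)


def positive(sentence, threshold=0.1):
    """Return True if the sentence's polarity is above the threshold."""
    return sentiment(sentence)[0] > threshold


print(sentiment("A wonderful but slow phone."))
print(positive("A wonderful but slow phone."))
print(positive("An awful screen."))
```

Raising the threshold makes positive() stricter, which is useful when mildly positive sentences should not count.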
TextBlob is another text processing Python library. The sentiment property returns a named tuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0]. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
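The sketch below shows how a result of that shape might be consumed. It does not import TextBlob: the Sentiment named tuple is a stand-in with the same two field names, describe() is a hypothetical helper, and the 0.5 cut-off between objective and subjective is an arbitrary assumption.

```python
from collections import namedtuple

# Stand-in with the field names described above; not imported from TextBlob.
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])


def describe(sentiment):
    """Turn a Sentiment named tuple into a short human-readable summary."""
    polarity, subjectivity = sentiment  # named tuples unpack like plain tuples
    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"
    # The 0.5 cut-off is an assumption chosen only for this example.
    style = "subjective" if subjectivity >= 0.5 else "objective"
    return f"{tone}, mostly {style}"


result = Sentiment(polarity=0.8, subjectivity=0.75)
print(result.polarity, result.subjectivity)  # fields are also accessible by name
print(describe(result))
```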
Many sentiment analysis tools lose important information by using a lexicon-based approach. A deep learning model used in Stanford CoreNLP makes it possible to compute the sentiment based on how individual words change the meaning of longer phrases. It is a recursive neural network that builds on grammatical structures, and it was trained on a purpose-built sentiment treebank. The sentiment treebank includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences. The model introduced to capture this compositionality is the Recursive Neural Tensor Network. The code is written in Java. It is possible to use Stanford CoreNLP from the command line, via its Java programmatic API, via third-party APIs for most major modern programming languages, or via a web service. It works on Linux, macOS, and Windows.
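As a rough illustration of how a recursive tensor network combines words into phrases, the sketch below builds a phrase vector bottom-up over a small binary parse tree. Each parent is computed from its two children as tanh(c^T V c + W c), where c is the concatenation of the child vectors. Everything here is an assumption made for demonstration: the vector size, the random untrained weights, the word vectors, and the function names. The per-node sentiment classifier and the training procedure are omitted, and this is not the CoreNLP code.

```python
import math
import random

D = 3  # toy vector size chosen for the example
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Untrained random parameters: one (2D x 2D) tensor slice per output unit,
# plus a standard (D x 2D) weight matrix.
V = [rand_matrix(2 * D, 2 * D) for _ in range(D)]
W = rand_matrix(D, 2 * D)


def compose(left, right):
    """Combine two child vectors into one parent vector."""
    c = left + right  # concatenation of the children, length 2 * D
    parent = []
    for i in range(D):
        bilinear = sum(
            c[j] * V[i][j][k] * c[k]
            for j in range(2 * D)
            for k in range(2 * D)
        )
        linear = sum(W[i][j] * c[j] for j in range(2 * D))
        parent.append(math.tanh(bilinear + linear))
    return parent


def encode(tree, embeddings):
    """Encode a binary tree given as nested (left, right) tuples of words."""
    if isinstance(tree, str):
        return embeddings[tree]
    left, right = tree
    return compose(encode(left, embeddings), encode(right, embeddings))


# Random word vectors stand in for learned embeddings.
embeddings = {w: [rng.uniform(-1, 1) for _ in range(D)] for w in ["not", "very", "good"]}
phrase_vector = encode(("not", ("very", "good")), embeddings)
print(phrase_vector)
```

Because the phrase vector depends on the tree structure, "not (very good)" and "(not very) good" would generally produce different vectors, which is the property a flat word list cannot capture.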
RapidMiner lets users extract information from publicly available data sources so that they can understand and optimize what is being said about their company or products. It is possible to crowd-source your customers’ feedback, conduct social market research, determine the polarity of product reviews, and hold discussions around the web. RapidMiner Studio offers a free edition, which is limited to 1 logical processor and 10,000 data rows. Commercial pricing starts at $2,500 and is available from the developer.
General Architecture for Text Engineering (GATE)
General Architecture for Text Engineering (GATE) is an open-source natural language processing tool written in Java and developed at the University of Sheffield since 1995. It contains several sentiment analysis systems. One of them uses supervised machine learning methods, trained on human-annotated data, co-occurrence statistics, and lexicons of positive and negative words. Others use lightly supervised and unsupervised approaches. The developers also consider richer user-sensitive models of opinions to reflect the fact that the same opinion could be positive for one user group and negative for another. Author authority models and temporal elements make it possible to detect shifts in attitudes over time.
R (Package: Syuzhet)
Syuzhet extracts sentiment and sentiment-derived plot arcs from text using four sentiment dictionaries. Its get_sentiment function assesses the sentiment of each word or sentence. The function takes two arguments: a character vector of sentences or words and a method. The method determines which sentiment extraction method to use. After the sentiment values are determined, it is possible to obtain a measure of the overall emotional valence in the text by summing the determined values.
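The summing step can be sketched as follows. Syuzhet is an R package, so this Python version is only an analogy: the word valences are invented, and sentence_valence and overall_valence are hypothetical names, not part of the package.

```python
import re

# Hypothetical word valences; not taken from any of the package's dictionaries.
VALENCE = {"love": 1.0, "happy": 0.75, "lost": -0.5, "afraid": -0.75}


def sentence_valence(sentence):
    """Score one sentence as the sum of the valences of its known words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(VALENCE.get(word, 0.0) for word in words)


def overall_valence(sentences):
    """Sum the per-sentence scores into one value for the whole text."""
    return sum(sentence_valence(sentence) for sentence in sentences)


sentences = [
    "I love this happy town.",
    "Then I was lost and afraid.",
    "We went home.",
]
values = [sentence_valence(sentence) for sentence in sentences]
print(values)
print(overall_valence(sentences))
```

Keeping the per-sentence values as a sequence, rather than only the total, is what makes it possible to trace how sentiment rises and falls through a text.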
R (Package: RSentiment)
The RSentiment package lets users analyze the sentiment of a sentence and assign a score to it. Its calculate_sentiment function predicts the sentiment of sentences. Sentences can be classified into six categories: positive, negative, very positive, very negative, sarcasm, and neutral.
R (Package: SentimentAnalysis)
SentimentAnalysis uses various existing dictionaries of positive and negative words and can create customized dictionaries using the generateDictionary() function. The latter uses LASSO regularization as a statistical approach to select relevant terms based on an exogenous response variable. The analyzeSentiment() function returns sentiment scores from contents stored in different formats. The convertToBinary() and convertToDirection() functions convert the continuous scores to either binary sentiment classes (negative or positive) or tertiary directions (negative, neutral, or positive). The compareToResponse() function performs a statistical evaluation, and plotSentimentResponse() enables a visual comparison.
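The two conversions are simple to picture. The Python sketch below mirrors the idea only; it is not the R package's code. The function names to_binary and to_direction are invented, and treating a score of exactly zero as positive in the binary case is an assumption.

```python
def to_binary(scores):
    """Map continuous scores to two classes (assumes zero counts as positive)."""
    return ["positive" if score >= 0 else "negative" for score in scores]


def to_direction(scores):
    """Map continuous scores to three directions, with zero as neutral."""
    directions = []
    for score in scores:
        if score > 0:
            directions.append("positive")
        elif score < 0:
            directions.append("negative")
        else:
            directions.append("neutral")
    return directions


scores = [-0.4, 0.0, 0.3]
print(to_binary(scores))
print(to_direction(scores))
```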
Google Cloud Prediction API
Google Cloud Prediction API provides a RESTful (representational state transfer) API for building machine learning models. It helps to analyze a text string and classify it with one of the labels that the user provides. Labels could be positive and negative, or happy, frustrated, and sad. After collecting training samples, users need to pre-classify each sample with a label and then upload the data to Google Cloud Storage. Use the prediction.trainedmodels.insert() method to train the model, then call prediction.trainedmodels.predict() to make predictions in your application. If needed, you can improve your model with additional training data.
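The pre-classification step amounts to pairing each sample with its label before upload. The sketch below prepares such a file in memory using only the standard library. The layout (one row per sample, label in the first column, text in the second, no header) is an assumption for illustration, as are the sample rows and the helper name build_training_csv. No Google API is called.

```python
import csv
import io

# Hypothetical pre-classified samples: (label, text).
samples = [
    ("positive", "I love this phone"),
    ("negative", "The battery died in a day"),
    ("positive", "Fast delivery and great price"),
]


def build_training_csv(rows):
    """Serialize labeled samples as CSV text, label first, one sample per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for label, text in rows:
        writer.writerow([label, text])
    return buffer.getvalue()


training_data = build_training_csv(samples)
print(training_data)
```

Quoting every field keeps commas inside review text from being read as column separators.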
Sentiment Classifier using Word Sense Disambiguation (WSD)
Sentiment Classifier uses Word Sense Disambiguation (WSD), drawing on WordNet and word occurrence statistics from the NLTK movie review corpus. WSD determines which meaning of a word is used in a sentence when the word has more than one meaning. The classifier uses bigram features with Naive Bayes and maximum entropy classifiers. The tool classifies input as positive or negative.
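To show what bigram features with Naive Bayes look like in practice, here is a small self-contained sketch. It is not the Sentiment Classifier package: the class name, the four training sentences, and the add-one smoothing are choices made for this example, and the WSD step and the maximum entropy model are left out.

```python
import math
from collections import Counter, defaultdict


def bigrams(text):
    """Split text into lowercase tokens and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


class BigramNaiveBayes:
    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        for text, label in examples:
            self.label_counts[label] += 1
            for bigram in bigrams(text):
                self.feature_counts[label][bigram] += 1
                self.vocabulary.add(bigram)

    def classify(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            # Log prior plus log likelihood of each bigram, with add-one smoothing.
            score = math.log(count / total)
            denominator = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for bigram in bigrams(text):
                score += math.log((self.feature_counts[label][bigram] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny hand-made training set, for illustration only.
classifier = BigramNaiveBayes()
classifier.train([
    ("a truly great film", "positive"),
    ("great film with a moving story", "positive"),
    ("a truly boring film", "negative"),
    ("boring film with a weak story", "negative"),
])
print(classifier.classify("such a great film"))
print(classifier.classify("a boring film"))
```

Bigrams let the model react to word pairs such as "great film" rather than to single words in isolation, at the cost of needing more training data to see each pair.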
IBM Watson Natural Language Understanding
Watson enables data scientists to analyze the sentiment of specific target phrases or of the whole document. You can also obtain sentiment information for detected entities and keywords. This finer-grained analysis can reveal a positive tone toward a particular target even when a customer review has an overall negative sentiment. The returned sentiment object has a score property ranging from -1 to 1. Negative scores indicate negative sentiments, and positive scores indicate positive sentiments. The tool supports Arabic, English, French, German, Italian, Portuguese, Russian, and Spanish. SDKs are available for Node, Java, Python, iOS, and Unity.
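The difference between document-level and target-level scores is easiest to see on a concrete result. The sketch below works on a hand-written dictionary, not on a live Watson response: the field names, the scores, and the helper functions are all assumptions made for illustration, and no IBM SDK is called.

```python
# Hand-written example data; the structure and values are assumptions.
response = {
    "sentiment": {
        "document": {"score": -0.42},
        "targets": [
            {"text": "battery life", "score": 0.65},
            {"text": "customer service", "score": -0.81},
        ],
    }
}


def label(score):
    """Interpret a score in [-1, 1]: below zero is negative, above is positive."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(result):
    """Collect a label for the whole document and for each target phrase."""
    sentiment = result["sentiment"]
    summary = {"document": label(sentiment["document"]["score"])}
    for target in sentiment["targets"]:
        summary[target["text"]] = label(target["score"])
    return summary


print(summarize(response))
```

In this made-up review the document as a whole is negative, yet the "battery life" target is positive, which is the kind of detail a single document score hides.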
SentiGrade and SenteMotion
SentiMetrix offers data science and social analytics solutions such as SentiGrade and SenteMotion. They address the “big data” needs of customers who have their own analytical and visualization solutions and could benefit from accurate sentiment scores extracted from their data. SentiGrade has an API that gives programmatic access to an automatic sentiment analysis engine. The system supports multiple languages. SentiMetrix also developed SenteMotion, an emotion-based search engine that tracks the intensity of different emotions in any given document.
Some popular sentiment analysis tools and methods were presented above. All these tools make it possible to determine the sentiment of texts, which can be positive or negative. A number of tools classify text into additional categories: very positive, very negative, sarcastic, neutral, happy, frustrated, and sad. We described several Python libraries, including NLTK, Pattern, TextBlob, and Sentiment Classifier; some R packages, namely Syuzhet, RSentiment, and SentimentAnalysis; and other tools such as IBM Watson Natural Language Understanding, Stanford CoreNLP, RapidMiner, GATE, Google Prediction API, SentiGrade, and SenteMotion.