Turney Lexicon

Mohamad Mahmood
Published in Lexiconia
Jul 17, 2024

Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews

The Turney lexicon is a lexical resource based on the work of Peter Turney in 2002. It pairs English words and phrases with a semantic orientation: a numerical score whose sign marks the entry as positive or negative. The scores are computed automatically from co-occurrence statistics in web text rather than assigned by hand.

Some key characteristics of the Turney lexicon:

  • Automatically generated from web data, rather than manually curated
  • Provides a numerical sentiment score for each word, not just binary positive/negative labels
  • Covers a broad range of common English vocabulary
  • Has been used extensively in academic research and real-world applications

Abstract

This paper presents a simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down). The classification of a review is predicted by the average semantic orientation of the phrases in the review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has good associations (e.g., “subtle nuances”) and a negative semantic orientation when it has bad associations (e.g., “very cavalier”). In this paper, the semantic orientation of a phrase is calculated as the mutual information between the given phrase and the word “excellent” minus the mutual information between the given phrase and the word “poor”. A review is classified as recommended if the average semantic orientation of its phrases is positive. The algorithm achieves an average accuracy of 74% when evaluated on 410 reviews from Epinions, sampled from four different domains (reviews of automobiles, banks, movies, and travel destinations). The accuracy ranges from 84% for automobile reviews to 66% for movie reviews.

Thumbs up or thumbs down? | Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (acm.org)
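The semantic orientation described in the abstract can be sketched in a few lines. When both mutual information terms are written out, the phrase's own frequency and the corpus size cancel, which leaves a ratio of four counts. The function name, the parameter names, and the smoothing value below are assumptions made for this illustration, and the counts must be supplied by the caller (for example from search-engine hit counts or a local corpus).

```python
import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor").

    hits_near_excellent / hits_near_poor: how often the phrase co-occurs
    with each reference word. hits_excellent / hits_poor: how often each
    reference word occurs on its own (both must be greater than zero).
    smoothing is a small constant (assumed value) that avoids log(0)
    when a phrase never co-occurs with one of the reference words.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

# Hypothetical counts: the phrase appears near "excellent" more often
# than near "poor", so its orientation comes out positive.
score = semantic_orientation(8, 2, 1000, 1000)
print(score > 0)
```

A positive score means the phrase is more strongly associated with "excellent" than with "poor"; a negative score means the opposite.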

Turney’s Algorithm

The main idea is to work on sets of two or three words rather than on single words. These sets of words are compared with two keywords: “excellent” and “poor”. The comparison relies on a statistical measure of association, the mutual information mentioned in the abstract above. The reference words “excellent” and “poor” were chosen because, in the five-star review rating system, it is common to define one star as “poor” and five stars as “excellent”. During development, other keywords were tried (e.g. “great” instead of “excellent”), but the best results were obtained with the words Turney selected.
https://www.skenz.it/compilers/nlp_sentiment_analysis

Turney’s Algorithm

A later survey summarizes the work: Turney implemented an unsupervised algorithm that classifies reviews as thumbs up or thumbs down by semantic orientation. Experiments on 410 reviews from Epinions gave an average accuracy of 74%. Movie reviews proved the hardest to predict, at about 66%. The survey attributes an accuracy of 80% to 84% to travel reviews, although Turney's abstract above reports 84% for automobile reviews.
https://www.sciencedirect.com/science/article/pii/S1877050920300466

Turney’s Method

Turney (2002) presents a simple unsupervised learning algorithm for classifying a review. The algorithm takes a written review as input and produces a classification as output.

The first step is to use a part-of-speech tagger to identify phrases in the input text that contain adjectives or adverbs.

The second step is to estimate the semantic orientation of each extracted phrase.

The third step is to assign the given review to a class, recommended or not recommended, based on the average semantic orientation of the phrases extracted from the review. If the average is positive, the prediction is that the review recommends the item it discusses.
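The third step reduces to averaging and checking the sign. The sketch below assumes the phrase scores have already been computed; the function name and the handling of a review with no extracted phrases are choices made for this illustration, not part of the description above.

```python
def classify_review(phrase_scores):
    """Label a review from the semantic orientation of its phrases.

    phrase_scores: list of floats, one score per extracted phrase.
    A review with no extracted phrases is given an average of 0.0
    (an assumption of this sketch), so it is not recommended.
    """
    if phrase_scores:
        average = sum(phrase_scores) / len(phrase_scores)
    else:
        average = 0.0
    return "recommended" if average > 0 else "not recommended"

# Hypothetical scores for the phrases of two reviews.
print(classify_review([1.8, -0.4, 0.9]))   # average is positive
print(classify_review([-2.1, 0.3, -0.7]))  # average is negative
```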

Two consecutive words are extracted from the review if their part-of-speech tags conform to any of the patterns in Fig. 2 of the source paper. JJ tags mark adjectives, NN tags nouns, RB tags adverbs, and VB tags verbs. The second pattern, for example, extracts two consecutive words when the first is an adverb and the second is an adjective, provided the third word (which is not extracted) is not a noun. Proper nouns (NNP and NNPS, singular and plural) are excluded so that the names of the items under review cannot influence the classification. The semantic orientation of each extracted phrase is then estimated with the PMI-IR algorithm, which uses mutual information as a measure of the strength of semantic association between two words.
https://www.sciencedirect.com/science/article/pii/S1877050915013538
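The phrase extraction can be sketched over text that has already been tagged. The figure with the patterns is not reproduced in this article, so the pattern table below is written from memory of Turney's paper and should be treated as an assumption; only the second pattern is confirmed by the description above. The input format, a list of (word, tag) pairs with Penn Treebank tags, and all names are also choices made for this sketch.

```python
ADJ = {"JJ"}
NOUN = {"NN", "NNS"}          # common nouns only; NNP and NNPS are left out
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

# (tags allowed for word 1, tags allowed for word 2, tags forbidden for word 3)
# None means the third word can be anything.
PATTERNS = [
    (ADJ, NOUN, None),
    (ADV, ADJ, NOUN),
    (ADJ, ADJ, NOUN),
    (NOUN, ADJ, NOUN),
    (ADV, VERB, None),
]

def extract_phrases(tagged):
    """Return the two-word phrases whose tags match one of PATTERNS."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the following word, or None at the end of the text.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, forbidden_third in PATTERNS:
            third_ok = forbidden_third is None or t3 not in forbidden_third
            if t1 in first and t2 in second and third_ok:
                phrases.append(f"{w1} {w2}")
                break
    return phrases

# "very cavalier" is skipped here because a noun follows it, while
# "cavalier attitude" matches the adjective + noun pattern.
tagged = [("very", "RB"), ("cavalier", "JJ"), ("attitude", "NN")]
print(extract_phrases(tagged))
```

Each extracted phrase would then be scored for semantic orientation before the review is classified.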

Measuring praise and criticism: Inference of semantic orientation from association

The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., “honest”, “intrepid”) and negative semantic orientation indicates criticism (e.g., “disturbing”, “superfluous”). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots).
https://dl.acm.org/doi/10.1145/944012.944013

🤓

Mohamad Mahmood
Lexiconia

Programming (Mobile, Web, Database and Machine Learning). Studies at the Center For Artificial Intelligence Technology (CAIT), FTSM, UKM, Malaysia.