Sentiment Analysis Tools Overview, Part 1. Positive and Negative Words Databases

Data Monsters
Jul 13, 2017 · 5 min read

by Olga Davydova

Sentiment analysis tools rely on lists of words and phrases with positive and negative connotations. Many dictionaries of positive and negative opinion words have already been developed. In this article, we look at the best-known word databases.

The Liu and Hu opinion lexicon contains around 6,800 positive and negative opinion words (sentiment words) for the English language [1]. The list was compiled over many years.


SentiWordNet is a lexical resource for opinion mining that assigns three sentiment scores to each WordNet synset: positivity, negativity, and objectivity [2]. It has a web-based graphical user interface and is freely available for research purposes. The resource was developed through quantitative analysis of the glosses associated with synsets, and the resulting vectorial term representations were used for semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers [3].
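To make the idea of a classifier committee concrete, here is a minimal sketch. It assumes that "combining the results" means giving each label a score proportional to the number of classifiers that chose it, which is one plausible reading and not a reproduction of the SentiWordNet code. The function name `committee_scores` and the example votes are hypothetical.

```python
from collections import Counter

LABELS = ("positive", "negative", "objective")

def committee_scores(votes):
    """Turn one label per classifier into three scores that sum to 1.

    Assumption: each score is the share of classifiers that voted for
    that label. This is an illustration, not the original implementation.
    """
    counts = Counter(votes)
    return {label: counts[label] / len(votes) for label in LABELS}

# Eight hypothetical ternary classifiers voting on a single synset.
votes = ["positive"] * 6 + ["objective"] * 2
scores = committee_scores(votes)
```

With this scheme a synset on which all eight classifiers agree gets the maximum score for that label and zero for the other two.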

Natural Language Processing (SentiWords)

The next resource, SentiWords, contains roughly 155,000 English words [4]. Each word is associated with a sentiment score between -1 and 1. Words are given in the form lemma#PoS and are aligned with WordNet lists that include adjectives, nouns, verbs, and adverbs.
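Entries in the lemma#PoS form are easy to split programmatically. The sketch below assumes each line holds the entry and its score separated by a tab. That separator, the helper name `parse_sentiwords_line`, and the sample values are assumptions for illustration and are not taken from the actual file.

```python
def parse_sentiwords_line(line):
    """Split 'lemma#PoS<TAB>score' into (lemma, pos, score)."""
    entry, score = line.rstrip("\n").split("\t")
    # Split on the last '#' so a lemma that contains '#' stays intact.
    lemma, pos = entry.rsplit("#", 1)
    return lemma, pos, float(score)

# Made-up sample lines in the assumed format.
sample_lines = ["good#a\t0.5\n", "failure#n\t-0.5\n"]

# Index the scores by (lemma, part of speech).
lexicon = {}
for line in sample_lines:
    lemma, pos, score = parse_sentiwords_line(line)
    lexicon[(lemma, pos)] = score
```

Keying the dictionary by both lemma and part of speech keeps apart words whose sentiment differs between, say, their noun and verb senses.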

AFINN is a list of English words rated for valence with an integer between minus five (negative) and plus five (positive) [5]. Finn Årup Nielsen labeled the words manually in 2009–2011. You can try it on this site [6].
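A valence list of this kind is commonly applied by summing the scores of the words found in a text. The sketch below uses a tiny hand-made dictionary, `TOY_VALENCES`, whose scores are illustrative and not copied from the real AFINN file. The helper `afinn_style_score` is likewise hypothetical.

```python
import re

# Illustrative valences only; the real AFINN list is far larger, and these
# particular scores are assumptions made for the example.
TOY_VALENCES = {"good": 3, "great": 3, "like": 2, "bad": -3, "terrible": -3}

def afinn_style_score(text, valences=TOY_VALENCES):
    """Sum the integer valence of every known word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the list contribute nothing to the total.
    return sum(valences.get(word, 0) for word in words)

mixed = afinn_style_score("Good food but terrible service")
praise = afinn_style_score("I like it, great")
```

A total above zero suggests positive text, a total below zero suggests negative text, and mixed statements can cancel out.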


The WordStat Sentiment Dictionary includes more than 9,164 negative and 4,847 positive word patterns, but sentiment is not measured with those two lists directly [7]. Negative sentiment is measured with two rules instead. The first rule counts negative words not preceded by a negation within three words in the same sentence. The second rule counts positive words preceded by a negation within three words in the same sentence. Positive sentiment is measured in the mirror-image way: it counts positive words not preceded by a negation as well as negative terms that follow a negation.
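The negation rules can be sketched in a few lines. The word lists, the negation list, and the function name `count_sentiment` below are hypothetical stand-ins, not the WordStat dictionary, and "within three words" is interpreted here as the three tokens immediately before the sentiment word.

```python
import re

# Toy lists for illustration; not the WordStat dictionary.
POSITIVE = {"good", "happy", "pleasant"}
NEGATIVE = {"bad", "sad", "awful"}
NEGATIONS = {"not", "no", "never"}
WINDOW = 3  # number of preceding words inspected for a negation

def count_sentiment(sentence):
    """Return (positive_count, negative_count) for one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    positive = negative = 0
    for i, word in enumerate(words):
        if word not in POSITIVE and word not in NEGATIVE:
            continue
        # Look for a negation among the up to WINDOW words before this one.
        negated = any(w in NEGATIONS for w in words[max(0, i - WINDOW):i])
        # A negation flips the polarity of the word it precedes.
        if (word in POSITIVE) != negated:
            positive += 1
        else:
            negative += 1
    return positive, negative
```

For example, "This is not a good idea" counts as one negative hit because "not" appears within three words before "good", while "It was not bad" counts as one positive hit.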


SenticNet provides polarity values for 50,000 natural language concepts [8]. A polarity is a floating-point number between -1 (extreme negativity) and +1 (extreme positivity). The knowledge base is free and can be downloaded as an XML file. Its latest version is also accessible through an API.

The Affective Norms for English Words (ANEW) is a set of normative emotional ratings for English words in terms of pleasure, arousal, and dominance [9]. It was developed to provide a standard for use in studies of emotion and attention.

The Whissell Dictionary of Affect in Language

The Whissell Dictionary of Affect in Language is a freeware program for the statistical analysis of individual words, not according to their meaning but according to the way they ‘feel’ [10]. Including 348,000 words, it covers 90% of spoken English. Scores of volunteers rated words according to how pleasant they felt, how active they seemed, and how well the word brought an image to mind.


In Pattern, written text is first divided into two types: facts and opinions [11]. Opinions express people’s sentiments toward the world. The package ships with sentiment.xml, which includes 2,888 words scored for polarity, subjectivity, intensity, and reliability. The words are mostly adjectives. There are two useful functions: sentiment() and positive(). The sentiment() function returns a (polarity, subjectivity) tuple for the given sentence, where polarity is a value between -1.0 and +1.0 and subjectivity is a value between 0.0 and 1.0. The positive() function returns True if the given sentence’s polarity is above the threshold.
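The relationship between the two functions can be illustrated without installing the package. The sketch below is not Pattern's implementation: `toy_sentiment`, `toy_positive`, the lexicon entries, the averaging step, and the default threshold of 0.1 are all assumptions chosen to mirror the behavior described above.

```python
import re

# Hypothetical (polarity, subjectivity) entries; not Pattern's lexicon.
TOY_LEXICON = {
    "wonderful": (1.0, 1.0),
    "good": (0.7, 0.6),
    "bad": (-0.7, 0.67),
    "awful": (-1.0, 1.0),
}

def toy_sentiment(sentence):
    """Average the scores of known words into a (polarity, subjectivity) tuple."""
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        # No known opinion words: treat the sentence as neutral and objective.
        return (0.0, 0.0)
    polarity = sum(p for p, _ in scores) / len(scores)
    subjectivity = sum(s for _, s in scores) / len(scores)
    return (polarity, subjectivity)

def toy_positive(sentence, threshold=0.1):
    """True when the sentence polarity is above the threshold."""
    return toy_sentiment(sentence)[0] > threshold
```

Raising the threshold makes the positive check stricter, which can help when only clearly positive sentences should pass.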

The Sentiment140 Lexicon was created from the Sentiment140 emoticon corpus of 1.6 million tweets and contains a list of words and their associations with positive and negative sentiment [12].

Linguistic Inquiry and Word Count (LIWC)

Linguistic Inquiry and Word Count (LIWC) is a computer program for language analysis [13]. It supports English, German, Spanish, Dutch, and Italian. Its commercial word list lets you extract around 60 different word categories, including positive emotions, negative emotions, aggression, affective processes, anxiety, anger, and profanities.

The MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon

The MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon is a list of subjectivity clues that is part of OpinionFinder [14]. It helps to determine text polarity.

The list on this page contains more than 6,000 positive words and phrases, making it one of the longest and best lists of positive words [15].

Other examples of such databases are a positive words list [16] and a negative words list [17].


This overview described the most popular positive and negative word databases that can help with sentiment analysis: the Liu and Hu opinion lexicon, SentiWordNet, SentiWords, AFINN, the WordStat Sentiment Dictionary, SenticNet, the Affective Norms for English Words, the Whissell Dictionary of Affect in Language, Pattern, the Sentiment140 Lexicon, Linguistic Inquiry and Word Count, and the MPQA Subjectivity Lexicon.


















