Sentiment Analysis Techniques for Twitter Emoji

Sayan Mondal
Published in Technology Hits
5 min read · Dec 17, 2019

In the past decade, new forms of communication, such as microblogging and text messaging, have emerged and become ubiquitous. While there is no limit to the range of information conveyed by tweets and texts, these short messages are often used to share opinions and sentiments that people have about what is going on in the world around them.

Opinion mining (also known as sentiment analysis or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice-of-the-customer materials such as reviews and survey responses, to online and social media, and to healthcare materials, for applications that range from marketing to customer service to clinical medicine.

Generally speaking, sentiment analysis aims to determine the attitude of a speaker, writer, or other subject with respect to some topic, or the overall contextual polarity or emotional reaction to a document, interaction, or event. The attitude may be a judgment or evaluation (see appraisal theory), an affective state (that is to say, the emotional state of the author or speaker), or the intended emotional communication (that is to say, the emotional effect intended by the author or interlocutor).

A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level — whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, “beyond polarity” sentiment classification looks, for instance, at emotional states such as “angry”, “sad”, and “happy”.

Sentiment analysis techniques fall into two categories: the lexicon-based approach and the machine-learning-based approach.

The lexicon-based approach is further divided into two categories: dictionary-based and corpus-based. In the dictionary-based approach, sentiment is identified using synonyms and antonyms from a lexical dictionary such as WordNet. The corpus-based approach identifies opinion words by starting from a word list and examining how those words are used in a corpus. It is further classified into statistical and semantic approaches. In the statistical approach, co-occurrences of words are counted to identify sentiment. In the semantic approach, terms are represented in a semantic space to discover relations between them.

Sentiment classification techniques

Machine Learning Approach

Text classification methods that use machine learning are divided into supervised and unsupervised learning methods.

Supervised learning methods use a large number of labelled training examples. Unsupervised learning methods are used when such labelled training data is difficult to obtain.

Supervised Learning

Supervised learning depends on the existence of a previously labelled dataset. The next sub-section gives brief details of classifiers commonly used in this kind of analysis.

Probabilistic Classifiers

Probabilistic classifiers, or generative classifiers, are among the most popular classifiers in machine learning. They are developed by assuming generative models that are product distributions over the original attribute space (as in Naive Bayes). They use a mixture model for classification.
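To make the Naive Bayes idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier written with only the Python standard library. The function names, the four toy tweets, and the choice of add-one (Laplace) smoothing are assumptions made for illustration; the article does not prescribe a particular implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Collect class counts and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        # Start from the log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing (an assumption) avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled tweets, far too few for real use.
training_tweets = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("i hate this traffic", "negative"),
    ("what a terrible service", "negative"),
]
model = train_naive_bayes(training_tweets)
print(classify("i love this great day", model))
print(classify("terrible traffic", model))
```

The "product distribution" mentioned above shows up as the sum of per-word log-probabilities: each word is treated as independent of the others given the class.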

Lexicon Based Approach

Semantic orientation (SO) is a measure of subjectivity and opinion in text. It usually captures an evaluative factor (positive or negative) and a potency or strength towards a subject, topic, person, or idea. Applying a lexicon is one of the two main approaches to sentiment analysis; it involves calculating the sentiment from the semantic orientation of the words or phrases that occur in a text. This approach requires a dictionary of positive and negative words, with a positive or negative sentiment value assigned to each word. The semantic orientation of a phrase is considered positive if it is more strongly associated with “best” and negative if it is more strongly associated with “poor”. The approach therefore relies on an opinion lexicon.

The dictionary-based approach depends on finding opinion seed words and then searching a dictionary for their synonyms and antonyms.
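As a rough illustration of lexicon-based scoring, the sketch below sums per-word sentiment values and maps the total to a polarity label. The lexicon entries and their numeric values are hypothetical, chosen only for the example, and the function names are not from any library.

```python
import re

# Hypothetical toy opinion lexicon: word -> sentiment value.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0, "best": 3.0,
    "bad": -1.0, "poor": -2.0, "hate": -2.0, "worst": -3.0,
}


def semantic_orientation(text, lexicon=LEXICON):
    """Sum the sentiment values of all lexicon words found in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)


def polarity(text, lexicon=LEXICON):
    """Map the summed score to a positive, negative, or neutral label."""
    score = semantic_orientation(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("The best phone I have owned, love it"))
print(polarity("Poor battery and bad screen"))
print(polarity("It arrived on Tuesday"))
```

A plain sum like this ignores negation ("not good") and emoticons, so it should be read as a starting point only.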

The corpus-based approach starts with a seed list of opinion words, and then finds other opinion words in a large corpus to help in finding opinion words with context specific orientations.
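One way to sketch the statistical, co-occurrence-based variant is to compare how often a word appears in the same document as a positive seed versus a negative seed. The example below uses pointwise mutual information (PMI) for that comparison, which is an assumption on our part rather than something the article specifies. The seeds “best” and “poor” come from the text above; the corpus, the smoothing constant, and the function names are hypothetical.

```python
import math


def doc_hits(corpus, *words):
    """Count documents (sets of tokens) that contain all of the given words."""
    return sum(1 for doc in corpus if all(word in doc for word in words))


def so_pmi(word, corpus, pos_seed="best", neg_seed="poor", smoothing=0.01):
    """Semantic orientation as PMI(word, pos_seed) - PMI(word, neg_seed).

    The difference of the two PMI values simplifies to a single log ratio
    of co-occurrence counts. The smoothing term avoids division by zero.
    """
    near_pos = doc_hits(corpus, word, pos_seed) + smoothing
    near_neg = doc_hits(corpus, word, neg_seed) + smoothing
    pos_total = doc_hits(corpus, pos_seed) + smoothing
    neg_total = doc_hits(corpus, neg_seed) + smoothing
    return math.log2((near_pos * neg_total) / (near_neg * pos_total))


# Hypothetical mini-corpus; each document is reduced to a set of tokens.
corpus = [set(doc.split()) for doc in [
    "best camera reliable battery",
    "reliable and best value",
    "poor screen flimsy case",
    "flimsy hinge poor build",
    "best sound poor manual",
]]

print(so_pmi("reliable", corpus))  # positive value: co-occurs with "best"
print(so_pmi("flimsy", corpus))    # negative value: co-occurs with "poor"
```

Words with a clearly positive or negative score can then be added to the opinion word list with a context-specific orientation.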

Dictionary-Based Approach

In the dictionary-based approach, a small set of opinion words is collected first and then expanded using a lexical dictionary. WordNet is the one most commonly used: it is an English lexical database in which terms are linked to one another, and it is used to check similarity between words and to calculate sentiment scores. Its entries are grouped by syntactic category (noun, verb, adjective, and adverb) and linked by semantic relations such as synonymy, antonymy, hyponymy, meronymy, troponymy, and entailment. Newly found words are added to the opinion word list after each iteration, and the search is repeated with the enlarged list.
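The iterative expansion can be sketched as follows. To keep the example self-contained, the synonym and antonym tables are small hypothetical stand-ins for WordNet; a real system would query WordNet itself. The function name and the rule that the first label assigned to a word wins are also assumptions.

```python
# Hypothetical stand-ins for WordNet synonym and antonym links.
SYNONYMS = {
    "good": {"fine", "nice"},
    "nice": {"pleasant"},
    "bad": {"awful", "poor"},
    "poor": {"inferior"},
}
ANTONYMS = {
    "good": {"bad"},
    "nice": {"nasty"},
}


def expand_lexicon(seeds, synonyms, antonyms, max_iterations=3):
    """Grow a {word: polarity} lexicon from seed words.

    Synonyms inherit the polarity of the word they were reached from,
    antonyms get the opposite polarity.
    """
    lexicon = dict(seeds)
    frontier = list(seeds)
    for _ in range(max_iterations):
        next_frontier = []
        for word in frontier:
            word_polarity = lexicon[word]
            related = [(s, word_polarity) for s in sorted(synonyms.get(word, ()))]
            related += [(a, -word_polarity) for a in sorted(antonyms.get(word, ()))]
            for other, other_polarity in related:
                if other not in lexicon:  # first label assigned wins
                    lexicon[other] = other_polarity
                    next_frontier.append(other)
        if not next_frontier:
            break  # no new words were found in this iteration
        frontier = next_frontier
    return lexicon


lexicon = expand_lexicon({"good": 1, "bad": -1}, SYNONYMS, ANTONYMS)
for word in sorted(lexicon):
    print(word, lexicon[word])
```

Each pass only follows links from the words added in the previous pass, which mirrors the "add new entries after each iteration" behaviour described above.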

Corpus-Based Approach

Corpus linguistics is the study of language as expressed in corpora (samples) of “real world” text. The text-corpus method is a digestive approach that derives a set of abstract rules that govern a natural language from texts in that language, and explores how that language relates to other languages. Originally derived manually, corpora are now automatically derived from source texts. Corpus linguistics proposes that reliable language analysis is more feasible with corpora collected in the field, in their natural context (“realia”), and with minimal experimental interference.

Annotation consists of the application of a scheme to texts. Annotations may include structural markup, part-of-speech tagging, parsing, and numerous other representations.

Abstraction consists of the translation (mapping) of terms in the scheme to terms in a theoretically motivated model or dataset. Abstraction typically includes linguist-directed search but may include e.g., rule-learning for parsers.

Analysis consists of statistically probing, manipulating and generalizing from the dataset. Analysis might include statistical evaluations, optimization of rule-bases or knowledge discovery methods.

Conclusion

Twitter is a popular microblogging service built to let people discover what is happening at any moment, anywhere in the world. In this survey, we found that social-media-related features can be used to predict sentiment on Twitter. We will use three machine learning algorithms in Weka, with the aim of outperforming three models: the unigram model, the feature-based model, and the tree kernel model. Our proposed system thus determines the sentiment of tweets extracted from Twitter. The difficulty increases with the nuance and complexity of the opinions expressed: product reviews are relatively easy, while opinions about books, movies, art, and music are more difficult. We can also implement features such as emoticons, neutralization, negation handling, and capitalization/internationalization, as these have recently become a huge part of the internet.

