Sentiment Analysis Using Pre-trained Models and Transformers

TANISH SHARMA
Jan 25, 2023


Sentiment analysis is one of the most important tasks in natural language processing (NLP). It extracts people’s opinions from large volumes of unstructured review text and classifies them into sentiment classes, i.e., positive, negative, or neutral. It helps in understanding the behavior of an audience, customers, or consumers through the opinions they post on social media platforms (blogs, Instagram, and so on) about services, products, etc. That’s why it is also known as opinion mining, which refers to the use of natural language processing and text mining to identify emotional information in text.

For example, the sentiment of the text in the image above is positive. There are a number of ways to measure sentiment, but here we will look at just a few techniques. Python provides several libraries and packages for the task.

Without further ado, let’s get down to it.

1. TextBlob: It is a simple and fairly well-performing library built on top of NLTK and pattern. It is used not only for sentiment analysis but also for other natural language processing tasks such as part-of-speech tagging, noun phrase extraction, classification (Naive Bayes, Decision Tree), tokenization (splitting text into words and sentences), word and phrase frequencies, spelling correction, and many more (refer to the official page to know more). Here we focus only on sentiment analysis: the sentiment property of the TextBlob class (which inherits from the BaseBlob class) returns a tuple of polarity and subjectivity. The code is below:
%pip install textblob
# NaiveBayesAnalyzer also needs the NLTK corpora: python -m textblob.download_corpora
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer, PatternAnalyzer

If your data looks like movie reviews, give NaiveBayesAnalyzer a try; it is trained on a movie reviews corpus and works fairly well on that kind of data. By default TextBlob uses PatternAnalyzer, as in the following lines of code:

text = " Very Good Service offered by Team."

TextBlob(text, analyzer=PatternAnalyzer()).sentiment returns the polarity, a float within the range [-1.0, 1.0] where -1.0 is the most negative and 1.0 the most positive, together with the subjectivity, a float within the range [0.0, 1.0] where 0.0 is very objective and 1.0 is very subjective.
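To act on these two numbers, it can help to turn them into readable labels. The sketch below is a hypothetical helper and not part of TextBlob: the name describe_sentiment and the cut-offs (0.1 for polarity, 0.5 for subjectivity) are assumptions chosen for illustration. It takes plain floats, so it runs without the library installed.

```python
def describe_sentiment(polarity, subjectivity,
                       polarity_cutoff=0.1, subjectivity_cutoff=0.5):
    """Map a (polarity, subjectivity) pair to two readable labels.

    The cut-off values are illustrative assumptions, not TextBlob defaults.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be within [-1.0, 1.0]")
    if not 0.0 <= subjectivity <= 1.0:
        raise ValueError("subjectivity must be within [0.0, 1.0]")

    # Polarity close to zero is treated as neutral.
    if polarity > polarity_cutoff:
        tone = "positive"
    elif polarity < -polarity_cutoff:
        tone = "negative"
    else:
        tone = "neutral"

    stance = "subjective" if subjectivity >= subjectivity_cutoff else "objective"
    return tone, stance


# Example with made-up scores (not real TextBlob output).
print(describe_sentiment(0.8, 0.75))
```

With TextBlob installed, the pair returned by TextBlob(text).sentiment can be unpacked and passed straight into this helper.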

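In the same way we can use NaiveBayesAnalyzer, which tells you whether the given text is positive or negative. Execute TextBlob(text, analyzer=NaiveBayesAnalyzer()).sentiment to get the predicted classification ('pos' or 'neg') together with the probabilities p_pos and p_neg. Note that the .polarity shortcut is always computed with PatternAnalyzer, whichever analyzer you pass, so it does not reflect the Naive Bayes result.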

2. VADER (Valence Aware Dictionary and sEntiment Reasoner): It is a Python library focused on social media sentiment. It is an open-source package, also available within the Natural Language Toolkit (NLTK), and is a popular ready-to-use sentiment classifier for social media conversations from networks such as Facebook or Twitter. It combines a set of rules with a sentiment lexicon, which is a list of lexical features (e.g., words) that are generally labeled according to their semantic orientation as either positive or negative. VADER not only tells us whether a text is positive or negative but also how positive or negative it is. It’s less accurate when rating longer, structured sentences, but it’s often a good launching point.
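Below is the code:

# %pip install vaderSentiment ## install the library if you don't have it
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
sentiment = SentimentIntensityAnalyzer()
text = " Very Good Service offered by Team."
# polarity_scores returns a dict with the keys 'neg', 'neu', 'pos' and 'compound'
scores = sentiment.polarity_scores(text)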

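Calling sentiment.polarity_scores(text) returns the proportions of the text that are negative ('neg'), neutral ('neu'), and positive ('pos'), i.e., how positive the sentence is, together with a 'compound' score.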

The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a ‘normalized, weighted composite score’ is accurate.
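The normalization step can be sketched in a few lines. As far as the vaderSentiment source goes, the rule-adjusted sum x is squashed with x / sqrt(x² + alpha), with alpha set to 15 by default. The function name and the sample valence values below are illustrative assumptions, not entries copied from the lexicon.

```python
import math


def normalize_compound(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the range [-1, 1].

    alpha=15 is assumed to match VADER's default normalization constant.
    """
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    # Clamp for safety, so floating point noise never leaves the range.
    return max(-1.0, min(1.0, score))


# Hypothetical rule-adjusted valences for the words of one sentence.
word_valences = [1.9, 0.0, 3.1]
compound = normalize_compound(sum(word_valences))
print(round(compound, 4))
```

The squashing is why a few strongly positive words already push the compound score close to +1, while the score never actually exceeds it.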

positive sentiment: compound score >= 0.05
neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
negative sentiment: compound score <= -0.05
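These thresholds translate directly into a small helper. This is a minimal sketch: the function name label_from_compound is hypothetical, and the 0.05 threshold is simply the value listed above.

```python
def label_from_compound(compound, threshold=0.05):
    """Turn a VADER-style compound score into a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Made-up compound scores, one per class.
for value in (0.62, 0.0, -0.31):
    print(value, label_from_compound(value))
```

With VADER installed, the input would be sentiment.polarity_scores(text)["compound"].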

3. Flair: It is a state-of-the-art NLP framework built on PyTorch. It ships pre-trained models and supports popular, state-of-the-art word embeddings such as GloVe, BERT, ELMo, character embeddings, etc. Flair supports a number of languages and is always looking to add new ones.

Let’s look at a code snippet:

!pip install flair

from flair.models import TextClassifier
classifier = TextClassifier.load('en-sentiment')
# Import flair Sentence to process input text
from flair.data import Sentence
# Import accuracy_score to check performance
from sklearn.metrics import accuracy_score

text=" Very Good Service offered by Team."
sentence = Sentence(text)
classifier.predict(sentence)
score = sentence.labels[0].score
value = sentence.labels[0].value

We wrap the text in a Sentence object and predict with a classifier based on DistilBERT (downloaded using TextClassifier.load('en-sentiment')). After predict, the sentence carries a label whose value is the sentiment (POSITIVE or NEGATIVE) and whose score is the model's confidence in it.
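Flair reports a label and a confidence rather than a signed polarity, which makes it awkward to compare with the TextBlob and VADER scores. One possible bridge, sketched below, is to flip the sign of the confidence for negative predictions. The helper name signed_score is hypothetical, and it assumes the label values are the strings POSITIVE and NEGATIVE.

```python
def signed_score(value, score):
    """Convert a (label, confidence) pair into a score in [-1, 1].

    Assumes the classifier emits the labels 'POSITIVE' and 'NEGATIVE'.
    """
    label = value.upper()
    if label == "POSITIVE":
        return score
    if label == "NEGATIVE":
        return -score
    raise ValueError(f"unexpected label: {value!r}")


# Made-up predictions standing in for sentence.labels[0].value and .score.
print(signed_score("POSITIVE", 0.98))
print(signed_score("NEGATIVE", 0.75))
```

In the snippet above this would be called as signed_score(value, score) after classifier.predict(sentence).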

4. Pre-trained Models: There are a number of pre-trained models that can be used for sentiment analysis. Here we use a few of them; for details, please visit the Hugging Face website. The first one, cardiffnlp/twitter-roberta-base-sentiment-latest, is a RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis with the TweetEval benchmark. The original Twitter-based RoBERTa model is also available on Hugging Face, and the original reference paper is TweetEval. This model is suitable for English.

from transformers import AutoModelForSequenceClassification, AutoConfig
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax

# Twitter RoBERTa fine-tuned for sentiment (labels: negative, neutral, positive)
MODEL = "cardiffnlp/twitter-roberta-base-sentiment-latest"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
config = AutoConfig.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)


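# Alternative: DistilBERT fine-tuned on SST-2 (labels: NEGATIVE, POSITIVE).
# Each alternative reassigns tokenizer, config and model, so run only the one you want to use.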

model_name_d="distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_name_d)

config = AutoConfig.from_pretrained(model_name_d)

model = AutoModelForSequenceClassification.from_pretrained(model_name_d)


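# Alternative: BERT. Note that bert-base-uncased is only the pre-trained base model: its
# classification head is randomly initialized, so fine-tune it on sentiment data before trusting its labels.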

from transformers import BertTokenizer, BertForSequenceClassification,BertConfig

model_name_b='bert-base-uncased'

tokenizer = BertTokenizer.from_pretrained(model_name_b)

config = BertConfig.from_pretrained(model_name_b)

model = BertForSequenceClassification.from_pretrained(model_name_b)

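def sentiment_labels(text):
    # Tokenize the text and return PyTorch tensors
    encoded_input = tokenizer(text, padding=True, truncation=True, max_length=512, return_tensors='pt')
    output = model(**encoded_input)
    # Logits of the first (and only) input, converted to a NumPy array
    scores = output[0][0].detach().numpy()
    scores = softmax(scores)
    # Sort class indices from most to least probable
    ranking = np.argsort(scores)
    ranking = ranking[::-1]
    return config.id2label[int(ranking[0])]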

text=" Very Good Service offered by Team."
sentiment_labels(text)
# o/p with the cardiffnlp Twitter RoBERTa model: 'positive'
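To see what sentiment_labels does after the forward pass, here is the same softmax, ranking, and label lookup in plain Python. The logits are made-up numbers and the id2label mapping is assumed to be the three-class one used by the Twitter RoBERTa model; with a real model both come from output and config.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits, id2label):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    # Class indices sorted from most to least probable.
    ranking = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    best = ranking[0]
    return id2label[best], probs[best]


id2label = {0: "negative", 1: "neutral", 2: "positive"}  # assumed mapping
logits = [-2.1, 0.3, 3.4]  # hypothetical model output for one sentence

label, prob = top_label(logits, id2label)
print(label, round(prob, 3))
```

Returning the probability next to the label is a small design choice that makes it easy to spot low-confidence predictions later.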

5. Zero-Shot Classification: Zero-shot classification is the task of predicting a class that wasn’t seen by the model during training. This method, which leverages a pre-trained language model, can be thought of as an instance of transfer learning, which generally refers to using a model trained for one task in a different application than the one it was originally trained for. This is particularly useful in situations where the amount of labeled data is small. Refer to the Hugging Face website to know more.

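from transformers import pipeline
classifier = pipeline(
    task="zero-shot-classification",
    # device=0,  # uncomment to run on the first GPU
    model="facebook/bart-large-mnli"
)
# multi_label replaces the deprecated multi_class argument
classifier(text, ["positive", "negative", "neutral"], multi_label=False)

Pass 'zero-shot-classification' as the task to the pipeline function of the transformers library. Keep multi_label=False when the labels are mutually exclusive, as in a multiclass problem like this one, so that the scores sum to 1. Set multi_label=True when several labels can apply at once and each should be scored independently.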
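Under the hood, the zero-shot pipeline turns each candidate label into a hypothesis sentence (by default something like "This example is positive.") and asks the NLI model whether the text entails it. The sketch below mimics only the final scoring step on made-up logits, to show how the two modes differ. The function names and numbers are assumptions, and no model is loaded.

```python
import math


def _softmax(values):
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def score_single_label(entailment_logits):
    """Mutually exclusive labels: softmax over the entailment logits of all labels."""
    return _softmax(entailment_logits)


def score_multi_label(contradiction_logits, entailment_logits):
    """Independent labels: per label, softmax over (contradiction, entailment)."""
    return [
        _softmax([c, e])[1]
        for c, e in zip(contradiction_logits, entailment_logits)
    ]


labels = ["positive", "negative", "neutral"]
entailment = [3.2, -2.5, 0.4]     # hypothetical NLI logits, one per label
contradiction = [-2.8, 3.0, 0.1]  # hypothetical NLI logits, one per label

single = dict(zip(labels, score_single_label(entailment)))
multi = dict(zip(labels, score_multi_label(contradiction, entailment)))
print(single)
print(multi)
```

In the first mode the scores compete and sum to 1; in the second each label gets its own probability, so the total can be above or below 1.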

Voila!! That’s it. Happy Learning 😊
