Two minutes NLP — Quick Intro to Sentiment Analysis

Real-world applications, challenges, datasets, and public pre-trained models

Published in

NLPlanet

4 min read · Jul 22, 2022

Expression icons created by Eucalyp on FlatIcon.

Hello fellow NLP enthusiasts! I realized that I haven’t talked about sentiment analysis yet, a gap that I will remedy today! So, here’s a quick introduction to sentiment analysis in Python that also shows how to quickly get sentiment scores from open-source pre-trained models. Enjoy! 😄

Sentiment analysis is a Natural Language Processing (NLP) technique used to determine the sentiment of a text by automatically identifying the opinions it expresses. The sentiment can be positive (e.g. “I’m very happy today”), negative (e.g. “I didn’t like that movie”), or neutral (e.g. “Today is Friday”, although some people may well see that as positive 😁).

Applications of sentiment analysis

Sentiment analysis has several applications, such as:

Understanding customer sentiment in social media, product reviews, and survey responses to find out what customers think about your products and services, and to make improvements accordingly.
Automatically generating product recommendations based on the sentiment users express towards products.
Identifying influencers who have a positive or negative influence on public opinion and who may be relevant for advertising your products.
Tracking the sentiment of a brand or product over time to improve the brand or adjust marketing efforts. This type of analysis can be done on competitors as well.
Monitoring employee morale. This information can be useful to managers as it helps them identify problem areas that may need to be addressed. It can also help them see how employees are responding to changes within the company, such as new policies or initiatives.

Challenges of sentiment analysis

Sentiment analysis may seem easy, but the nature of human language poses several challenges:

Subjective interpretability: The same text can be interpreted in different ways by different people. One person may interpret a sentence as being positive while another person may interpret it as being negative.
Informal language and context information: Informal language can be interpreted in different ways, and the same words can carry different sentiment depending on the context. For example, the phrase “I’m not happy” reads as negative when followed by “I’m sad”, but as positive when followed by “I’m super happy”. Similarly, the word “sad” can carry a positive sentiment if the person is talking about a sad movie that they enjoyed, or a negative sentiment if the person is talking about a sad event in their life.
Managing sarcasm: Sarcasm is often used in a negative way, to make a point or to express frustration. However, it can also be used in a positive way, to show support or to make a joke. The problem with sarcasm is that it can be difficult to detect, as the intended meaning often differs from the literal meaning of the words.
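To make the context problem concrete, here's a minimal sketch of a naive word-counting scorer. The tiny lexicon and the function name naive_sentiment are invented for this example and are not taken from any real library; the point is only to show how ignoring word order and context leads to wrong answers.

```python
# Tiny hand-made lexicon, invented for this example (not a real resource).
LEXICON = {"happy": 1, "liked": 1, "great": 1, "sad": -1, "bad": -1, "hate": -1}


def naive_sentiment(text):
    """Sum the polarity of known words, ignoring word order and context."""
    words = text.lower().split()
    # Strip surrounding punctuation so that "happy!" still matches "happy".
    score = sum(LEXICON.get(word.strip(".,!?\"'"), 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(naive_sentiment("I'm very happy today"))     # positive
print(naive_sentiment("I'm not happy"))            # positive (wrong: the negation is ignored)
print(naive_sentiment("I enjoyed that sad movie")) # negative (wrong: "sad" describes the movie)
```

Models trained on labeled examples can learn patterns such as negation from the data, which is one reason they are usually preferred over plain word counting.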

How to train a sentiment analysis model

To train a sentiment analysis model, you will need a labeled dataset with sentiment annotations. There are many publicly available datasets that you can use, or you can create your own by labeling a dataset yourself. Examples of public datasets are:

SST (Stanford Sentiment Treebank): It consists of 11,855 single sentences extracted from movie reviews and parsed into phrases. Each phrase is labeled as either negative, somewhat negative, neutral, somewhat positive, or positive.
Large Movie Review Dataset: A dataset for binary sentiment classification containing a set of 25,000 highly polar movie reviews for training and 25,000 for testing.

Once you have your dataset, you will need to choose a machine learning algorithm to train your model (e.g. classification or regression models from scikit-learn). There are many different algorithms that can be used for sentiment analysis, so you will need to experiment to find the one that works best for your data. Here’s a tutorial to train your own sentiment analysis model with scikit-learn.
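As a rough idea of what such training can look like, here's a minimal sketch that combines TF-IDF features with logistic regression in scikit-learn. The four training sentences are toy data invented for this example, and this particular choice of features and classifier is just one reasonable baseline, not necessarily what the tutorial uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for this example: 1 = positive, 0 = negative.
train_texts = [
    "I loved this movie",
    "great acting and a wonderful story",
    "terrible plot and awful acting",
    "I hated this boring film",
]
train_labels = [1, 1, 0, 0]

# TF-IDF turns each text into a weighted bag-of-words vector;
# logistic regression then learns one weight per word.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

predictions = model.predict(["a wonderful movie", "an awful and boring film"])
print(predictions)
```

With a real dataset such as the ones listed above, you would fit the same pipeline on thousands of labeled reviews instead of four sentences.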

After training your model, you can test it on new data to see how well it performs. Once you are satisfied with the results, you can deploy your model to be used in a real-world application.
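A simple way to check performance is to compare the model's predictions with the true labels of data it has not seen during training. Here's a minimal sketch in plain Python; the evaluate helper and the example labels are hypothetical and only serve as illustration.

```python
from collections import Counter


def evaluate(true_labels, predicted_labels):
    """Return the accuracy and a Counter of (true, predicted) label pairs."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for true, predicted in pairs if true == predicted)
    return correct / len(pairs), Counter(pairs)


# Hypothetical held-out labels and model predictions, invented for this example.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "positive", "positive", "neutral"]

accuracy, confusion = evaluate(y_true, y_pred)
print(f"accuracy = {accuracy:.2f}")          # accuracy = 0.60
print(confusion[("neutral", "positive")])    # 1 neutral text predicted as positive
```

Looking at the (true, predicted) pairs as well as the overall accuracy shows which classes the model confuses, which is often more informative than a single number.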

Public pre-trained models

Pre-trained machine learning models work well when the data you want to predict on closely resembles the data the model was trained on. For example, if you want to predict the sentiment of English tweets, you’ll need a model trained on English tweets (or on tweets in multiple languages, including English). If you want to predict the sentiment of product reviews, you’ll need a model trained on product reviews.

You can find a lot of public pre-trained models for sentiment analysis on Hugging Face.

A good sentiment analysis model for tweets is twitter-xlm-roberta-base-sentiment, which works in eight languages (Ar, En, Fr, De, Hi, It, Sp, Pt). On its page you’ll find code snippets that show how to use it.

First, we need to install the transformers library from Hugging Face, along with the SentencePiece tokenizer used by our model (e.g. with pip install transformers sentencepiece).

Then, we’ll use the following code (very similar to the code snippet provided by Hugging Face) to extract the sentiment from two sentences.
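Here's a minimal sketch of how this can look with the transformers pipeline API. It assumes that the model is published under the id cardiffnlp/twitter-xlm-roberta-base-sentiment and that a backend such as PyTorch is installed; the helper names analyze, describe, and main are mine and only serve as illustration.

```python
# Assumed setup (run in a shell first): pip install transformers sentencepiece torch

# Assumption: the Hugging Face id of the model discussed above.
MODEL_NAME = "cardiffnlp/twitter-xlm-roberta-base-sentiment"


def analyze(sentences):
    """Return one {'label': ..., 'score': ...} dict per input sentence."""
    # Imported here so that the formatting helper below works without the library.
    from transformers import pipeline

    # Downloads the model and its tokenizer on first use.
    classifier = pipeline("sentiment-analysis", model=MODEL_NAME, tokenizer=MODEL_NAME)
    return classifier(sentences)


def describe(sentence, prediction):
    """Format a single prediction (hypothetical helper, for illustration only)."""
    return (
        f"The sentence {sentence} has {prediction['label']} sentiment "
        f"with a score ~{prediction['score']:.3f}."
    )


def main():
    sentences = ["I really liked the video!", "I'm not happy with the results."]
    for sentence, prediction in zip(sentences, analyze(sentences)):
        print(describe(sentence, prediction))
```

Calling main() downloads the model the first time and then prints one line per sentence with the predicted label and its score.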

The results are:

The sentence “I really liked the video!” has Positive sentiment with a score ~0.935.
The sentence “I'm not happy with the results.” has Negative sentiment with a score ~0.901.

Next steps

Possible next steps are:

Try other sentiment analysis models from Hugging Face.
Train your own sentiment analysis model following this tutorial.

Thank you for reading! If you are interested in learning more about NLP and Data Science, remember to follow NLPlanet on Medium, LinkedIn, Twitter, and join our new Discord server!