Sentiment Analysis with Pandas and IBM Watson

Published in IBM Data Science in Practice · 6 min read · Jun 7, 2021

In this article, we will show how to perform an example sentiment analysis task using Watson Natural Language Understanding and our open source library Text Extensions for Pandas.

This article was written in collaboration with Fred Reiss.

With the significant growth in the volume of highly subjective user-generated text in the form of online product reviews, recommendations, blogs, discussion forums, as well as many other types of text data, sentiment analysis has gained a lot of attention. The goal of sentiment analysis is to automatically detect the way the author of the text feels towards the entity of interest. While sentiment analysis is one of the most prominent and commonly used natural language processing (NLP) features, it is typically used in combination with other NLP features and text analytics to gain insight into the user experience for the sake of customer care and feedback analytics, product analytics and brand intelligence.

This article shows how the open-source library Text Extensions for Pandas lets you turn the output of the Watson Natural Language Understanding service into Pandas DataFrames. You can then easily preprocess the data and run exploratory sentiment analysis over the resulting DataFrames using Pandas and scikit-learn. This fits the common data science workflow: first preprocess and explore the data with Pandas, then apply machine learning models with scikit-learn.

We start with the Edmunds-Consumer Car Ratings and Reviews dataset from Kaggle.com. The dataset contains consumers’ written reviews and star ratings of cars; each review is annotated with the car’s manufacturer/model/type. Here is what the records in our dataset look like:

Use IBM Watson to identify the sentiment of product reviews

IBM Watson Natural Language Understanding includes a method to extract the keywords and their corresponding sentiment and emotion for each of the product reviews. We pass each review to the Watson Natural Language Understanding (NLU) service:
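A minimal sketch of such a call, assuming the `ibm-watson` Python SDK. The helper names, the API key, the service URL, and the version date are placeholders for illustration and are not taken from the original notebook.

```python
def build_nlu_client(api_key, service_url, version="2021-01-01"):
    """Create a Watson NLU client. The version date is an assumed placeholder."""
    # SDK imports are kept local so the helpers can be defined even on a
    # machine where the ibm-watson package is not installed yet.
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    from ibm_watson import NaturalLanguageUnderstandingV1

    client = NaturalLanguageUnderstandingV1(
        version=version, authenticator=IAMAuthenticator(api_key)
    )
    client.set_service_url(service_url)
    return client


def keyword_features():
    """Request keywords together with their sentiment and emotion scores."""
    from ibm_watson.natural_language_understanding_v1 import (
        Features,
        KeywordsOptions,
    )

    return Features(keywords=KeywordsOptions(sentiment=True, emotion=True))


def analyze_review(client, review_text, features):
    """Send one review to the service and return the parsed JSON response."""
    return client.analyze(
        text=review_text,
        features=features,
        return_analyzed_text=True,  # echo the text back in the response
    ).get_result()
```

With real credentials, the three helpers would be combined as `analyze_review(build_nlu_client(API_KEY, SERVICE_URL), review, keyword_features())`, where `API_KEY` and `SERVICE_URL` come from your own service instance.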

For our sample review, we get the following raw output:

{'usage': {'text_units': 1, 'text_characters': 141, 'features': 1},
 'language': 'en',
 'keywords': [{'text': 'H1 Review',
   'sentiment': {'score': 0.970854, 'label': 'positive'},
   'relevance': 0.964178,
   'emotion': {'sadness': 0.031339,
    'joy': 0.721719,
    'fear': 0.01458,
    'disgust': 0.024451,
    'anger': 0.013115},
   'count': 1},
  {'text': 'long history',
   'sentiment': {'score': 0, 'label': 'neutral'},
   'relevance': 0.925094,
   'emotion': {'sadness': 0.279594,
    'joy': 0.218622,
    'fear': 0.16138,
    'disgust': 0.055864,
    'anger': 0.021514},
   'count': 1},
  {'text': 'abilities of this truck',
   'sentiment': {'score': 0, 'label': 'neutral'},
   'relevance': 0.720289,
   'emotion': {'sadness': 0.251278,
    'joy': 0.122194,
    'fear': 0.094499,
    'disgust': 0.080733,
    'anger': 0.106156},
   'count': 1},
  {'text': 'vehicles',
   'sentiment': {'score': -0.674565, 'label': 'negative'},
   'relevance': 0.681637,
   'emotion': {'sadness': 0.07538,
    'joy': 0.033391,
    'fear': 0.185066,
    'disgust': 0.019377,
    'anger': 0.020686},
   'count': 1},
  {'text': 'truck',
   'sentiment': {'score': 0.970854, 'label': 'positive'},
   'relevance': 0.620892,
   'emotion': {'sadness': 0.069742,
    'joy': 0.565564,
    'fear': 0.031214,
    'disgust': 0.037615,
    'anger': 0.027269},
   'count': 1},
  {'text': 'road',
   'sentiment': {'score': 0, 'label': 'neutral'},
   'relevance': 0.571698,
   'emotion': {'sadness': 0.251278,
    'joy': 0.122194,
    'fear': 0.094499,
    'disgust': 0.080733,
    'anger': 0.106156},
   'count': 1}],
 'analyzed_text': 'H1 Review: The truck is incredible.  I have a long history of \r4x4 vehicles and nothing comes close to the \r abilities of this truck off road.'}

The raw output is a bit hard to read. What if we could get the results as Pandas DataFrames instead? That would make our life easier: we could aggregate the sentiment and emotion scores for each review and then pass the DataFrames to our machine learning library. This is where Text Extensions for Pandas comes into the picture. In our previous article, we showed how the open-source library Text Extensions for Pandas can convert the output of the Watson Semantic Roles model into a Pandas DataFrame. Here we can apply the same conversion to the Watson NLU response:
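A sketch of that conversion on a trimmed copy of the response shown earlier. It assumes `tp.io.watson.nlu.parse_response`, which returns a dictionary of DataFrames keyed by feature name. The `pandas.json_normalize` fallback is a stand-in added here so the example still runs without the library installed; it is not part of the original workflow.

```python
import pandas as pd

# Trimmed copy of the raw response above: two keywords only, shortened text.
response = {
    "language": "en",
    "analyzed_text": "H1 Review: The truck is incredible.",
    "keywords": [
        {"text": "H1 Review",
         "sentiment": {"score": 0.970854, "label": "positive"},
         "relevance": 0.964178,
         "emotion": {"sadness": 0.031339, "joy": 0.721719, "fear": 0.01458,
                     "disgust": 0.024451, "anger": 0.013115},
         "count": 1},
        {"text": "vehicles",
         "sentiment": {"score": -0.674565, "label": "negative"},
         "relevance": 0.681637,
         "emotion": {"sadness": 0.07538, "joy": 0.033391, "fear": 0.185066,
                     "disgust": 0.019377, "anger": 0.020686},
         "count": 1},
    ],
}

try:
    import text_extensions_for_pandas as tp

    # One DataFrame per NLU feature; we only need the keywords table here.
    keywords_df = tp.io.watson.nlu.parse_response(response)["keywords"]
except ImportError:
    # Stand-in: flatten the nested dicts into dotted column names.
    keywords_df = pd.json_normalize(response["keywords"])

print(keywords_df[["text", "sentiment.label", "sentiment.score", "emotion.joy"]])
```

Either path yields one row per keyword, with nested fields exposed as columns such as `sentiment.score` and `emotion.joy`.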

That’s it. With the DataFrame version of this data, we can perform our exploratory and sentiment analysis tasks with a few lines of code. Specifically, we use Pandas to concatenate each Watson NLU sentiment DataFrame (the output of Text Extensions for Pandas) with its corresponding review; we conduct some exploratory analysis on the data later in this article.

We then merge each review in the resulting DataFrame with its title, author, rating, and other metadata, and group the rows by the Review_Title column:
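The concatenate, merge, and group steps might look like the sketch below. The review titles, authors, and scores are made-up values, and the `Author` column name is an assumption; only `Review_Title` and `Rating` are named in the text.

```python
import pandas as pd

# Hypothetical per-review keyword tables, shaped like the converted NLU output.
keyword_tables = {
    "Incredible truck": pd.DataFrame(
        {"text": ["truck", "road"], "sentiment.score": [0.97, 0.0]}
    ),
    "Noisy cabin": pd.DataFrame(
        {"text": ["cabin"], "sentiment.score": [-0.67]}
    ),
}

# Concatenate the tables, tagging every keyword row with its review title.
keywords = pd.concat(
    [table.assign(Review_Title=title) for title, table in keyword_tables.items()],
    ignore_index=True,
)

# Hypothetical review metadata taken from the original dataset.
reviews = pd.DataFrame(
    {
        "Review_Title": ["Incredible truck", "Noisy cabin"],
        "Author": ["reviewer_1", "reviewer_2"],
        "Rating": [5.0, 2.0],
    }
)

# Attach the metadata to every keyword row, then group rows by review.
merged = keywords.merge(reviews, on="Review_Title", how="left")
grouped = merged.groupby("Review_Title")
print(grouped.size())
```

Using the title as the join key only works if titles are unique; with real data, a dedicated review identifier would be a safer key.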

It is worth mentioning that Watson NLU assigns sentiment to keywords based on their context within the sentence, so all keywords within one sentence get the same sentiment score. To get the aggregated sentiment of each review, we therefore take the mean sentiment score of its sentences, counting the sentiment of only one keyword per sentence. More specifically, we first drop duplicate sentiment scores within each review, and then we calculate the average sentiment and emotion scores for each review:
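A small sketch of this aggregation on made-up numbers. Review "A" has two keywords from the same sentence (identical sentiment score), so one of them is dropped before averaging.

```python
import pandas as pd

score_cols = ["sentiment.score", "emotion.joy", "emotion.sadness"]

# Made-up keyword rows: review A has three keywords, two of them sharing a
# sentence (same sentiment score); review B has two keywords in one sentence.
merged = pd.DataFrame(
    {
        "Review_Title": ["A", "A", "A", "B", "B"],
        "Rating": [5.0, 5.0, 5.0, 2.0, 2.0],
        "sentiment.score": [0.9, 0.9, 0.0, -0.6, -0.6],
        "emotion.joy": [0.7, 0.5, 0.2, 0.1, 0.3],
        "emotion.sadness": [0.1, 0.1, 0.3, 0.5, 0.7],
    }
)

# Keep the first keyword for each distinct sentiment score within a review.
deduped = merged.drop_duplicates(subset=["Review_Title", "sentiment.score"])

# Average the remaining rows to get one row of scores per review.
review_level = deduped.groupby("Review_Title", as_index=False)[
    ["Rating"] + score_cols
].mean()
print(review_level)
```

Note that deduplicating on the sentiment score keeps the emotion values of the first keyword in each sentence, which is a simplification of this sketch.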

Now we can find the Pearson correlation coefficients of these variables:
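With one row per review, the correlation matrix is a single Pandas call. The values below are synthetic and only illustrate the mechanics; they are not the article's results.

```python
import pandas as pd

# Synthetic review-level table (one row per review).
review_level = pd.DataFrame(
    {
        "Rating": [5, 4, 3, 2, 1],
        "sentiment.score": [0.9, 0.5, 0.1, -0.3, -0.8],
        "emotion.joy": [0.7, 0.6, 0.4, 0.2, 0.1],
        "emotion.sadness": [0.1, 0.2, 0.3, 0.5, 0.6],
    }
)

# Pairwise Pearson correlation coefficients between all numeric columns.
corr = review_level.corr(method="pearson")

# Correlation of every feature with the star rating, strongest first.
print(corr["Rating"].sort_values(ascending=False))
```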

As the table above shows, the review’s Rating is positively associated with the Watson NLU sentiment score and the joy emotion, and negatively associated with the sadness emotion. The results also show a strong positive correlation between the Watson NLU sentiment score and the joy emotion. Conversely, there is a strong negative correlation between the sadness emotion and the sentiment score, as expected. Since the sentiment.score field shows a relatively high correlation with the Rating, let’s plot these values together to see how they relate:

Gradient Boosting

Now let’s see how we can use the sentiment score along with the fine-grained emotion scores extracted by Watson NLU to predict the user’s rating. Let’s try gradient boosting here. To do so, we first determine the input features:

Now let’s split the DataFrame into training and testing sets and train a Gradient Boosting model on the training set. We first create an instance of the Gradient Boosting model from scikit-learn. After fitting the model, we make predictions by calling its predict method on the testing set. We then check the predictions against the actual values using the mean squared error (MSE) and the coefficient of determination (R-squared), two metrics commonly used to evaluate regression tasks:
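A sketch of the split, fit, predict, and evaluate steps with scikit-learn. The data is randomly generated, and the feature list, the 80/20 split, and the default hyperparameters are assumptions made here rather than the article's exact settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic review-level data: ratings loosely follow the sentiment score.
rng = np.random.default_rng(0)
n = 200
sentiment = rng.uniform(-1, 1, n)
df = pd.DataFrame(
    {
        "sentiment.score": sentiment,
        "emotion.joy": np.clip(0.5 + 0.4 * sentiment + rng.normal(0, 0.1, n), 0, 1),
        "emotion.sadness": np.clip(0.5 - 0.4 * sentiment + rng.normal(0, 0.1, n), 0, 1),
        "Rating": np.clip(3 + 2 * sentiment + rng.normal(0, 0.5, n), 1, 5),
    }
)

# Input features (assumed subset) and the target to predict.
feature_cols = ["sentiment.score", "emotion.joy", "emotion.sadness"]
X_train, X_test, y_train, y_test = train_test_split(
    df[feature_cols], df["Rating"], test_size=0.2, random_state=0
)

# Fit the regressor on the training set and predict the held-out ratings.
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
predicted = model.predict(X_test)

# Compare predictions with the actual ratings.
mse = mean_squared_error(y_test, predicted)
r2 = r2_score(y_test, predicted)
print(f"MSE: {mse:.3f}  R-squared: {r2:.3f}")
```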

Finally, we can plot the predicted rating against the actual rating:

Further Evaluations

Let’s see how well the model fits the data when predicting the average rating of each make of car. For that, we keep the Car_Make column in our DataFrame, fit the Gradient Boosting model on individual reviews, and then calculate the mean squared error and R-squared at the Car_Make level. Doing so gives a coefficient of determination of 0.81 and a mean squared error of 0.02, showing that the model fits the data very well when predicting the average rating of each make of car. Indeed, the coefficient of determination suggests a clearly better fit on average: the Gradient Boosting Regressor explains 81% of the variability in the Car_Make-level Rating, and only 19% cannot be explained by the model. The scatter plot of the mean rating predicted by the gradient boosting model against the mean human rating shows how close the predicted values are to the observed values:
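The make-level evaluation can be sketched as follows, starting from a table of per-review predictions. The makes, ratings, and the `Predicted` column name are invented for illustration and do not reproduce the 0.81 and 0.02 figures reported here.

```python
import pandas as pd
from sklearn.metrics import mean_squared_error, r2_score

# Made-up per-review results: actual rating and model prediction.
results = pd.DataFrame(
    {
        "Car_Make": ["Ford", "Ford", "Honda", "Honda", "Kia", "Kia"],
        "Rating": [4.0, 5.0, 3.0, 4.0, 2.0, 3.0],
        "Predicted": [4.2, 4.4, 3.6, 3.8, 2.9, 2.7],
    }
)

# Average actual and predicted ratings within each make.
per_make = results.groupby("Car_Make")[["Rating", "Predicted"]].mean()

# Score the averaged predictions against the averaged actual ratings.
mse = mean_squared_error(per_make["Rating"], per_make["Predicted"])
r2 = r2_score(per_make["Rating"], per_make["Predicted"])
print(per_make)
print(f"Make-level MSE: {mse:.3f}  R-squared: {r2:.3f}")
```

Averaging within each make cancels out part of the per-review noise, which is one reason the make-level fit can look better than the review-level fit.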

In this article, we demonstrated how Text Extensions for Pandas can be used to perform an example sentiment analysis task. We started by loading the car reviews and passing them through the Watson NLU service. We extracted the keywords and their corresponding sentiment and fine-grained emotion using the Watson NLU service. We used Text Extensions for Pandas to convert the Watson NLU output to Pandas DataFrames and calculated the review-level sentiment and emotion. Using the resulting Pandas DataFrames, we first showed the correlation between Watson NLU’s extracted features and the user’s Rating and then developed a Gradient Boosting model for predicting the Rating of a given review. Finally, we evaluated the model’s ability to predict the average rating for each make of car.

This article shows how easy it is to use IBM Watson NLU, Pandas, and scikit-learn together to conduct exploratory analysis or prediction on your own data.

Written by Monireh Ebrahimi