Working on NLP with TextBlob
TextBlob is a Python library for working with text data. It provides a simple API for diving into common natural language processing (NLP) tasks.
In this post, we will use TextBlob for one of those tasks: sentiment analysis.
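To get a feel for how simple the API is, here is a minimal sketch (the sentence is made up purely for illustration):
from textblob import TextBlob

# A made-up example sentence, just to show the API
blob = TextBlob("The room was clean and the staff were wonderful.")
print(blob.sentiment)  # prints a Sentiment(polarity, subjectivity) named tuple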
We will use the TripAdvisor hotel reviews dataset, which contains about 20,000 reviews, each carrying a 1–5 star rating.
# importing the libraries
import os
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from textblob import TextBlob
import nltk
from nltk.stem import WordNetLemmatizer
import matplotlib.pyplot as plt
import seaborn as sns # used later for the distribution plots
%matplotlib inline

# reading the data
df = pd.read_csv("/kaggle/input/trip-advisor-hotel-reviews/tripadvisor_hotel_reviews.csv")
We start by importing the libraries and reading in the data.
df.head()
Now, let us look at how many reviews fall under each star rating.
print("Star reviews and number of times they occur.")
df["Rating"].value_counts()
The notebook also goes through a lot of standard NLTK work: building a word frequency table, generating a word cloud and so on. Have a look at the Kaggle notebook for the entire code. (Link will be given below.)
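For reference, that NLTK part follows a fairly standard pattern; a rough sketch (not the exact notebook code, and it assumes the text column is named "Review") looks like this:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.probability import FreqDist

nltk.download("punkt")
nltk.download("stopwords")

# tokenize every review, keep alphabetic tokens, drop stopwords
stop_words = set(stopwords.words("english"))
words = [w.lower()
         for review in df["Review"]
         for w in word_tokenize(review)
         if w.isalpha() and w.lower() not in stop_words]

# frequency table of the most common words
freq = FreqDist(words)
freq.most_common(20)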
Using TextBlob
We will use TextBlob's built-in methods to generate a polarity and a subjectivity score for each review. Before building a classifier, we need to pass each review to TextBlob's functions, so we first convert the DataFrame into a 2D NumPy array.
df_arr = df.to_numpy()
polarity_arr = []
subjectivity_arr = []

for a in df_arr:
    text = a[0]  # the review text is in the first column
    testimonial = TextBlob(text)
    polarity_arr.append(testimonial.sentiment.polarity)
    subjectivity_arr.append(testimonial.sentiment.subjectivity)
This gives us the polarity and subjectivity values for every review. To consolidate the data, we add them to our DataFrame as new columns.
df["Review_Polarity"]=polarity_arr
df["Review_Subjectivity"]=subjectivity_arr
df.head(3)
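As a quick sanity check (this is not part of the original notebook), we can verify that the polarity scores track the star ratings:
# average TextBlob polarity per star rating; it should rise with the rating
df.groupby("Rating")["Review_Polarity"].mean()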
To understand these new features, we look at their distribution plots.
plt.figure(figsize=(15,8))
sns.distplot(df["Review_Polarity"])
plt.figure(figsize=(15,8))
sns.distplot(df["Review_Subjectivity"])
A few words before we get to the model. TextBlob has many predefined functions, which make the work very easy, but training its classifiers needs a lot of computational resources. I was not able to train the model on all 20,000 data points, so I went with only the first 1,000.
from textblob.classifiers import NaiveBayesClassifier

df_model = df_arr[0:1000]  # first 1,000 (review, rating) rows
cl = NaiveBayesClassifier(df_model)
Now that the model is trained, we can test it on some sample text.
cl.classify("The hotel is very good. Food was good, housekeeping could have been better. The staff was ok")
test_text=".before stay hotel arrange car service price 53 tip reasonable driver waiting arrival.checkin easy downside room picked 2 person jacuzi tub no bath accessories salts bubble bath did n't stay, night got 12/1a checked voucher bottle champagne nice gesture fish waiting room, impression room huge open space felt room big, tv far away bed chore change channel, ipod dock broken disappointing.in morning way asked desk check thermostat said 65f "cl.classify(test_text)
So we can say the classifier is doing reasonably well. With better computational resources, the model can be trained on more data and made to do even better.
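If you want to put a number on "doing well", TextBlob classifiers also expose an accuracy method; a rough sketch on a held-out slice (the indices here are arbitrary examples) could look like this:
# evaluate on reviews the classifier never saw during training
test_set = [(row[0], row[1]) for row in df_arr[1000:1200]]
cl.accuracy(test_set)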
Have a look at the entire code.
Thank You.
My Linkedin Profile -
My Github Profile -