Sentiment analytics of tweets

Sarka Pribylova
Apr 3, 2022 · 4 min read


It has been about a year since I created a basic text sentiment analysis of Twitter tweets in Python. Let me share a few lines of code and the results.

There are many areas where sentiment analytics can improve business performance. Python offers various approaches to sentiment and polarity. We can examine web pages, stocks, libraries, books or a Twitter feed and see, for example, how positive, negative or neutral the online texts about UBS Bank were. Let's try to understand the latest 100 tweets about UBS: they speak a lot about fintech, the future, challenges, sharing and innovation. 72% of the tweets about UBS are positive and only 1% are negative.

We use the Python Tweepy library for authentication, the TextBlob library for simple NLP procedures and the WordCloud library to create a word cloud graph.

1.

# install the libraries first (run in a shell or notebook cell):
# pip install tweepy
# pip install textblob
# pip install wordcloud

# import the libraries
import tweepy
from textblob import TextBlob
from wordcloud import WordCloud
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import csv
import datetime

2.

To use the Twitter data we must first create a Twitter developer account, log into it and generate the Twitter API authentication credentials. Without this step we cannot use online Twitter data in the next part of the code: https://docs.tweepy.org/en/latest/authentication.html

# log in to the developer account
consumer_key = "xxx"
consumer_secret = "xxx"
access_token = "xxx"
access_token_secret = "xxx"
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
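
A small side note: hard-coding API keys in a shared notebook is risky. A minimal alternative sketch, assuming you export the keys as environment variables first (the variable names below are my own choice, not part of the original code):

# assumed environment variables, e.g. export TWITTER_CONSUMER_KEY="..." in the shell
import os
consumer_key = os.environ["TWITTER_CONSUMER_KEY"]
consumer_secret = os.environ["TWITTER_CONSUMER_SECRET"]
access_token = os.environ["TWITTER_ACCESS_TOKEN"]
access_token_secret = os.environ["TWITTER_ACCESS_TOKEN_SECRET"]
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)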

3.

Now we will collect the latest 100 tweets from the UBS Twitter timeline, clean their text and save them into the df data frame.

# download the latest 100 tweets
posts = api.user_timeline(screen_name="UBS", lang="en", count=100, tweet_mode="extended")

# show the first 5 tweets
for tweet in posts[0:5]:
    print(tweet.full_text + '\n')

print("Show results \n")
i = 1
for tweet in posts[0:5]:
    print(str(i) + ') ' + tweet.full_text + '\n')
    i = i + 1

# save the tweets into a data frame
df = pd.DataFrame([tweet.full_text for tweet in posts], columns=['Tweets'])
df.head()

# remove @mentions, the '#' symbol and the literal 'https'
def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text)
    text = re.sub(r'#', '', text)
    text = re.sub(r'https', '', text)
    return text

df['Tweets'] = df['Tweets'].apply(cleanTxt)
df
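
Note that cleanTxt removes only the literal string "https", so the rest of each link stays in the text. If you prefer to strip whole URLs and retweet markers too, a slightly extended variant (my own sketch, not part of the original post) could look like this:

# optional, more thorough cleaning (sketch)
def cleanTxtStrict(text):
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)   # remove @mentions
    text = re.sub(r'#', '', text)                # drop the '#' symbol, keep the word
    text = re.sub(r'RT[\s]+', '', text)          # remove the retweet marker
    text = re.sub(r'https?://\S+', '', text)     # remove whole links, not just "https"
    return text

# df['Tweets'] = df['Tweets'].apply(cleanTxtStrict)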

4.

Let's generate two new columns in the existing data frame: subjectivity and polarity. These two metrics belong to the text classification part of NLP. We use the already installed TextBlob library here: https://textblob.readthedocs.io/en/dev/

# calculate the subjectivity and polarity scores
def getSubjectivity(text):
    return TextBlob(text).sentiment.subjectivity

def getPolarity(text):
    return TextBlob(text).sentiment.polarity

df['Subjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)
df
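
For intuition: TextBlob returns polarity in the range -1 (negative) to 1 (positive) and subjectivity in the range 0 (objective) to 1 (subjective). A quick check on a made-up sentence (not from the UBS feed; the exact numbers depend on TextBlob's lexicon):

sample = TextBlob("The new strategy is a great success")
print(sample.sentiment)           # Sentiment(polarity=..., subjectivity=...)
print(sample.sentiment.polarity)  # a positive value for this sentence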

5.

Let's print the word cloud from the data frame.

# print the word cloud
allWords = ' '.join([twts for twts in df['Tweets']])
wordCloud = WordCloud(width=500, height=300, random_state=21, max_font_size=110, background_color="white", colormap="binary").generate(allWords)
plt.imshow(wordCloud,interpolation="bilinear" )
plt.axis('off')
plt.show()

6.

We can create another new column in our data frame and, based on the polarity calculated in step 4, label the feeling about UBS in each tweet as Positive, Negative or Neutral.

# create the basic categories based on the score
def getAnalysis(score):
    if score < 0:
        return 'Negative'
    elif score == 0:
        return 'Neutral'
    else:
        return 'Positive'

df['Analysis']=df['Polarity'].apply(getAnalysis)
df
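
The same labelling can also be done without apply, for example with numpy.select; this is just an equivalent alternative, not the approach used above:

# equivalent vectorized labelling (sketch)
conditions = [df['Polarity'] < 0, df['Polarity'] == 0]
df['Analysis'] = np.select(conditions, ['Negative', 'Neutral'], default='Positive')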

7.

# print the positive tweets
print('Printing positive tweets:\n')
j = 1
sortedDF = df.sort_values(by=['Polarity'])  # sort the tweets by polarity
for i in range(0, sortedDF.shape[0]):
    if sortedDF['Analysis'][i] == 'Positive':
        print(str(j) + ') ' + sortedDF['Tweets'][i])
        print()
        j = j + 1

8.

# print the negative tweets
print('Printing negative tweets:\n')
j = 1
sortedDF = df.sort_values(by=['Polarity'], ascending=False)  # sort the tweets by polarity, descending
for i in range(0, sortedDF.shape[0]):
    if sortedDF['Analysis'][i] == 'Negative':
        print(str(j) + ') ' + sortedDF['Tweets'][i])
        print()
        j = j + 1

9.

# plot the sentiment as a scatter plot
plt.figure(figsize=(8, 6))
for i in range(0, df.shape[0]):
    plt.scatter(df["Polarity"][i], df["Subjectivity"][i], color='Blue')  # plt.scatter(x, y, color)
plt.title('Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()
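
The point-by-point loop works, but matplotlib can also draw all the points in a single call; an equivalent alternative:

plt.figure(figsize=(8, 6))
plt.scatter(df['Polarity'], df['Subjectivity'], color='blue')
plt.title('Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()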

10.

# print the percentage of positive tweets
ptweets = df[df.Analysis == 'Positive']
ptweets = ptweets['Tweets']
ptweets
round( (ptweets.shape[0] / df.shape[0]) * 100 , 1)

11.

# print the percentage of negative tweets
ntweets = df[df.Analysis == 'Negative']
ntweets = ntweets['Tweets']
ntweets
round((ntweets.shape[0] / df.shape[0]) * 100, 1)

12.

# show the value counts
df['Analysis'].value_counts()
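
If you want the percentages from steps 10 and 11 in one step, value_counts can normalize the counts directly; an equivalent one-liner:

# share of each sentiment class in percent
round(df['Analysis'].value_counts(normalize=True) * 100, 1)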

13.

# plot and visualize the counts
plt.title('Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Counts')
df['Analysis'].value_counts().plot(kind = 'bar')
plt.show()

References:

https://wafawaheedas.gitbooks.io/twitter-sentiment-analysis-visualization-tutorial/content/chapter1.html
https://www.dataquest.io/blog/streaming-data-python/
https://developer.twitter.com/en/portal/projects/1425722861860360193/settings
https://www.youtube.com/watch?v=ujId4ipkBio&list=LL&index=1
https://realpython.com/intro-to-python-threading/
https://www.tutorialspoint.com/python/python_multithreading.htm
https://www.tutorialspoint.com/python/python_strings.htm
https://towardsai.net/p/data-mining/text-mining-in-python-steps-and-examples-78b3f8fd913
https://www.nltk.org
https://machinelearningmastery.com/clustering-algorithms-with-python/
https://scikit-learn.org/stable/
https://link.springer.com/article/10.1007/s40595-016-0086-9
https://www.datacamp.com/community/tutorials/stemming-lemmatization-python
https://en.wikipedia.org/wiki/Document_clustering
http://people.scs.carleton.ca/~armyunis/projects/KAPI/porter.pdf
https://betterprogramming.pub/twitter-sentiment-analysis-15d8892c0082
https://www.nltk.org/howto/corpus.html
https://towardsdatascience.com/basic-binary-sentiment-analysis-using-nltk-c94ba17ae386
https://realpython.com/python-nltk-sentiment-analysis/
https://en.wikipedia.org/wiki/Natural_Language_Toolkit
https://perl.developpez.com/documentations/en/5.18.0/index-language.html
https://www.nltk.org/book/ch02.html
https://www.nltk.org/data.html
https://widdowquinn.github.io/Teaching-SWC-Lessons/python/2017-05-18-standrews/extras/nltk_example.html#using
https://www.frontiersin.org/articles/10.3389/fninf.2014.00038/full
https://www.w3schools.com/python/python_dictionaries.asp
https://en.wikipedia.org/wiki/Tuple
https://thecodex.me/blog/sentiment-analysis-tool-for-stock-trading
https://finviz.com
https://en.wikipedia.org/wiki/Beautiful_Soup_(HTML_parser)
https://www.investopedia.com/terms/s/social-science.asp
https://devopedia.org/text-clustering
https://gunicorn.org
https://www.postgresql.org
https://pycaret.org
https://www.tweepy.org
https://textblob.readthedocs.io/en/dev/
