Text mining and Sentiment analysis on Twitter using R: Case Study, Business of Esports.

Miriam Mshelia
Published in Nerd For Tech
4 min read · Jun 12, 2021

DESCRIPTION OF DATA USED — TWITTER API

For the purpose of analysis, certain factors give Twitter an edge over other social media platforms. Firstly, millions of tweets are published every day, giving us a larger pool of data to work with. Secondly, these tweets are public and can be retrieved through the API. Twitter gives us unprecedented access to a vast amount of information on various industries and individuals, and its accessibility makes it easy to collect and share that information.

GETTING ACCESS TO TWITTER API.

To retrieve tweets, access to the Twitter API is required. The first step is creating a Twitter account and then applying for a developer account. The application involves filling in a form that asks specifically what the data (tweets) will be used for. The stated purpose determines the number of tweets you'll be given access to.

Once the application for the Twitter developer account has been approved, you'll be issued credentials that are unique to each user and should be kept safe: access secret, access token, consumer key and consumer secret.

MINING TWEETS

Using R, tweets are extracted with the "rtweet" package, which must first be installed and loaded. Next, set up authentication using the credentials received with the developer account, connecting R to Twitter with:

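library(rtweet)

# Replace each **** placeholder with the credentials from your developer account
twitter_token <- create_token(
  app = "****",
  consumer_key = "****",
  consumer_secret = "****",
  access_token = "****",
  access_secret = "****",
  set_renv = TRUE)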
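
Since these credentials should be kept safe, one option is to keep them out of the script and read them from environment variables instead. The sketch below is only an illustration: the helper read_twitter_credentials() and the environment variable names are hypothetical and are not part of rtweet.

```r
# Hypothetical helper: read the four credentials from environment variables
# instead of hard-coding them. The variable names are assumptions.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  # Sys.getenv() returns "" for variables that are not set
  values <- Sys.getenv(vars)
  names(values) <- names(vars)
  missing <- vars[!nzchar(values)]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(values)
}
```

The returned list could then supply the arguments of the authentication call, for example consumer_key = creds$consumer_key after creds <- read_twitter_credentials().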

SEARCHING FOR TWEETS

This analysis focuses on #PokemonGo tweets. For this purpose, the Twitter API granted access to 20,000 tweets from the previous 6 to 9 days. To access these tweets I used the search_tweets function:

PokemonGotweets <- search_tweets("#PokemonGO", n = 20000, include_rts = FALSE, lang = "en")

This call pulled a total of 20,000 tweets for the hashtag #PokemonGo. These tweets are used for the rest of the analysis.

SENTIMENT ANALYSIS

Sentiment analysis uses natural language processing and text analysis techniques to identify and extract information from text, usually opinions and feelings towards a topic rather than facts. It helps us understand the attitude of users (in this case, gamers) towards PokemonGo. These opinions can be positive, negative or neutral.
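
The scoring step that produces one score per sentiment for each tweet is not shown in this article. Packages such as syuzhet (with its get_nrc_sentiment function) are a common choice for it, but which one was used here is not stated. As a rough illustration of the idea only, the sketch below counts matches against a tiny made-up word list; toy_lexicon, score_sentiment and the example tweets are all hypothetical.

```r
# Toy lexicon (an assumption for illustration; real lexicons are far larger)
toy_lexicon <- list(
  positive = c("love", "fun", "great", "awesome"),
  negative = c("hate", "boring", "broken", "crash")
)

# Count, for each text, how many of its words appear in each sentiment's word list.
# Returns a data frame with one row per text and one column per sentiment.
score_sentiment <- function(texts, lexicon) {
  # Lowercase, then split on anything that is not a letter
  words_per_text <- strsplit(tolower(texts), "[^a-z]+")
  counts <- sapply(lexicon, function(sentiment_words) {
    vapply(words_per_text, function(w) sum(w %in% sentiment_words), numeric(1))
  })
  # sapply() returns a plain vector for a single text, so rebuild the matrix
  counts <- matrix(counts, nrow = length(texts),
                   dimnames = list(NULL, names(lexicon)))
  as.data.frame(counts)
}

example_tweets <- c(
  "I love #PokemonGO, the new event is great fun",
  "The app keeps crashing, I hate this update",
  "Walking to the gym"
)
toy_scores <- score_sentiment(example_tweets, toy_lexicon)

# Total score for each sentiment across all example tweets
colSums(toy_scores)
```

Exact word matching is deliberately naive: "crashing" does not match "crash", which is one reason real analyses rely on established lexicons and text cleaning.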

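# Calculating the total score for each sentiment.
# mysentiment_pokemonGO is not defined in this article; it is assumed to be a
# data frame with one numeric column per sentiment and one row per tweet.
Sentimentscores_pokemonGo <- data.frame(colSums(mysentiment_pokemonGO[, ]))
names(Sentimentscores_pokemonGo) <- "Score"
View(Sentimentscores_pokemonGo)
# Move the sentiment names from the row names into their own column
TheSentimentscores_pokemonGo <- cbind("sentiment" = rownames(Sentimentscores_pokemonGo), Sentimentscores_pokemonGo)
rownames(TheSentimentscores_pokemonGo) <- NULL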

Sentiment analysis offers insight into customer behaviour and wants. It shows business owners and investors where improvements are possible and supports decisions aimed at customer satisfaction.

library(ggplot2)

# Plotting the sentiments with their scores
ggplot(data = TheSentimentscores_pokemonGo, aes(x = sentiment, y = Score)) +
  geom_bar(aes(fill = sentiment), stat = "identity") +
  theme(legend.position = "none") +
  xlab("Sentiments") + ylab("scores") +
  ggtitle("Sentiments behind tweets on the popular #POKEMONGO")

Result.

Figure 1: sentiment analysis for #PokemonGo tweets

MAPPING TWEETS

Using the extracted data and the R package "leaflet", we can make a map showing the locations where #PokemonGo tweets originate. However, only a small number of tweets were displayed on the map. This is because only a small number of tweets were geocoded, as most users turn off location sharing in their privacy settings.

library(leaflet)
library(maps)

#giving leaflet access to the data
PokemonGomaps <- read.csv("C:\\Users\\Documents\\PokemonGotweets.csv", stringsAsFactors = FALSE)
mapPokemonGO <- leaflet(PokemonGomaps) %>% addTiles()
mapPokemonGO %>% addCircles(lng = ~longitude, lat = ~latitude, popup = PokemonGomaps$type, weight = 8, radius = 40, color = "#fb3004", stroke = TRUE, fillOpacity = 0.8)
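
Because only geocoded tweets can be placed on the map, it can help to filter them out beforehand and check what share of the data they represent. The sketch below uses a small made-up data frame in place of the CSV; the column names follow the leaflet call above, while toy_tweets and its values are purely illustrative.

```r
# Toy stand-in for the tweets read from the CSV; the values are made up
toy_tweets <- data.frame(
  type      = c("tweet", "tweet", "reply", "tweet"),
  latitude  = c(51.5074, NA, 35.6762, NA),
  longitude = c(-0.1278, NA, 139.6503, 2.3522),
  stringsAsFactors = FALSE
)

# Keep only rows where both coordinates are present
has_coords <- complete.cases(toy_tweets[, c("latitude", "longitude")])
geocoded_tweets <- toy_tweets[has_coords, ]

# Share of tweets that can actually be placed on the map
geocoded_share <- mean(has_coords)
```

Reporting the geocoded share alongside the map makes it clear how much of the data the map actually represents.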

The points on the map can show marketers and content creators which regions interact with the product the most and which regions could be tagged as potential markets.

Figure 2: Map showing countries tweeting the most about #PokemonGO

TIMELINE OF TWEETS

Analysing how often the hashtag #PokemonGo is used gives an overall idea of activity on the topic, including the days with the most and fewest interactions with the product's hashtag. Using the "ggplot2" package in R, I plotted the frequency of the hashtag over the period from which the tweets were pulled. Engagement with the hashtag rose on 27th April to approx. 370 tweets and peaked on the 1st of May with approx. 450 tweets.
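
The code for this timeline is not included in the article. One way to arrive at such a plot, sketched here under assumptions, is to count tweets per calendar day and draw the counts with ggplot2. The timestamps below are made up, and the created_at name is assumed to match the timestamp column of the pulled tweets.

```r
# Toy timestamps standing in for the timestamp column of the pulled tweets
created_at <- as.POSIXct(
  c("2021-04-27 09:15:00", "2021-04-27 18:40:00", "2021-04-28 07:05:00",
    "2021-05-01 12:00:00", "2021-05-01 13:30:00", "2021-05-01 21:45:00"),
  tz = "UTC"
)

# Count tweets per calendar day
tweets_per_day <- as.data.frame(table(day = as.Date(created_at)),
                                stringsAsFactors = FALSE)
names(tweets_per_day) <- c("day", "tweets")
tweets_per_day$day <- as.Date(tweets_per_day$day)

# Day with the most tweets
busiest_day <- tweets_per_day$day[which.max(tweets_per_day$tweets)]

# Build the timeline plot if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  timeline_plot <- ggplot2::ggplot(tweets_per_day,
                                   ggplot2::aes(x = day, y = tweets)) +
    ggplot2::geom_line() +
    ggplot2::geom_point() +
    ggplot2::xlab("Day") +
    ggplot2::ylab("Number of tweets")
  # Call print(timeline_plot) to display it
}
```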

DRAWBACKS AND LIMITATIONS.

  • Twitter API.

- A standard account only allows pulling tweets from up to 7 days back.

- Too few tweets are geocoded to provide sufficient demographic insight into users.

SUMMARY.

We can use open data sources like Twitter to gather valuable insights on the general public's reactions to various topics. These insights can be used for marketing and communication purposes, and to create content based on the reactions in these tweets.

For more insight on this topic, refer to:

  1. https://www.r-bloggers.com/2018/06/awesome-twitter-word-clouds-in-r/
  2. https://www.earthdatascience.org/courses/earth-analytics/get-data-using-apis/text-mining-twitter-data-intro-r/
  3. https://rww.science/post/trump-s-tweets-part-ii/
