Twitter #PulwamaAttack Data Analysis (Tweepy and Plotly)

Indresh Bhattacharyya
4 min read · Feb 21, 2019


On Valentine's Day (February 14, 2019), 44 of our brave Indian soldiers were killed in a terrorist bombing in the Pulwama region of India.

Heartfelt condolences to the soldiers who lost their lives in the attack and to their families. This article is dedicated to them and to all the other soldiers who protect our country and other countries with their lives. Thank you for keeping us safe.

So let's start.

Motivation: Collect tweets from the day of the attack until now and visualize how many were posted from each country, that is, which countries tweeted the most.

Tools:

  1. Tweepy
  2. Plotly
  3. Pandas

Step 1:

  1. Go to https://developer.twitter.com/
  2. Click Create app and register for a developer account (you will be asked a couple of questions; answer them).
  3. After registering, create the app again, open it, and go to Keys and Tokens.
  4. Copy both Consumer API keys.
  5. Also copy the access token and access token secret.

Step 2:

It's time to code.

Let's go through what the script contains.

Credential.py holds four variables. Fill them in with the credentials from your Twitter app:

CUSTOMER_KEY=''
CUSTOMER_SECRECT_KEY=''
ACCESS_TOKEN=''
ACCESS_SECRECT_TOKEN=''

Authentication: We pass the consumer API keys to OAuthHandler, then call

.set_access_token((Access token),(Access token secret))

to set the access token and the access token secret:

import Credential
from tweepy import API, OAuthHandler

auth = OAuthHandler(Credential.CUSTOMER_KEY, Credential.CUSTOMER_SECRECT_KEY)
auth.set_access_token(Credential.ACCESS_TOKEN, Credential.ACCESS_SECRECT_TOKEN)

Now we create the API object that will be used to fetch the data. With wait_on_rate_limit=True, Tweepy waits for the rate limit to reset instead of failing:

api=API(auth,wait_on_rate_limit=True)

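Finally, fetching the data:

import tweepy

# Page through the search results, up to 100 tweets per request
for tweet in tweepy.Cursor(api.search, q="PulwamaAttack", lang='en', count=100, since='2019-02-14', tweet_mode="extended").items():
    date = tweet._json['created_at']
    print(date)
    # Append one tweet per line; str() writes a Python dict literal, not strict JSON
    with open('tweetCur.json', 'ab') as file:
        file.write(str(str(tweet._json) + '\n').encode('utf-8'))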

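What happened here is simple. Tweepy's Cursor takes an API method together with its search parameters and pages through the results. In my case, I wanted tweets containing the #PulwamaAttack hashtag. (api.search is the Tweepy 3.x name; Tweepy 4 renamed it to api.search_tweets.)

The basic rule of thumb is Cursor(api_method, search_parameters)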

Other optional parameters are:

  1. lang="en" → only tweets in English
  2. since="2019-02-14" → tweets from 14 February 2019 (the day of the bombing) onward
  3. tweet_mode="extended" → return the full text of each tweet

By default, tweet text is truncated. We don't need the full text right now, but the next part covers sentiment analysis of the tweets, so I collected it beforehand.
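To make the truncation point concrete, here is a minimal sketch of how the full text could be read back from one saved tweet. It assumes the usual extended-mode layout, where the text lives under "full_text" (instead of "text") and a retweet carries the complete original tweet under "retweeted_status". The helper name full_tweet_text and the sample tweet are made up for illustration.

```python
def full_tweet_text(tweet_json):
    """Return the untruncated text of a tweet dict (hypothetical helper)."""
    # Assumption: a retweet keeps the complete original under "retweeted_status",
    # while its own top-level text may still be cut short.
    source = tweet_json.get("retweeted_status") or tweet_json
    # Extended mode uses "full_text"; the default mode uses "text".
    return source.get("full_text") or source.get("text") or ""


# Made-up sample shaped like an extended-mode retweet
sample = {
    "full_text": "RT @someone: a long tweet about #Pulwama...",
    "retweeted_status": {"full_text": "a long tweet about #PulwamaAttack"},
}
print(full_tweet_text(sample))
```

Falling back to "text" keeps the helper usable on tweets that were fetched without tweet_mode="extended".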

Now that we have fetched the data, it's time to get the geographic data.

The 'location.csv' file contains geographic data for all countries, with their city names, subdivisions, and country codes. I found the dataset on an open platform (linked here) and will also make it available on GitHub.

https://www.maxmind.com/en/free-world-cities-database

From the dataset, we take the 'country_name', 'city_name', and 'iso_code' columns. We will map these to the Twitter user locations.
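As a rough sketch of this step, the three columns can be turned into two lookup tables, one keyed by country name and one by city name. This version uses only the standard csv module rather than pandas, and the rows below are a made-up stand-in for location.csv; the function name load_location_lookup and the three-letter codes are assumptions.

```python
import csv
import io


def load_location_lookup(csv_file):
    """Build lowercase-name lookups that map to (country_name, iso_code)."""
    countries, cities = {}, {}
    for row in csv.DictReader(csv_file):
        entry = (row["country_name"], row["iso_code"])
        countries[row["country_name"].strip().lower()] = entry
        city = row["city_name"].strip().lower()
        if city:
            # If two countries share a city name, the first row wins
            cities.setdefault(city, entry)
    return countries, cities


# Made-up rows standing in for location.csv (ISO-3 codes assumed)
sample_csv = io.StringIO(
    "country_name,city_name,iso_code\n"
    "India,Mumbai,IND\n"
    "India,Delhi,IND\n"
    "Pakistan,Lahore,PAK\n"
)
countries, cities = load_location_lookup(sample_csv)
print(countries["india"], cities["lahore"])
```

With the real file, the same function could be called as load_location_lookup(open('location.csv', newline='', encoding='utf-8')).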

OK, now that we have the geolocation details, we can work on getting the user locations.

string_to_json() converts each saved line (a string) back into a JSON object.

line['user']['location'] reads the location set in the tweet author's profile; finally, the locations are written to a file.
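The original string_to_json() is not shown here, so the following is only an assumed implementation. Because the fetch step saved str(tweet._json), each line is a Python dict literal rather than strict JSON, and ast.literal_eval is one way to read it back. The helper user_location and the sample line are hypothetical.

```python
import ast


def string_to_json(line):
    """Parse one saved line back into a dict; return None if it is malformed."""
    try:
        # Lines were written with str(dict), so they use Python syntax
        # (single quotes, True/False/None) that json.loads would reject.
        parsed = ast.literal_eval(line.strip())
    except (ValueError, SyntaxError):
        return None
    return parsed if isinstance(parsed, dict) else None


def user_location(line):
    """Return the author's profile location, or '' when it is missing."""
    tweet = string_to_json(line)
    if tweet is None:
        return ""
    return (tweet.get("user") or {}).get("location") or ""


# Made-up line shaped like one row of tweetCur.json
sample_line = "{'created_at': 'Thu Feb 14 12:00:00 +0000 2019', 'user': {'location': 'Mumbai, India', 'verified': False}}\n"
print(user_location(sample_line))
```

Many profiles leave the location empty or fill it with free text, which is why the helper returns an empty string instead of raising.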

Final step: mapping

Now we finally have all the cities and countries, plus the location of each tweet. We map each tweet's location first by country name and then by city name, producing a final DataFrame with the country and country code for each tweet.
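The country-first, city-second order described above can be sketched with plain dictionaries instead of a DataFrame. The lookup tables, the function name map_location, and the choice to split the free-text location on commas, slashes, and pipes are all assumptions for illustration.

```python
import re

# Made-up lookups keyed by lowercase name (ISO-3 codes assumed)
COUNTRIES = {"india": ("India", "IND"), "pakistan": ("Pakistan", "PAK")}
CITIES = {"mumbai": ("India", "IND"), "lahore": ("Pakistan", "PAK")}


def map_location(location, countries=COUNTRIES, cities=CITIES):
    """Return (country_name, iso_code) for a free-text location, or None."""
    # "Mumbai, India" becomes ["mumbai", "india"]
    parts = [p.strip().lower() for p in re.split(r"[,/|]", location) if p.strip()]
    # Try every part against country names first, then against city names
    for lookup in (countries, cities):
        for part in parts:
            if part in lookup:
                return lookup[part]
    return None


for text in ["Mumbai, India", "Lahore", "somewhere on earth"]:
    print(text, "->", map_location(text))
```

Checking country names first means a location such as "Lahore, India" resolves by its country part; locations that match nothing return None and can simply be dropped before counting.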

Time for plotting:

First, I count the occurrences of each country, that is, the number of tweets from each country, and then remove duplicate records so that each country appears only once.
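Here is a small sketch of the counting and plotting idea. It swaps the pandas count-then-drop-duplicates approach for collections.Counter, which yields one entry per country directly. The plotting function assumes three-letter ISO codes (Plotly's "ISO-3" location mode) and a "Blues" color scale; the function names and sample data are made up.

```python
from collections import Counter


def count_by_country(mapped):
    """Count tweets per (country_name, iso_code); each country appears once."""
    return Counter(mapped)


def plot_counts(counts):
    """Build a world choropleth of tweet counts (assumes ISO-3 country codes)."""
    # Imported lazily so the counting helper works without Plotly installed
    import plotly.graph_objects as go

    keys = list(counts)
    fig = go.Figure(go.Choropleth(
        locations=[code for _, code in keys],
        z=[counts[key] for key in keys],
        text=[name for name, _ in keys],
        locationmode="ISO-3",
        colorscale="Blues",
        colorbar=dict(title=dict(text="Tweets")),
    ))
    fig.update_layout(title_text="#PulwamaAttack tweets by country")
    return fig


# Made-up mapping results, one entry per tweet
mapped = [("India", "IND"), ("India", "IND"), ("Pakistan", "PAK"), ("India", "IND")]
counts = count_by_country(mapped)
print(counts.most_common())
# plot_counts(counts).show()  # uncomment to open the map in a browser
```

Countries with no entry in counts are left uncolored on the map.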

Final Result:

Countries that tweeted are shown in color; blank means no tweets were found from that country.

Conclusion:

From the above, it seems that India and Pakistan have been the most active on Twitter about #PulwamaAttack. Other countries (in dark blue) have also actively tweeted about this terrorist attack.

Part 2 will cover sentiment analysis of the tweets.

Finally, I want to thank all the army personnel for their contribution to the better and safer world we live in.

