Twitter Real Time Streaming API use with Python — Tweet Mining

Retrieving relevant Links to read on ‘Blockchain’ relating to ‘banks’ from real time trending tweets

For a long time I had wanted to access Twitter's real-time data and use it in Python for analysis. After trying a bunch of libraries, I finally settled on the Twython library to access the Twitter API.

I made a dummy app to generate the Twitter API access token and secret key that the library needs.

Generating Twitter API Access Token and Secret Key
from twython import Twython

# Step 1: exchange the app key and secret for an OAuth 2 access token
APP_KEY = 'T1Imwl7QuUyNV9Bzeo0wa7E4'
APP_SECRET = 'NjRytX1rbZJYrBF124qcQIq1toqgeNqe7BwBS0TDa1jVW3cAp'
twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)
ACCESS_TOKEN = twitter.obtain_access_token()
print(ACCESS_TOKEN)

# Step 2: create an authenticated client with that token
APP_KEY = 'T1Imwl7QuUyNVW9Bzeo0wa7E4'
ACCESS_TOKEN = 'AAAAAAAAAAAAAAAAAAAAAFdi0AAAAAAAMI5%2BQu7oCzYCNB0cpxnMIsbuNWU%3DzHOW7NH1FjryZozcrohk3au28XlmE2Gsl3eq20sr0PYgfItfC'  # copy and paste from the output of the command above
twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)

# Check the remaining rate limit for the search endpoints
twitter.get_application_rate_limit_status()['resources']['search']
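
Since the key and secret are pasted straight into the script above, here is a minimal sketch of reading them from environment variables instead. The variable names TWITTER_APP_KEY and TWITTER_APP_SECRET and the helper load_credentials are my own assumptions for illustration; Twython does not define them.

```python
import os


def load_credentials(env=None):
    """Return (app_key, app_secret) read from a mapping of environment variables.

    TWITTER_APP_KEY and TWITTER_APP_SECRET are hypothetical names chosen
    for this sketch; use whatever names suit your setup.
    """
    env = os.environ if env is None else env
    try:
        return env["TWITTER_APP_KEY"], env["TWITTER_APP_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Demo with a stand-in mapping instead of the real environment.
demo_env = {"TWITTER_APP_KEY": "my-key", "TWITTER_APP_SECRET": "my-secret"}
app_key, app_secret = load_credentials(demo_env)
print(app_key, app_secret)
```

The returned pair can then be passed to Twython(app_key, app_secret, oauth_version=2) exactly as in the code above.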

After getting access to the API through the Twython library, I wanted to test it, so I first tried retrieving tweets from a single account. Then I wrote a Python script that searches the latest real-time tweets for a particular keyword and returns them. The search returns at most 100 tweets per request, and the request quota resets every 15 minutes.
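
To make the 15-minute window concrete, here is a small sketch that reads the rate limit information and works out how long to wait. It assumes the dictionary has the shape that ['resources']['search'] normally returns (a '/search/tweets' entry with limit, remaining and reset fields). The sample values and the helper name seconds_until_reset are made up for illustration.

```python
import time


def seconds_until_reset(search_limits, now=None):
    """Return how many seconds to wait before searching again (0 if requests remain)."""
    info = search_limits["/search/tweets"]
    if info["remaining"] > 0:
        return 0
    now = time.time() if now is None else now
    # 'reset' is a Unix timestamp marking the start of the next window.
    return max(0, int(info["reset"] - now))


# Made-up sample in the assumed shape of ['resources']['search'].
sample = {"/search/tweets": {"limit": 450, "remaining": 0, "reset": 1_000_900}}
print(seconds_until_reset(sample, now=1_000_000))  # 900
```

With the real client, the argument would be twitter.get_application_rate_limit_status()['resources']['search'].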

In this example I wanted to retrieve the top tweets about blockchain, since it's trending these days and had also been a topic of discussion between me and a friend for weeks.

Along with the tweet text, I also retrieve the favorite count, the retweet count, and the user location to get more insight.

# Retrieve the latest tweets about blockchain
search = twitter.search(q='blockchain', count=100)  # the API returns at most 100 tweets per request
tweets = search['statuses']

# Uncomment to print each tweet with its counts
# for tweet in tweets:
#     print(tweet['id_str'], '\n', tweet['text'], tweet['favorite_count'], tweet['retweet_count'], '\n\n\n')

# Pull the fields of interest into parallel lists
ids = [tweet['id_str'] for tweet in tweets]
texts = [tweet['text'] for tweet in tweets]
times = [tweet['retweet_count'] for tweet in tweets]
favtimes = [tweet['favorite_count'] for tweet in tweets]
follower_count = [tweet['user']['followers_count'] for tweet in tweets]
location = [tweet['user']['location'] for tweet in tweets]
lang = [tweet['lang'] for tweet in tweets]
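
A single search call stops at about 100 tweets. One way to get more is to page backwards with the search endpoint's max_id parameter. The sketch below is only an illustration of that idea: collect_tweets and fake_search are hypothetical helpers of mine, and fake_search stands in for the real API so the example runs without credentials.

```python
def collect_tweets(search_fn, query, pages=3, per_page=100):
    """Page backwards through search results using max_id.

    search_fn is any callable that accepts q, count and max_id keyword
    arguments and returns a dict with a 'statuses' list.
    """
    collected = []
    max_id = None
    for _ in range(pages):
        kwargs = {"q": query, "count": per_page}
        if max_id is not None:
            kwargs["max_id"] = max_id
        statuses = search_fn(**kwargs)["statuses"]
        if not statuses:
            break
        collected.extend(statuses)
        # Next page: only tweets strictly older than the oldest one seen so far.
        max_id = min(int(t["id_str"]) for t in statuses) - 1
    return collected


def fake_search(q, count, max_id=None):
    """Stand-in for the real API: pretends tweets with ids 1..250 exist, newest first."""
    top = 250 if max_id is None else min(max_id, 250)
    ids = range(top, max(top - count, 0), -1)
    return {"statuses": [{"id_str": str(i), "text": f"{q} #{i}"} for i in ids]}


tweets_demo = collect_tweets(fake_search, "blockchain", pages=3)
print(len(tweets_demo))  # 250
```

With the real client this would be called as collect_tweets(twitter.search, 'blockchain'), keeping the rate limit in mind.
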
# Correlation between retweet and favorite counts, with plots
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter notebook magic: show the plots inline
%matplotlib inline

pl = pd.DataFrame({
    'id': ids,
    'text': texts,
    'retweet_count': times,
    'fav_count': favtimes,
    'follower_count': follower_count,
    'location': location,
    'lang': lang,
})
pl.head(100)
pd.set_option('display.float_format', lambda x: '%.2f' % x)
pl[['retweet_count', 'fav_count']].describe().T
pl['retweet_count'].corr(pl['fav_count'])
pl.plot(kind='scatter', x='fav_count', y='retweet_count')
sns.regplot(x='fav_count', y='retweet_count', data=pl)
pl['retweet_count'].plot(kind='line')
Preview of the outputs. Left: relation of retweet and favorite counts. Right: tweets retrieved about blockchain (top 100, real time).
How the blockchain topic is trending right now on Twitter (with respect to retweets)
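
For anyone wondering what the corr call above is measuring: by default pandas computes the Pearson correlation coefficient. Here is a plain-Python sketch of that formula. The numbers are made up for illustration and are not from my tweet data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up counts where retweets are exactly 2 * favorites + 1.
favs = [0, 1, 2, 3, 4]
retweets = [1, 3, 5, 7, 9]
print(pearson(favs, retweets))  # 1.0
```

A value near 1 means tweets with more favorites also tend to have more retweets, a value near 0 means no linear relation, and a value near -1 means the opposite trend.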

Now that I have successfully retrieved around 100 tweets about blockchain and stored them in lists, I perform text mining on the tweet texts to search for the word ‘bank’. For this I had to import the Python library ‘re’.

import re

def word_in_text(word, text):
    """Return True if word occurs anywhere in text, ignoring case."""
    word = word.lower()
    text = text.lower()
    match = re.search(word, text)
    if match:
        return True
    return False

pl['bc_bank'] = pl['text'].apply(lambda tweet: word_in_text('bank', tweet))
print(pl['bc_bank'].sum())  # number of tweets that mention 'bank'
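
One thing to keep in mind is that word_in_text does a substring search, so 'bank' also matches words like 'unbanked' or 'bankruptcy'. Here is a sketch of a stricter variant that matches only the whole word, with an optional plural 's'. The function names and sample texts are my own, for illustration only.

```python
import re


def word_in_text_loose(word, text):
    """Substring match, ignoring case (same idea as word_in_text)."""
    return re.search(re.escape(word), text, flags=re.IGNORECASE) is not None


def word_in_text_strict(word, text):
    """Whole-word match with an optional trailing 's', ignoring case."""
    pattern = r"\b" + re.escape(word) + r"s?\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up sample texts.
samples = [
    "Banks are testing blockchain",
    "Blockchain for the unbanked",
    "Bank of England report",
    "Bankruptcy news",
]
for text in samples:
    print(text, word_in_text_loose("bank", text), word_in_text_strict("bank", text))
```

Which variant is better depends on the goal: the loose one catches 'banking' too, while the strict one avoids unrelated words.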

Now, after filtering the tweets that involve both blockchain and banks, I further mine the tweet text to extract the link from it.

import re

def extract_link(text):
    """Return the first URL found in text, or an empty string."""
    regex = r'https?://[^\s<>"]+|www\.[^\s<>"]+'
    match = re.search(regex, text)
    if match:
        return match.group()
    return ''

pl['link'] = pl['text'].apply(extract_link)
pl = pl[pl['bc_bank']]  # keep only the tweets that mention 'bank'
pl['link']
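
The regex above works on the tweet text, where links show up as t.co short URLs. The tweet objects returned by the search also carry an entities field with the URLs already parsed, including the expanded target. Here is a minimal sketch of reading it, assuming the usual tweet shape; the sample tweet and the helper name expanded_links are made up for illustration.

```python
def expanded_links(tweet):
    """Return the expanded URLs listed in a tweet's entities, if any."""
    urls = tweet.get("entities", {}).get("urls", [])
    # Fall back to the short URL when no expanded form is present.
    return [u.get("expanded_url") or u["url"] for u in urls]


# Made-up tweet in the assumed shape of a search result.
sample_tweet = {
    "text": "Banks and blockchain https://t.co/abc123",
    "entities": {
        "urls": [
            {
                "url": "https://t.co/abc123",
                "expanded_url": "https://example.com/blockchain-banks",
            }
        ]
    },
}
print(expanded_links(sample_tweet))  # ['https://example.com/blockchain-banks']
```

This avoids regex edge cases, such as trailing punctuation getting caught in the link.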

The top links I found trending on Twitter about blockchain and banks are: https://t.co/KFFVpaZbRz, https://t.co/fkSPQU61kj, https://t.co/4nTgh9eHhA, https://t.co/1RBIDFgSjc.
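
To rank links by how often they appear rather than reading them off the column, something like the following sketch could be used. It is only an illustration: top_links is a hypothetical helper and the links are placeholders, not the ones from my results.

```python
from collections import Counter


def top_links(links, n=2):
    """Return the n most frequent non-empty links with their counts."""
    counts = Counter(link for link in links if link)
    return counts.most_common(n)


# Placeholder links; empty strings stand for tweets without a link.
demo_links = [
    "https://t.co/a", "", "https://t.co/b", "https://t.co/a",
    "https://t.co/c", "https://t.co/b", "https://t.co/a",
]
print(top_links(demo_links))  # [('https://t.co/a', 3), ('https://t.co/b', 2)]
```

With the DataFrame from above, the input would be pl['link'].tolist().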

This was just an example of how to use the Twitter streaming API with Python. Further analysis can be done on the tweets; for instance, I am now trying sentiment analysis on them.