100 Scripts in 30 Days challenge: Scripts 23, 24, 25: Parsing Tweets & Graph Analytics from a Pickle File
These are some of the final scripts from my exploration of the Twitter API using Python. Here the focus is more on data mining than on exploring Twitter itself. I have used pickle files for storing tweets, but you can use any database, and most of the programs below can be reused without much change.
One big challenge I ran into during the scripting was finding a good JavaScript library for graph analytics. Gephi and Pajek are good tools for analyzing graph networks, but I am still searching for a good graph analytics tool in HTML5. In the end, NetworkX turned out to be a very good tool for graph analytics on a small amount of data, though others can use Spark and other libraries for graph analytics at scale.
The latest NetworkX documentation can be found at the link below.
The code details are given below:
script_23_parse_tweet.py — Parses tweets from the pickle database and writes the output to a JSON file. It also uses the ttp library to extract entities such as URLs, hashtags and mentions to show relevant information.
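A minimal sketch of the idea, not the script itself. It assumes the pickle file holds a list of tweet dicts with "id" and "text" keys, and it uses simple regular expressions as a stand-in for the ttp parser, so it will miss edge cases that ttp handles. The function names and file paths are made up for illustration.

```python
import json
import pickle
import re

# Simplified patterns: a rough stand-in for the ttp parser, not equivalent to it.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def extract_entities(text):
    """Pull URLs, hashtags and mentions out of a tweet's text."""
    return {
        "urls": URL_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
    }


def parse_tweets(pickle_path, json_path):
    """Read tweets from a pickle file and write parsed records to a JSON file."""
    with open(pickle_path, "rb") as fh:
        tweets = pickle.load(fh)  # assumption: a list of tweet dicts

    parsed = []
    for tweet in tweets:
        text = tweet.get("text", "")
        record = {"id": tweet.get("id"), "text": text}
        record.update(extract_entities(text))
        parsed.append(record)

    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(parsed, fh, indent=2)
    return parsed
```

Calling parse_tweets("tweets.pkl", "tweets.json") would write one JSON record per tweet. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.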
script_25_check_retweets.py — Checks whether a tweet is a retweet and, if so, shows relevant information about who originally sent it.
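A rough sketch of such a check, assuming each tweet is stored as a dict in the Twitter REST API v1.1 JSON shape, where a retweet carries the original tweet under the "retweeted_status" key. The function name and the returned field names are my own; if your pickle holds library objects instead of dicts, you would need attribute access instead.

```python
def retweet_info(tweet):
    """Return details of the original tweet if `tweet` is a retweet, else None."""
    original = tweet.get("retweeted_status")
    if original is None:
        return None  # not a retweet
    return {
        "retweeted_by": tweet.get("user", {}).get("screen_name"),
        "original_author": original.get("user", {}).get("screen_name"),
        "original_id": original.get("id"),
        "original_text": original.get("text"),
    }
```

For an ordinary tweet this returns None, so a simple `if retweet_info(tweet):` is enough to filter retweets out of a list.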
script_24_graph_conn_retweets.py — Connects retweets by linking the source of each tweet to the users who retweeted it, building a graph. It also saves the data in a format that can be used by other graph analytics software such as Gephi.
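The retweet graph can be sketched with the standard library alone. This is an illustration under assumptions, not the script: tweets are taken to be dicts in the Twitter REST API v1.1 shape (retweets carry "retweeted_status"), edges run from the original author to the retweeter, and the output is a plain CSV edge list with Source, Target and Weight columns, which Gephi can import as an edge table. The function names are hypothetical. With NetworkX, the same edges could be added to an nx.DiGraph and saved with nx.write_gexf instead.

```python
import csv
from collections import Counter


def retweet_edges(tweets):
    """Count (original author -> retweeter) pairs across a list of tweet dicts."""
    edges = Counter()
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if not original:
            continue  # skip tweets that are not retweets
        source = original.get("user", {}).get("screen_name")
        target = tweet.get("user", {}).get("screen_name")
        if source and target:
            edges[(source, target)] += 1
    return edges


def write_edge_csv(edges, path):
    """Write the edges as a CSV edge list (Source, Target, Weight)."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Source", "Target", "Weight"])
        for (source, target), weight in sorted(edges.items()):
            writer.writerow([source, target, weight])
```

The weight is the number of times one user retweeted another, which is handy for sizing edges when laying out the graph.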