Collecting tweets using Python

Satish Chandra
Skills For DataScience and AI
Aug 24, 2014

In this post, I'd like to show how we can collect tweets of interest using the Twitter developer API. The Twitter platform offers access to its corpus via REST APIs, and there are three main types of API:

  1. Streaming: API for fetching real-time tweets that match specific interests. If you are building a data mining product or want to run analytics on live tweets, this is the API to use.
  2. Search: API for fetching past tweets. You can collect tweets that contain a specific keyword or hashtag, or tweets posted by a specific user.
  3. REST: API for core Twitter elements such as status updates, user profiles, profile avatars and a user's connections (followers). Through this API a user can also post replies, retweet and favorite tweets.
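As a rough illustration of the Search API, here is a minimal sketch that only builds a query URL for the v1.1 search/tweets.json endpoint using the standard library. It assumes the q, lang and count parameters are enough for a basic query. It does not send the request, which Twitter required to be OAuth-signed. build_search_url is a hypothetical helper, not part of tweepy or any other library.

```python
from urllib.parse import urlencode

# v1.1 standard search endpoint, as documented at the time of this post.
SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"


def build_search_url(query, lang="en", count=100):
    """Hypothetical helper: return a search URL for `query`.

    urlencode percent-encodes '#' and turns spaces into '+',
    so hashtags and multi-word queries are safe to pass in.
    """
    params = {"q": query, "lang": lang, "count": count}
    return SEARCH_URL + "?" + urlencode(params)


print(build_search_url("#python OR tweepy"))
```

Building the URL separately makes it easy to check exactly which query is being sent before an OAuth library signs and sends it.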

Twitter maintains clear documentation of these APIs here:

https://dev.twitter.com/docs/api/1.1

Although Twitter provides the APIs above, it limits how many requests an application or user can make within a time window. It is therefore better to choose keywords, hashtags and users specific to your area, product or topic of interest than to query for general terms. Twitter's rate-limit documentation describes these limits in more detail.
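When a client hits these limits on the streaming API, it should wait before reconnecting rather than retrying immediately. Twitter's streaming guidelines of that era suggested exponential backoff after a rate-limit (HTTP 420) response. The sketch below assumes a policy of starting at 60 seconds and doubling each attempt. The 960-second cap and the backoff_delays name are hypothetical choices for illustration.

```python
def backoff_delays(initial, factor, cap, attempts):
    """Yield wait times in seconds for successive reconnect attempts.

    Each delay is the previous one multiplied by `factor`,
    but never more than `cap` (an assumed upper bound).
    """
    delay = initial
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


# Assumed policy for HTTP 420: start at 60 s, double each time, cap at 960 s.
print(list(backoff_delays(60, 2, 960, 6)))
```

In a reconnect loop, you would call time.sleep(delay) for each yielded value before trying to connect again, and stop as soon as a connection succeeds.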

Tweepy:

There are many Python modules, such as tweepy, python-twitter and tweetstream, that make it easy to establish a connection, listen to a stream and collect tweets. I haven't used most of these tools, but I have used tweepy. It suits my requirements well: collecting stream results related to an organization, following users, and searching for hashtags and keywords. See the tweepy documentation:

http://tweepy.readthedocs.org/en/v2.3.0/

In this article, I want to cover how to use tweepy and collect live stream results related to a keyword.
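Keyword collection on the streaming API is driven by a track list. Twitter documented the matching rule as follows: separate phrases are combined with OR, the space-separated terms inside a phrase are combined with AND, and matching is case-insensitive. The function below is a simplified, hypothetical approximation of that rule for experimenting offline. It splits tweet text on whitespace only, so punctuation stuck to a word (for example "Python!") will not match, unlike Twitter's real tokenizer.

```python
def matches_track(text, track):
    """Approximate the streaming `track` rule (simplified):
    OR across phrases, AND across the words within a phrase, case-insensitive.
    The tweet is tokenized by whitespace only."""
    words = set(text.lower().split())
    for phrase in track:
        terms = phrase.lower().split()
        if terms and all(term in words for term in terms):
            return True
    return False


print(matches_track("Learning Python with Tweepy", ["tweepy python"]))  # True
print(matches_track("Learning Python", ["tweepy python", "#dataviz"]))  # False
```

This makes it easier to see why a multi-word entry such as "tweepy python" is much narrower than two separate entries "tweepy" and "python".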

Before running your first Python program to collect tweets, complete these steps:

  • Sign up for a Twitter developer account here
  • Create a new application at the following location
  • To access the Twitter API, you need an API key, API secret, access token and access secret. You can find these keys in the API keys tab of your newly created application; click the “Generate access key” button to generate the access token and secret
  • Install the tweepy module: pip install tweepy. The code below uses the StreamListener class from tweepy 2.x/3.x, which was removed in tweepy 4.0, so install an older release (for example pip install "tweepy<4") to run it as written
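The program below keeps the four keys as placeholder strings in the source file. A common alternative is to read them from environment variables so they never end up in version control. Here is a minimal sketch. The variable names TWITTER_CONSUMER_KEY and so on, and the load_credentials helper, are hypothetical choices rather than anything Twitter or tweepy defines.

```python
import os

# Hypothetical environment-variable names; pick any naming scheme you like.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(env=os.environ):
    """Return the four OAuth values from `env`, failing loudly if any is missing or empty."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

With the variables exported in your shell, creds = load_credentials() gives a dict whose values can be passed to OAuthHandler(creds["consumer_key"], creds["consumer_secret"]) and auth.set_access_token(creds["access_token"], creds["access_secret"]).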

Python Code:

__author__ = 'divakarla'

import io
import json
import time

from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener


# Consumer key, consumer secret, access token, access secret.
consumer_key = "***************************"
consumer_secret = "***************************"
access_token = "***************************"
access_secret = "***************************"


# Note: these are invalid test user IDs; replace them with real numeric user IDs.
users = ["12332342344522", "1232132423422323", "21323213231234", "23232121323243", "2490921312323137"]
hash_tags = ["#testhashtag1", "#testhashtag2", "#testhashtag3"]
terms = ["keyword1", "keyword2", "keyword3", "keyword4"]


class Listener(StreamListener):
    def on_data(self, data):
        try:
            # Parse the JSON message received from the Twitter stream.
            jsonData = json.loads(data)

            # The stream also sends control messages (e.g. delete or limit
            # notices) that have no 'text' field; skip them.
            if 'text' not in jsonData:
                return True

            createdAt = jsonData['created_at']
            text = jsonData['text']
            print("Created at : ", createdAt, " text : ", text)

            # Append one line per tweet; newlines inside the tweet are flattened.
            saveThis = createdAt + " ---> " + text.replace("\n", " ")
            with io.open("results.csv", "a", encoding="utf-8") as saveFile:
                saveFile.write(saveThis + "\n")
            return True
        except Exception as e:
            print("Failed on data : ", str(e))
            time.sleep(5)
            return True

    def on_error(self, status):
        print("Error : ", status)
        # HTTP 420 means the app is being rate limited; returning False
        # disconnects the stream instead of reconnecting right away.
        if status == 420:
            return False


if __name__ == '__main__':
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)

    # Listener for Twitter streaming
    twitterStream = Stream(auth, Listener())

    # Twitter filter: tweets from the listed users OR containing any hashtag/term, in English.
    twitterStream.filter(follow=users, track=hash_tags + terms, languages=["en"])
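The listener above appends lines of the form "created_at ---> text" to a file named results.csv, which is not quite CSV: a tweet containing a comma would break the columns. If a proper CSV file is wanted, here is a minimal sketch using Python's csv module. It applies the same parsing rule as the listener (skip messages without a text field) and runs on two made-up, simplified stream messages instead of a live connection. tweet_to_row and write_rows are hypothetical helper names.

```python
import csv
import io
import json


def tweet_to_row(raw):
    """Return (created_at, text) for a tweet message, or None for other messages."""
    message = json.loads(raw)
    if "text" not in message:
        return None
    return message["created_at"], message["text"]


def write_rows(raw_messages, out):
    """Write one CSV row per tweet; csv.writer quotes fields containing commas or newlines."""
    writer = csv.writer(out)
    for raw in raw_messages:
        row = tweet_to_row(raw)
        if row is not None:
            writer.writerow(row)


# Made-up, simplified stream messages: one tweet and one delete notice.
sample = [
    '{"created_at": "Sun Aug 24 10:00:00 +0000 2014", "text": "hello, #testhashtag1"}',
    '{"delete": {"status": {"id": 1, "user_id": 2}}}',
]
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a StringIO buffer, open it with open("results.csv", "a", newline="", encoding="utf-8") so the csv module controls the line endings.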
