Twitter Mining

By Mr. I, published in kasta
Dec 10, 2017 · 3 min read

Makassar, Irsyad: A bad mood is setting in, but I should do something. As I said before, I have many things to write today, so I decided to write about Twitter mining.

This article is about data mining on Twitter. The topics covered in this series include the following:

  • Interacting with the Twitter API using Tweepy
  • Twitter data — the anatomy of a tweet
  • Tokenization and frequency analysis
  • Hashtags and user mentions in tweets
  • Time series analysis

Getting Started

Twitter is one of the most well-known online social networks and has enjoyed extreme popularity in recent years. The service it provides is referred to as microblogging, a variant of blogging where the pieces of content are extremely short. In the case of Twitter, each tweet is limited to 140 characters, much like an SMS. (Twitter raised this limit to 280 characters for most languages in November 2017.) Unlike other social media platforms, such as Facebook, the Twitter network is not bidirectional, meaning that connections don’t have to be mutual: you can follow users who don’t follow you back, and the other way round.
Traditional media outlets are adopting social media as a way to reach a wider audience, and most celebrities have a Twitter account to keep in touch with their fans. Users discuss events in real time as they happen, including celebrations, TV shows, sports events, political elections, and so on.
Twitter is also responsible for popularizing the term hashtag as a way to group conversations and allow users to follow a particular topic. A hashtag is a single keyword prefixed by a # symbol.
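To make the hashtag definition concrete, here is a minimal sketch that pulls hashtags out of a piece of text with a regular expression. The function name extract_hashtags and the sample text are made up for illustration, and the pattern is a simplification: Twitter's own rules for what counts as a hashtag are more detailed than "a # followed by word characters".

```python
import re

# Simplified pattern (an assumption, not Twitter's official rule):
# a '#' that does not follow a word character, then one or more word characters.
HASHTAG_PATTERN = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(text):
    """Return the hashtags found in text, without the leading '#'."""
    return HASHTAG_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "Loving #Python and #DataMining!"
    print(extract_hashtags(sample))
```

The lookbehind keeps strings such as "a#b" from being read as a hashtag, since the # there sits in the middle of a word.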
Given the variety of uses, Twitter is a potential gold mine for data miners, so let’s get started.

The Twitter API

Twitter offers a series of APIs to provide programmatic access to Twitter data, including reading tweets, accessing user profiles, and posting content on behalf of a user.
In order to set up our project to access Twitter data, there are two preliminary steps, as follows:

  • Registering our application
  • Choosing a Twitter API client

The registration step takes a few minutes. Assuming that we are already logged in to our Twitter account, all we need to do is point our browser to the Application Management page at http://apps.twitter.com and create a new app.
Once the app is registered, under the Keys and Access Tokens tab, we can find the information we need to authenticate our application. The Consumer Key and Consumer Secret are settings of the application, while the Access Token and Access Token Secret are settings for your user account. Your application can potentially ask for access to several users through their access tokens. The Access Level of these settings defines what the application can do while interacting with Twitter on behalf of a user: read-only is the most conservative option, as the application will not be allowed to publish anything or interact with other users via direct messaging.
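As a rough sketch of how these credentials could be used with Tweepy, the code below reads the consumer key and secret plus an access token and its secret from environment variables and passes them to Tweepy's OAuth handler. The environment variable names and the helper functions load_credentials and build_client are assumptions made for this example, not something Twitter or Tweepy prescribes. The Tweepy calls shown (OAuthHandler, set_access_token, API) are those of the Tweepy releases available when this article was written.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def build_client(credentials):
    """Create an authenticated Tweepy client (requires `pip install tweepy`)."""
    import tweepy  # imported lazily so load_credentials works without Tweepy

    auth = tweepy.OAuthHandler(
        credentials["TWITTER_CONSUMER_KEY"],
        credentials["TWITTER_CONSUMER_SECRET"],
    )
    auth.set_access_token(
        credentials["TWITTER_ACCESS_TOKEN"],
        credentials["TWITTER_ACCESS_SECRET"],
    )
    return tweepy.API(auth)
```

Keeping the keys in environment variables rather than in the source file is a design choice for this sketch: it makes it harder to publish the secrets by accident along with the code. A typical call would be client = build_client(load_credentials()).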

Rate Limits

The Twitter API limits how often applications can access it. These limits are set on a per-user basis or, to be more precise, on a per-access-token basis. This means that when an application uses application-only authentication, the rate limits apply globally to the entire application, while with per-user authentication the application can increase its overall number of requests to the API, because each user’s access token has its own quota. It’s important to familiarize yourself with the concept of rate limits, described in the official documentation (https://dev.twitter.com/rest/public/rate-limiting).
The implication of hitting the API limits is that Twitter will return an error message rather than the data we’re asking for. Moreover, if we keep sending requests to the API, the time required to regain regular access will increase, as Twitter could flag us as potential abusers.
When our application needs to make many API requests, we need a way to avoid this. In Python, the time module, part of the standard library, lets us suspend code execution for an arbitrary interval using the time.sleep() function.
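One way to apply time.sleep() here is to pause and retry whenever a request is rejected for exceeding the rate limit. The sketch below is only an illustration under stated assumptions: RateLimitError is a hypothetical stand-in for whatever exception your Twitter client raises in that situation, call_with_pause is a made-up helper name, and the default pause of 15 minutes is chosen to match the length of the rate-limit windows in Twitter's REST API.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error a Twitter client raises on a rate limit."""


def call_with_pause(request, max_attempts=3, pause_seconds=15 * 60, sleep=time.sleep):
    """Call request(); on RateLimitError, pause and retry.

    Gives up and re-raises the error after max_attempts failed attempts.
    The sleep argument exists so the pause can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts:
                raise
            # Suspend execution before trying again, as described above.
            sleep(pause_seconds)
```

In real use, request would be a small function wrapping a single API call, and the except clause would catch the client library's own rate-limit exception. Tweepy can also do the waiting itself if its API object is created with wait_on_rate_limit=True.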

