This is a data collection and visualization project.
This project started with a simple idea: storing Twitter's trending topics so they can be accessed later. The problem is that Twitter shows only the current trending topics, with volume information such as 11.7K tweets in the last hour; trending topics from previous dates cannot be seen or accessed. That is why I started this project.
This project has stored Twitter's trending topics since July 2013, specifically the topics and hashtags that appear in two regions: Turkey and World Wide. Every 10 minutes, the trending topics are fetched via the Twitter API and stored in the database. The project's website visualizes the trending topics by how long they appeared on Twitter's trending list.
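Because topics are fetched every 10 minutes, the time a topic spent on the list can be estimated from the number of snapshots it appears in. The following minimal sketch assumes a hypothetical in-memory `Snapshot` record (UTC fetch time, region name, topic names). It is not the project's actual datastore schema, and the topic names are made up.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

FETCH_INTERVAL = timedelta(minutes=10)  # matches the 10-minute fetch cycle


@dataclass(frozen=True)
class Snapshot:
    """One fetch of one region's trending list (hypothetical schema)."""
    fetched_at: datetime  # UTC time of the fetch
    region: str           # e.g. "Turkey" or "World Wide"
    topics: tuple[str, ...]


def time_on_list(snapshots, region):
    """Estimate how long each topic stayed on a region's trending list.

    Each snapshot a topic appears in counts as one fetch interval.
    """
    counts = Counter()
    for snap in snapshots:
        if snap.region == region:
            counts.update(set(snap.topics))  # count a topic once per snapshot
    return {topic: n * FETCH_INTERVAL for topic, n in counts.items()}


# Usage with hypothetical data: three consecutive World Wide fetches.
t0 = datetime(2013, 7, 1, 12, 0, tzinfo=timezone.utc)
snaps = [
    Snapshot(t0, "World Wide", ("#topicA", "#topicB")),
    Snapshot(t0 + FETCH_INTERVAL, "World Wide", ("#topicA",)),
    Snapshot(t0 + 2 * FETCH_INTERVAL, "World Wide", ("#topicA", "#topicC")),
]
print(time_on_list(snaps, "World Wide")["#topicA"])  # 0:30:00
```

Counting snapshots rather than subtracting first and last timestamps keeps the estimate honest when a topic drops off the list and later returns.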
What were the trending topics one year ago at 12:00 PM? How long do people keep talking about a topic? What is the average, minimum, and maximum lifetime of a trending topic? What is the relationship between a topic's volume and its duration? How do these vary by region? These are some of the questions that motivated me to build this project. The project has also drawn the attention of researchers around the world, who have requested the data it collects. The data has already been used in one research project, and others are in progress.
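One way to approach the lifetime questions is to treat a topic's lifetime as a run of consecutive snapshots in which it appears. This sketch assumes the snapshots of one region are a time-sorted list of `(timestamp, topics)` pairs, and it also assumes that any snapshot missing the topic ends a lifetime. A real analysis might tolerate short gaps. The function names and data are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

FETCH_INTERVAL = timedelta(minutes=10)


def lifetimes(snapshots):
    """Return {topic: [duration, ...]}, one entry per uninterrupted run.

    `snapshots` is a time-sorted list of (timestamp, topics) for one region.
    A run ends when the topic is missing from the next snapshot.
    """
    runs = {}
    active = {}  # topic -> consecutive snapshots seen so far
    for _, topics in snapshots:
        current = set(topics)
        for topic in list(active):  # copy: we pop while iterating
            if topic not in current:
                runs.setdefault(topic, []).append(active.pop(topic) * FETCH_INTERVAL)
        for topic in current:
            active[topic] = active.get(topic, 0) + 1
    for topic, n in active.items():  # close runs still open at the end
        runs.setdefault(topic, []).append(n * FETCH_INTERVAL)
    return runs


def lifetime_stats(snapshots):
    """(min, average, max) lifetime over all runs, or None if there is no data."""
    durations = [d for ds in lifetimes(snapshots).values() for d in ds]
    if not durations:
        return None
    avg = timedelta(seconds=mean(d.total_seconds() for d in durations))
    return min(durations), avg, max(durations)


# Usage with hypothetical data: "#b" trends, disappears, then returns.
t0 = datetime(2013, 7, 1, 12, 0, tzinfo=timezone.utc)
snaps = [
    (t0, {"#a", "#b"}),
    (t0 + FETCH_INTERVAL, {"#a"}),
    (t0 + 2 * FETCH_INTERVAL, {"#b"}),
    (t0 + 3 * FETCH_INTERVAL, {"#b"}),
]
print(*lifetime_stats(snaps))  # 0:10:00 0:16:40 0:20:00
```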
What is a trending topic?
Twitter’s algorithm “identifies topics that are popular now, rather than topics that have been popular for a while or on a daily basis, to help you discover the hottest emerging topics of discussion on Twitter that matter most to” users.
Note that the Twitter app shows you topics curated specifically for you based on “who you follow and your location”. However, trending topics are also generated per region, and this project uses these region-based trending topics.
Twitter has a powerful API that supports many third-party applications. It gives developers and researchers a great opportunity to build their own tools for getting insights from the data generated on the platform.
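Region-based trends are requested per location. This sketch assumes the Twitter API v1.1 `trends/place` endpoint, which identifies a region by its Yahoo WOEID (1 for worldwide; 23424969 is assumed here for Turkey). It also assumes that the endpoint returns a one-element JSON list whose object holds a `trends` array. Authentication is omitted, and the sample response is a trimmed, hypothetical example rather than real data.

```python
import json
from urllib.parse import urlencode

# WOEIDs for the two regions this project tracks (assumed values).
WOEIDS = {"World Wide": 1, "Turkey": 23424969}

# Twitter API v1.1 trends endpoint; real requests must be authenticated (not shown).
TRENDS_PLACE_URL = "https://api.twitter.com/1.1/trends/place.json"


def trends_url(region):
    """Build the request URL for one region's trending topics."""
    return f"{TRENDS_PLACE_URL}?{urlencode({'id': WOEIDS[region]})}"


def topic_names(response_body):
    """Extract topic names from a trends/place JSON response body.

    The response is a one-element list whose object holds a "trends" list.
    """
    payload = json.loads(response_body)
    return [trend["name"] for trend in payload[0]["trends"]]


print(trends_url("Turkey"))  # https://api.twitter.com/1.1/trends/place.json?id=23424969

# Hypothetical, trimmed response body for illustration.
sample = '[{"as_of": "2013-07-01T12:00:00Z", "trends": [{"name": "#topicA"}, {"name": "topic B"}]}]'
print(topic_names(sample))  # ['#topicA', 'topic B']
```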
- The project runs on Google App Engine and is written in Python and D3.js.
- Data is fetched every 10 minutes.
- Data is stored at minute resolution, not by day or month. In other words, you can get the trending topics for a specific time such as June 8, 12:00.
- It stores timestamps, so the data is independent of time zone.
- It caches the last 24 hours of data for fast access; fetching older historical data takes longer.
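The timestamp and caching points can be sketched as follows. This sketch assumes that fetch times are stored as UTC Unix timestamps aligned to the 10-minute cycle. A local query such as June 8, 12:00 is converted to an absolute instant and snapped to the fetch slot at or before it. A slot counts as cached if it falls within the last 24 hours. The function names and the fixed UTC+3 offset in the usage lines are hypothetical.

```python
from datetime import datetime, timedelta, timezone

SLOT_SECONDS = 10 * 60             # one fetch every 10 minutes
CACHE_WINDOW = timedelta(hours=24)  # recent data served from the cache


def slot_for(moment):
    """Map an aware datetime (any time zone) to the UTC timestamp of its fetch slot."""
    ts = int(moment.timestamp())      # seconds since the epoch, zone-independent
    return ts - ts % SLOT_SECONDS     # round down to the 10-minute slot


def is_cached(slot_ts, now):
    """True if the slot lies inside the 24-hour cache window ending at `now`."""
    return now - datetime.fromtimestamp(slot_ts, tz=timezone.utc) <= CACHE_WINDOW


# Usage: the same instant expressed in two time zones maps to the same slot.
utc_noon = datetime(2014, 6, 8, 12, 0, tzinfo=timezone.utc)
local = datetime(2014, 6, 8, 15, 0, tzinfo=timezone(timedelta(hours=3)))  # fixed offset for illustration
print(slot_for(local) == slot_for(utc_noon))  # True
print(is_cached(slot_for(utc_noon), utc_noon + timedelta(hours=25)))  # False
```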
Although collecting the trending topics of every region would be trivial, it is avoided because of two problems:
- The Twitter API has a request rate limit, so it does not allow too many requests within a given time window. Fetching every region's trending topics would exceed the rate limit.
- It would increase the datastore write costs, and therefore the bills, on Google App Engine. This project has no income (I have just started accepting donations via the website to keep it running), so keeping the bills low is preferable.
Because of these two limitations, only Turkey and World Wide were selected. Turkey was chosen because it is my home country, and World Wide in order to offer visitors the trending topics shared across regions.
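The trade-off can be made concrete with some arithmetic. The following sketch counts API requests per rate-limit window and datastore writes per day for a given number of regions. The 15-minute window, the limit of 15 requests, the 10 topics per fetch, the "one entity per topic" write model, and the 50-region scenario are all placeholder assumptions. They are not Twitter's or App Engine's actual figures.

```python
import math
from datetime import timedelta

FETCH_INTERVAL = timedelta(minutes=10)


def requests_per_window(regions, window, interval=FETCH_INTERVAL):
    """Worst-case trend requests inside one rate-limit window.

    One request per region per fetch; at most ceil(window / interval)
    fetch cycles start inside a window.
    """
    return regions * math.ceil(window / interval)


def writes_per_day(regions, topics_per_fetch, interval=FETCH_INTERVAL):
    """Datastore writes per day if every fetched topic is stored as one entity."""
    fetches_per_day = timedelta(days=1) // interval  # 144 at a 10-minute interval
    return regions * fetches_per_day * topics_per_fetch


WINDOW = timedelta(minutes=15)  # assumed rate-limit window (placeholder)
LIMIT = 15                      # assumed requests allowed per window (placeholder)

for regions in (2, 50):  # this project's two regions vs. a hypothetical 50
    n = requests_per_window(regions, WINDOW)
    print(regions, n, n <= LIMIT, writes_per_day(regions, topics_per_fetch=10))
# 2 4 True 2880
# 50 100 False 72000
```

Both costs grow linearly with the number of regions, which is why limiting collection to two regions keeps the project within both budgets.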
I am not actively working on this project at the moment; I only make small maintenance updates. However, the project is open to contributions, both code and feature requests. Here are some of the features I have in mind:
- Searching for a keyword among the trending topics and, if it exists, returning its occurrences throughout the history.
- Comparing topics.
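Both proposed features could be built on the stored snapshots. In this hypothetical sketch (these functions do not exist in the project), a keyword search scans each snapshot's topics case-insensitively and returns the fetch times at which a match appeared. A simple comparison reports how long two topics stayed on the list. Snapshots are assumed to be `(timestamp, topics)` pairs, and the sample data is made up.

```python
from datetime import datetime, timedelta, timezone

FETCH_INTERVAL = timedelta(minutes=10)


def search(snapshots, keyword):
    """Return (timestamp, topic) pairs for every trending topic containing keyword."""
    needle = keyword.casefold()  # case-insensitive match
    return [
        (ts, topic)
        for ts, topics in snapshots
        for topic in topics
        if needle in topic.casefold()
    ]


def compare(snapshots, topic_a, topic_b):
    """Time each topic spent on the list, counting one interval per snapshot."""
    def total(topic):
        return sum(topic in topics for _, topics in snapshots) * FETCH_INTERVAL
    return {topic_a: total(topic_a), topic_b: total(topic_b)}


# Usage with hypothetical data.
t0 = datetime(2013, 7, 1, 12, 0, tzinfo=timezone.utc)
snaps = [
    (t0, ["#WorldCup", "topic B"]),
    (t0 + FETCH_INTERVAL, ["#worldcup2014"]),
    (t0 + 2 * FETCH_INTERVAL, ["topic B"]),
]
for ts, topic in search(snaps, "worldcup"):
    print(ts.isoformat(), topic)
print(compare(snaps, "#WorldCup", "topic B"))
```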
So far, the collected data has been used in two academic works. I can share the collected data for your research; please contact me via Twitter to discuss the details.