Data Mining Twitter Hashtag #CWC15


Harshit Rastogi
Apr 7, 2015

The 2015 Cricket World Cup came to an end on 29th March, with co-host Australia clinching its fifth World Cup title and reasserting its dominance in the world of cricket.

During the 40 days of the World Cup, I decided to collect tweets tagged with #cwc15 and perform some data mining on them. (#cwc15 was the most frequently used tag for tweeting about the event.)

How did I collect the data?

I used the Twitter APIs and wrote a simple Node.js process that read tweets tagged #cwc15 from Twitter every few seconds and stored them in MongoDB. A nightly job then read the stored data and generated all-time and daily stats.

This way, I built a daily analytics page that displayed tweet analytics for the day’s game. (Here is a glimpse of the analytics derived for the final and the semi-finals.)
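As a rough illustration (not the actual collector code), one polling cycle could be reduced to the step below: merge a batch of search results into the store, skip tweets that were already saved, and remember the newest id so the next poll can ask only for newer tweets. The function name `mergeBatch`, the `Map` standing in for a MongoDB collection, and the sample tweets are all assumptions made for this sketch; `id_str` is the string form of the tweet id in Twitter's REST responses.

```javascript
// Merge one batch of search results into the store, skipping tweets already saved.
// Assumption: each tweet carries its id as a decimal string in `id_str`.
function mergeBatch(store, sinceId, tweets) {
  let added = 0;
  let newest = sinceId;
  for (const tweet of tweets) {
    const id = BigInt(tweet.id_str); // tweet ids do not fit safely in a JS number
    if (!store.has(tweet.id_str)) {
      store.set(tweet.id_str, tweet);
      added += 1;
    }
    if (id > newest) newest = id;
  }
  return { added, sinceId: newest };
}

// Two overlapping batches (made-up tweets), as consecutive polls might return them.
const store = new Map(); // stand-in for a MongoDB collection keyed by tweet id
let state = mergeBatch(store, 0n, [
  { id_str: "581000000000000001", text: "What a catch! #cwc15" },
  { id_str: "581000000000000002", text: "Starc on fire #cwc15 #AUSvNZ" },
]);
state = mergeBatch(store, state.sinceId, [
  { id_str: "581000000000000002", text: "Starc on fire #cwc15 #AUSvNZ" },
  { id_str: "581000000000000003", text: "Champions! #cwc15" },
]);
console.log(store.size, state.added, state.sinceId.toString());
```

In a real process this step would sit inside a timer that calls the search API every few seconds, with a unique index on the tweet id playing the role of the `store.has` check.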

NZ vs Aus, final stats
Ind vs Aus, 2nd semi-final

Over the entire event I gathered 600k+ tweets containing 46,324 different hashtags.

What insights did I gain from these 600k tweets?

First, let's look at the region-wise distribution of the tweets. It was truly a global event, with tweets coming in from every continent; Asia was the most active continent, while Europe led in the number of countries tweeting during the tournament.

Around 1,861 cities across 109 countries actively tweeted during the 2015 World Cup.

Most active cities tweeting during the event

Most commonly used hashtags

46,324 unique hashtags were used

Most common #tags list

Some of the longest hashtags
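For readers who want to reproduce this kind of hashtag tally, here is a minimal sketch under a few assumptions: tags are pulled out of the tweet text with a regular expression and compared case-insensitively, and the helper names (`countHashtags`, `topHashtags`, `longestHashtags`) and sample tweets are made up for illustration. Twitter's API responses also list hashtags under `entities.hashtags`, which could be used instead of the regex.

```javascript
// Count hashtags case-insensitively across a list of tweet texts.
function countHashtags(texts) {
  const counts = new Map();
  for (const text of texts) {
    // \p{L} and \p{N} match letters and digits in any script; "_" is allowed too.
    const tags = text.match(/#[\p{L}\p{N}_]+/gu) || [];
    for (const tag of tags) {
      const key = tag.toLowerCase();
      counts.set(key, (counts.get(key) || 0) + 1);
    }
  }
  return counts;
}

// The n most frequent hashtags as [tag, count] pairs, most frequent first.
function topHashtags(counts, n) {
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

// The n longest distinct hashtags, longest first.
function longestHashtags(counts, n) {
  return [...counts.keys()].sort((a, b) => b.length - a.length).slice(0, n);
}

// Made-up sample tweets.
const sampleTexts = [
  "Gayle goes big! #CWC15 #WIvZIM",
  "What a knock #cwc15",
  "#cwc15 #IndvsAus here we go",
];
const tagCounts = countHashtags(sampleTexts);
console.log(tagCounts.size);                // number of unique hashtags
console.log(topHashtags(tagCounts, 1));     // most common hashtag with its count
console.log(longestHashtags(tagCounts, 1)); // longest hashtag
```

The size of the map is the unique-hashtag count, so the same structure yields both the "most common" and "longest" lists.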


Most Mentioned Countries

India leads the standings as the most mentioned country, followed by Australia, South Africa and New Zealand.

Most Tweeted Players

The tournament was dominated primarily by batsmen, and each time a batsman played exceptionally well, Twitter feeds exploded with praise.

MS Dhoni, the Indian captain, dominated Twitter with almost 15k mentions, followed by AB de Villiers and Rohit Sharma.

Chris Gayle Scoring 200+

De Villiers' Exceptional Performance at the Tournament

Martin Guptill’s Innings Against West Indies

Frequency of Tweeting

During the final rounds (the quarter-finals, the semi-finals and the final), cricket fans sent out around 8,000–10,000 tweets/hr using #cwc15. During India vs Australia, however, the rate spiked to 12,000 tweets/hr. It's interesting to observe that the tweet rate peaked at the end of each game. (That is when fans poured out their emotions for both the winning and the losing team.)

Tweet frequency, final vs 2nd semi-final (times displayed are in GMT)
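A tweets-per-hour series like the one charted above can be derived by bucketing each tweet's creation time into its GMT hour. The sketch below is one way to do that, not the code behind the chart: the helper names `tweetsPerHour` and `peakHour` and the sample timestamps are invented, and it assumes `created_at` holds an ISO 8601 string (timestamps in another format would need converting first).

```javascript
// Bucket tweets by the UTC (GMT) hour in which they were created.
// Assumption: `created_at` is an ISO 8601 date string.
function tweetsPerHour(tweets) {
  const buckets = new Map();
  for (const tweet of tweets) {
    const created = new Date(tweet.created_at);
    // Truncate to the hour, e.g. "2015-03-26T11" covers 11:00-11:59 GMT.
    const hour = created.toISOString().slice(0, 13);
    buckets.set(hour, (buckets.get(hour) || 0) + 1);
  }
  return buckets;
}

// Return the busiest hour and its tweet count, or null if there are no tweets.
function peakHour(buckets) {
  let best = null;
  for (const [hour, count] of buckets) {
    if (best === null || count > best.count) best = { hour, count };
  }
  return best;
}

// Made-up timestamps for illustration.
const sampleTweets = [
  { created_at: "2015-03-26T10:58:00Z" },
  { created_at: "2015-03-26T11:05:00Z" },
  { created_at: "2015-03-26T11:40:00Z" },
  { created_at: "2015-03-26T12:01:00Z" },
];
const hourly = tweetsPerHour(sampleTweets);
const peak = peakHour(hourly);
console.log(peak.hour, peak.count);
```

Keeping the bucket key in UTC matches the GMT times on the chart and avoids any dependence on the server's local time zone.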

Thanks for reading. If you liked it, please share it!

