Visualizing Media responses to 2018 Kerala Floods using Twitter data.
Data Visualization Assignment — IDC School of Design, IIT Bombay
Kerala faced one of its worst disasters in July and August 2018. The rainfall was much higher than normal and was the highest in the last century, after the Deluge of 99. Technology played a critical role in the rescue and relief activities and in curbing the magnitude of the disaster. The whole nation came out in support of the small southern state, but the national media was criticised for not giving enough importance to Kerala flood-related news.
I came across videos and articles that discussed the role of tweets as social sensors and how the collected data can be used to analyse and predict human behaviours and patterns. Inspired by these, I focused my assignment on social media responses to the 2018 Kerala floods. I chose Twitter as my data source because it was easy to get bulk data and because tweets are more representative of focused responses to a topic.
Twitter lets developers use tweet data in their applications through API calls. You need to sign in to Twitter and apply for a developer account. Once the account is approved (it took 2 days for me), you create an App, which generates the consumer keys and access tokens that have to be used in the scripts. Two data scraping methods using the Twitter API were of interest to me:
- Scraping all the tweets with a particular hashtag
- Scraping all data from a user timeline
My initial idea was to track hashtags related to the Kerala flood and geo-visualize them to understand how the country and the world responded to the situation. I had to scrap the idea for two main reasons: the percentage of geo-tagged tweets was very low, and the data was much larger than what the Twitter API lets users access for free. I tried to work around this with other scripts, third-party plugins and some jugaad, but after reading this report I understood that the data could not be collected.
After the initial setback, I decided to scrape data from the timelines of media channels in India, both national and regional, to understand news coverage trends during the Kerala flood. For this assignment, I have visualised the tweets of 5 national news channels and 5 local news channels from August 1st to September 13.
Data Collection
Scraping Data: I scraped Twitter timeline data using the Twitter API in R. Twitter caps the number of recent tweets that can be scraped from a user timeline at n=3200. Because of this, I could not get useful data from some of the major Twitter news handles. I also had to use a third-party service, as my R script did not return data for some handles; I believe this is because of the restrictions on the number of API calls that can be made in an hour. I also looked at other sources such as YouTube channels and websites, but archived data was not available.
After scraping, I had 30000+ tweets from 30+ Indian news channels on Twitter. I had to let some of the channels go because their latest 3200 tweets did not reach back far enough to cover the period I was focusing on (Aug 1st to Sep 13). Some channels with too little data were also removed.
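The check for whether a capped timeline still covers the study period can be sketched as follows. This is a minimal Python illustration, not the R script used for the assignment; the function name `covers_window` and its exact rule are assumptions made for the example.

```python
from datetime import date

# Study window and per-timeline cap as stated in the text.
WINDOW_START = date(2018, 8, 1)
WINDOW_END = date(2018, 9, 13)
TIMELINE_CAP = 3200


def covers_window(tweet_dates, start=WINDOW_START, end=WINDOW_END, cap=TIMELINE_CAP):
    """Return True if a scraped timeline is usable for the whole window.

    tweet_dates: dates of the tweets returned for one handle, in any order.
    Hypothetical rule: a handle is dropped when it has no tweets inside the
    window, or when the scrape hit the cap and its oldest tweet is still
    later than the window start (so the early days are missing).
    """
    if not tweet_dates:
        return False
    if not any(start <= d <= end for d in tweet_dates):
        return False
    truncated = len(tweet_dates) >= cap and min(tweet_dates) > start
    return not truncated
```

A very active handle can exhaust its 3200 tweets within a few days, which is why the oldest returned tweet, rather than the tweet count, decides whether the handle is kept.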
Translating Data: I had to translate the tweets collected in Malayalam and Hindi into English so that they could be filtered by keyword. I used the Google Sheets =GOOGLETRANSLATE() function for this. The function translates most of the important words, though it struggles with complex ones.
Once the data was translated, I had 13000+ tweets from 10 channels (5 national and 5 regional) over 44 days. I then identified keywords associated with the Kerala floods. The keywords used for filtering were flood, rain, Kerala, dam, Idukki, rescue, disaster, donation and Periyar. I stopped adding keywords once further related keywords returned no additional results. After filtering, I had 2079 tweets related to the Kerala floods from 10 different news channels over 44 days.
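As an illustration of the keyword filter, here is a small Python sketch. The keyword list comes from the text above, but the matching rule (case-insensitive, anchored at the start of a word) is an assumption; the write-up does not say how matches were decided, and the function names are hypothetical.

```python
import re

# Keywords listed in the text.
KEYWORDS = ["flood", "rain", "Kerala", "dam", "Idukki",
            "rescue", "disaster", "donation", "Periyar"]


def build_pattern(keywords):
    # \b anchors each keyword at the start of a word, so "rain" matches
    # "rains" and "rainfall" but not "train". Assumed rule, not the author's.
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternation + r")", re.IGNORECASE)


def filter_tweets(tweets, keywords=KEYWORDS):
    """Return the tweets whose text mentions at least one keyword."""
    pattern = build_pattern(keywords)
    return [t for t in tweets if pattern.search(t)]


sample = [
    "Heavy rains lash Idukki",
    "Train services delayed in Mumbai",
    "KERALA needs donations",
]
related = filter_tweets(sample)  # keeps the first and third tweets
```

Short keywords remain a source of false positives under this rule: "dam" still matches "damage", so a manual spot check of the filtered tweets is worthwhile.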
Visualizing Data
I made a preliminary visualization using dygraphs to visualise the percentage of Kerala flood related tweets along the time axis.
As you can see in the graph, N02, a national media channel, had very few tweets in the first week of August.
One observation at this stage was that, on average, national media post more tweets per day than regional media. Therefore, the number of tweets alone may not be an effective way to compare the responses of national and regional news channels.
I decided to compare the percentage of Kerala flood-related tweets over total tweets published by each Twitter channel. This gives a better representation of the significance given by each channel to the national disaster.
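The normalisation can be sketched in Python as below. This is an illustrative assumption about the calculation, not the spreadsheet or Tableau steps actually used; the function name and the channel label "R01" are hypothetical ("N02" is the label used in the graph).

```python
from collections import Counter


def daily_flood_share(rows):
    """Percentage of flood-related tweets per channel per day.

    rows: iterable of (channel, day, is_flood_related) tuples, one per tweet.
    Returns {(channel, day): percentage}, computed only for days on which
    the channel tweeted at least once.
    """
    totals = Counter()
    related = Counter()
    for channel, day, is_related in rows:
        totals[(channel, day)] += 1
        if is_related:
            related[(channel, day)] += 1
    return {key: 100.0 * related[key] / totals[key] for key in totals}


rows = [
    ("N02", "2018-08-18", True),
    ("N02", "2018-08-18", False),
    ("N02", "2018-08-18", False),
    ("N02", "2018-08-18", False),
    ("R01", "2018-08-18", True),
    ("R01", "2018-08-18", True),
]
shares = daily_flood_share(rows)
```

In this made-up sample N02 posts more tweets than R01 but devotes a smaller share of them to the flood, which is the distinction that raw counts hide.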
I used a stacked histogram to understand the overall tweet response to the disaster and how each channel contributed to it. The 5 regional (local) channels are represented by shades of blue and the 5 national channels by shades of orange. I used Tableau Public to make the visualization. The legend in the top right corner can be used to focus on one particular channel.
If we compare the Percentage of Tweets graph and the Tweet Counts graph, we can see that national media was inactive.
Observations that can be made from the visualization are:
- The overall distribution resembles the shape of a normal distribution curve, peaking on August 18, 2018.
- National media gave very little attention to the issue in the first 8 days of August, even though regional media was already covering it adequately. At the peak, both national and regional channels gave enough weight to flood-related news.
National media covers news from all over India, so the fraction of its tweets related to the Kerala flood is expected to be lower than that of regional media. It therefore cannot be concluded from this visualization that national media was not covering news related to the Kerala floods. A more effective comparison would identify the fraction of tweets related to other significant events of the same period, such as Modi's Independence Day speech or Atal Bihari Vajpayee's demise, and compare it with the fraction for the Kerala flood.
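The comparison suggested above was not carried out in this assignment, but it could be sketched roughly as follows in Python. The topic names and keyword lists are hypothetical placeholders, and simple lowercase substring matching is assumed.

```python
def topic_shares(tweets, topics):
    """Fraction of one channel's tweets that mention each topic.

    tweets: list of tweet texts from a single channel.
    topics: {topic name: list of lowercase keywords} (hypothetical lists).
    A tweet counts towards a topic if it contains any of its keywords.
    """
    shares = {}
    for name, keywords in topics.items():
        hits = sum(1 for t in tweets if any(k in t.lower() for k in keywords))
        shares[name] = hits / len(tweets) if tweets else 0.0
    return shares


# Made-up tweets and keyword lists, for illustration only.
tweets = [
    "Kerala flood rescue continues",
    "Vajpayee passes away",
    "Independence Day speech highlights",
    "Markets close higher",
]
topics = {
    "kerala_flood": ["kerala", "flood"],
    "vajpayee": ["vajpayee"],
    "independence_day": ["independence day"],
}
baseline = topic_shares(tweets, topics)
```

If a national channel gave the flood a share similar to other major events of the same weeks, a low absolute percentage would say little about neglect.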
Links: Data Collected from Twitter, dygraphs visualization, Interactive Data Visualization