Extracting Non-Obvious Insight from Twitter Data: An Exploratory Data Analysis

Jasmi Kevadia
INST414: Data Science Techniques
3 min read · Mar 25, 2023

Introduction:

Social media is a powerful tool that has changed the way we communicate and share information. Twitter is a popular social media platform that has become an essential tool for individuals to share their ideas and communicate with others. In this post, I use the Twitter API to collect a set of tweets and answer the question, “What are the most popular topics of discussion on Twitter related to climate change, and how are people responding to them?” The answer could inform decisions related to marketing, social media strategy, and climate change communication.

The insights gained from this analysis could inform decisions related to marketing and social media strategy for organizations that focus on climate change communication, advocacy, and education. For example, organizations could use these insights to tailor their messaging and outreach efforts to better address the concerns and topics that are most important to their target audience. They could also use this information to identify influencers and key players in the climate change conversation on Twitter and engage with them to increase the reach and impact of their messaging.

Finding and collecting data:

To answer this question, I used the Twitter API to collect tweets related to climate change. The Twitter API provides access to a vast amount of data, including tweets, user information, and search results. I used the “tweepy” library to search for tweets matching keywords related to climate change, and I limited the collection to the past 30 days so that the data was up to date. After collecting the data, I removed duplicates and filtered out retweets to focus on original tweets.
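The collection step could look roughly like the sketch below. The keyword list, function names, and field choices are my assumptions for illustration and are not taken from the project code. It uses tweepy's v2 `Client` with `search_recent_tweets`, which only reaches back about a week, so a 30-day window like the one described above would need a different endpoint or access tier. The duplicate and retweet filter is plain Python and works on any list of tweet dictionaries.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical keyword list; the post does not say which keywords were used.
KEYWORDS = ["climate change", "global warming", "climate crisis"]


def build_query(keywords: Iterable[str]) -> str:
    """Build a v2 search query: any keyword, English only, retweets excluded."""
    terms = " OR ".join(f'"{k}"' if " " in k else k for k in keywords)
    return f"({terms}) lang:en -is:retweet"


def fetch_tweets(bearer_token: str, query: str, limit: int = 500) -> list[dict]:
    """Fetch tweets with tweepy (needs network access and valid credentials)."""
    import tweepy  # imported lazily so the helpers in this file work without it

    client = tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)
    pages = tweepy.Paginator(
        client.search_recent_tweets,
        query=query,
        tweet_fields=["created_at", "public_metrics"],
        max_results=100,
    )
    return [{"id": t.id, "text": t.text} for t in pages.flatten(limit=limit)]


def drop_duplicates_and_retweets(tweets: Iterable[dict]) -> list[dict]:
    """Keep the first copy of each distinct text and skip 'RT @user' retweets."""
    seen: set[str] = set()
    kept = []
    for tweet in tweets:
        text = tweet["text"].strip()
        key = text.casefold()  # case-insensitive comparison of tweet text
        if text.startswith("RT @") or key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept
```

Filtering retweets twice (once in the query with `-is:retweet`, once by the `RT @` prefix) is deliberate: the second check also catches manually quoted retweets that the query operator would let through.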

By analyzing this data, we can identify the most popular topics of discussion related to climate change on Twitter and see how people are responding to them. This information can help us better understand the public’s perception of climate change and tailor our communication efforts accordingly.

Results:

The analysis revealed that the most popular topics of discussion related to climate change were the impacts of climate change, renewable energy, and climate policy. I also found that the sentiment of the tweets was predominantly negative, with people expressing concerns about the lack of action on climate change.

This table shows the top 10 most frequently discussed topics related to climate change on Twitter, along with the frequency of each topic and the predominant sentiment expressed in tweets about that topic (negative, positive, or neutral). These results could be used to inform marketing and social media strategies related to climate change communication. For example, messaging around renewable energy and climate action could be targeted at audiences that express more positive sentiments, while messaging for audiences that express negative sentiments could address concerns about the impacts of climate change and the lack of climate policy action.
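The post does not spell out how topics and sentiment were assigned, so the following is only a minimal sketch of one way to build a table of this shape. It assumes simple keyword matching for topics and a tiny word-list score for sentiment. The topic keywords and the positive and negative word lists are hypothetical placeholders, and a real analysis would likely use a proper sentiment model.

```python
import re
from collections import Counter, defaultdict

# Hypothetical topic keywords and sentiment word lists, for illustration only.
TOPIC_KEYWORDS = {
    "impacts": {"flood", "drought", "wildfire", "heatwave"},
    "renewable energy": {"solar", "wind", "renewable"},
    "climate policy": {"policy", "legislation", "treaty"},
}
POSITIVE = {"great", "hope", "progress", "cheaper", "good"}
NEGATIVE = {"terrible", "failing", "crisis", "disaster", "worried", "damage"}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def sentiment(tokens):
    """Label a token set by counting positive minus negative words."""
    score = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(tweets):
    """Return (topic, tweet count, predominant sentiment) rows, most frequent first."""
    counts = Counter()
    moods = defaultdict(Counter)
    for text in tweets:
        tokens = tokenize(text)
        label = sentiment(tokens)
        for topic, words in TOPIC_KEYWORDS.items():
            if tokens & words:  # the tweet mentions at least one topic keyword
                counts[topic] += 1
                moods[topic][label] += 1
    return [
        (topic, n, moods[topic].most_common(1)[0][0])
        for topic, n in counts.most_common()
    ]
```

Calling `summarize(list_of_tweet_texts)[:10]` would give the ten most frequent topics. A tweet that mentions several topics is counted once under each of them.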

Cleaning and preprocessing data:

To clean up the data, I removed duplicates and filtered out retweets. I also used regular expressions to remove URLs and mentions from the tweets so that the analysis focused on the content. One recurring problem was that some tweets contained emojis and special characters that the analysis tools did not support. I fixed this by using the emoji library to remove the unsupported characters from the tweets.
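Here is a rough sketch of that cleaning step. To keep it dependency-free, it swaps the emoji library for a standard-library check on Unicode character categories, which removes most emoji and pictographic symbols but is not an exact equivalent. The regular expressions and the function name are my own assumptions for illustration.

```python
import re
import unicodedata

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")  # @handle not preceded by a word character

# Unicode categories to drop: other symbols (most emoji), modifier symbols
# (skin tones), surrogates, private-use, and unassigned code points.
DROP_CATEGORIES = {"So", "Sk", "Cs", "Co", "Cn"}
# Zero-width joiner and variation selector that glue emoji sequences together.
DROP_CHARS = {"\u200d", "\ufe0f"}


def clean_tweet(text):
    """Remove URLs, mentions, and emoji-like symbols, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = "".join(
        ch
        for ch in text
        if ch not in DROP_CHARS and unicodedata.category(ch) not in DROP_CATEGORIES
    )
    return " ".join(text.split())
```

Hashtags are left in place on purpose, since they often carry the topic of the tweet.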

Limitations:

The analysis has several limitations. First, the sample covers only the past 30 days of tweets, which may not be representative of the broader discussion on climate change. Second, the analysis is limited to tweets in English, so it may not capture the full range of discussion on climate change.

Conclusion:

In conclusion, I used the Twitter API to collect and analyze tweets related to climate change to understand the most popular topics of discussion and how people are responding to them. The analysis revealed that the conversation centers on the impacts of climate change, renewable energy, and climate policy, and that the sentiment of the tweets is predominantly negative, with people expressing concerns about the lack of action. While the analysis has limitations, it provides a starting point for understanding the public’s perception of climate change on social media.

Github:

https://github.com/jasmi01/INST414Exercises/blob/main/assignment1
