How’s 2021 going? Twitter Sentiment Analysis (NLP)
An NLP Data Science project to find out how people feel about 2021.
After 2020 turned out to be a disaster, we’ve all been looking forward to 2021 with hope. I decided to perform a Twitter Sentiment Analysis to find out if the new year is treating us well. Within a few hours I was able to scrape 37,621 tweets, using the following phrases as a search query:
- “2021 is”
- “2021 will”
- “This year”
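As a rough sketch, phrases like these could be combined into one search string. It assumes Twitter's standard search operators: double quotes for an exact phrase and `OR` between alternatives. The helper name `build_query` is hypothetical, and the phrases may just as well have been searched one at a time.

```python
def build_query(phrases):
    """Wrap each phrase in double quotes (exact match) and join them with OR."""
    return " OR ".join(f'"{phrase}"' for phrase in phrases)


# The three search phrases listed above.
phrases = ["2021 is", "2021 will", "This year"]
query = build_query(phrases)
print(query)
```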
With this project I wanted to get familiar with Natural Language Processing (NLP) techniques and answer the following questions:
- What are the most common words people use to describe 2021?
- What is the number of tweets with positive, negative and neutral sentiment?
- What are the most common words used in positive, neutral and negative tweets?
- What are the most liked and retweeted posts?
In this article, I’d like to share my findings. If you’d like to look into my source code or get the .csv file with mined tweets, please visit my GitHub.
Tools and workflow
This diagram shows the project flow:
The tools used include Tweepy (for mining tweets), Pandas (for data cleaning and wrangling), Tweet Preprocessor (for rapid tweet cleaning), NLTK (for tokenization, stopword removal and POS tagging), and Plotly, Matplotlib and Word Cloud (for visualization). You can find the remaining built-in libraries and detailed explanations in my source code on GitHub.
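To make the tokenization and stopword-removal steps concrete, here is a minimal standard-library sketch of counting the most common words in a handful of tweets. It is a simplified stand-in, not the project's actual pipeline: the regex tokenizer and the tiny `STOPWORDS` set replace NLTK's tokenizer and stopword list, and the sample tweets and the names `tokenize` and `top_words` are made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); NLTK's English list is much longer.
STOPWORDS = {
    "a", "an", "the", "is", "will", "be", "to", "of",
    "and", "this", "it", "in", "for", "i", "my",
}


def tokenize(text):
    """Lowercase the text, drop URLs and @mentions, and split into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z0-9']+", text)


def top_words(tweets, n=10):
    """Return the n most frequent non-stopword tokens as (word, count) pairs."""
    counts = Counter(
        token
        for tweet in tweets
        for token in tokenize(tweet)
        if token not in STOPWORDS
    )
    return counts.most_common(n)


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "2021 is going to be a better year! https://example.com",
    "@friend this year will be better, I hope",
    "2021 is already exhausting",
]
print(top_words(sample_tweets, n=3))
```

The same counting step can be run on a filtered subset of tweets, for example only those labelled positive, to get per-sentiment word lists.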
I used the Python library Tweepy to build a dataset of tweets from scratch. Tweepy works with the Twitter API, and to use it you need to create a Twitter developer account. That's how you get the unique consumer key and access token needed to access Twitter data. Tweepy is a pretty straightforward library to use. However, scraping tweets can be slow due to the new mining limits that Twitter imposed through its API. That’s why Tweepy introduced the…