How I built a Twitter dashboard
From building an ETL pipeline to automatically update a dashboard.
In this article, I would like to share how I use Python to extract data from Twitter API and stream them into the PostgreSQL database, perform sentiment analysis and further build a dashboard using Tableau to track the weekly trends.
“Based on Twitter usage statistics, there are 500 million tweets sent each day. There are 330 million monthly active users. ” — Brandwatch
As a data analyst, I am curious about what people say on Twitter, how often they tweet, people’s attitudes toward specific hashtags, etc. Twitter API provides us an easy way to retrieve tweets about specific keywords and hashtags. In other words, I can utilize the tool to analyze any topics I am interested in.
I have been involved in the MakeoverMonday project for months. MakeoverMonday is a social data project that each week rework a chart, retell a story more effectively, or even find more interesting insights. It has been my favorite project in 2020. The community would share their thoughts on Twitter and interact with other members. As a result, I am interested to learn more from hashtag #makeovermonday.
This project aims to build an automatic ETL pipeline that allows me easily to update the dashboard every Monday. For this reason, I choose to store the data in the Postgresql database, which I can store every week’s new data and load to Tableau. Besides, I want to write a script that I can reuse for different hashtags and keywords.
Based on data from 2020/12/22 to 2021/01/02.
- Not surprisingly, people like to post on Monday since that is the day supposed to do the project. I like to finish my project on Monday as well since it can force me to be productive.
- 95% of the tweets are positive or natural. People are happy to share their works and give positive feedback.
I would briefly share what tools and steps I used for the project. Feel free to reach out to me if you have any questions.
Tools I use
Extract data: Twitter API, Python (tweepy)
Load data into PostgreSQL: Python(sqlalchemy), PgAdmin
Sentiment Analysis: Python(TextBlob,stopwords)
EDA& Data visualization: Tableau
Connect Twitter API
There are a lot of articles in Medium talking about how to connect Twitter API. The first step is to go to Twitter Developer to register an account, access your keys.
Here is some resource I use as a reference. There are different ways to connect. Feel free to use any approach you like:)
Stream data into Postgresql
In this step, I used SQLAlchemy package. An important thing to keep in mind is that sometimes we might extract duplicate data from Twitter. As a result, we need to set a constraint for our database and use the following function to skip duplicate rows or update the existing records to avoid duplicate records in our database.
Sentiment analysis is a way to tell if the text data is positive, negative, or neutral using the NLP technique. Here I used TextBlob’s sentiment property to get the polarity score of the string. The polarity score within the range [-1.0, 1.0]. For this project, I am only interested in if the tweet is positive, neutral, or negative; thus I choose to return 0(neutral), 1(positive), -1(negative).
The final step is to connect my PostgreSQL database with Tableau. One thing I struggled with is the version of PostgreSQL. Currently, Tableau works fine with PostgreSQL 10(It works for me.), but PostgreSQL 12 doesn’t work. Even though the latest update of Tableau update shows that it supports PostgreSQL 12. If you have the same issue, just download an older version of PostgreSQL and remember to use a different port number to avoid conflict.
For the dashboard design, I want to show this week’s total tweets, unique user count, total retweet count, the percentage of positive and neutral tweets, and the comparison with the last seven days. With the big scatter plot, I plot all tweets within seven days (you can change the parameter to 14 or 21). It would show the tweet’s username and total retweet count for the specific user.
It is fascinating to see that I can build the data pipeline and update new data easily. By making the dashboard, I can clearly see the tweets trend and see what people are talking about.
I hope you enjoy reading this:)